Java Example: Rank

The Rank analytic function ranks rows based on how they are ordered.

Loading and Using the Example

The following example shows how to load the function into Vertica. It assumes that the AnalyticFunctions.jar library that contains the function has been copied to the dbadmin user's home directory on the initiator node.

=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFunctions.jar';
CREATE LIBRARY
=> CREATE ANALYTIC FUNCTION an_rank AS LANGUAGE 'Java'
   NAME 'RankFactory' LIBRARY AnalyticFunctions;
CREATE ANALYTIC FUNCTION

An example of running this rank function, named an_rank, is:

=> SELECT * FROM hits;
      site       |    date    | num_hits 
-----------------+------------+----------
 www.example.com | 2012-01-02 |       97
 www.vertica.com | 2012-01-01 |   343435
 www.example.com | 2012-01-01 |      123
 www.example.com | 2012-01-04 |      112
 www.vertica.com | 2012-01-02 |   503695
 www.vertica.com | 2012-01-03 |   490387
 www.example.com | 2012-01-03 |      123
(7 rows)
=> SELECT site,date,num_hits,an_rank() 
   OVER (PARTITION BY site ORDER BY num_hits DESC) 
   AS an_rank FROM hits;
      site       |    date    | num_hits | an_rank 
-----------------+------------+----------+---------
 www.example.com | 2012-01-03 |      123 |       1
 www.example.com | 2012-01-01 |      123 |       1
 www.example.com | 2012-01-04 |      112 |       3
 www.example.com | 2012-01-02 |       97 |       4
 www.vertica.com | 2012-01-02 |   503695 |       1
 www.vertica.com | 2012-01-03 |   490387 |       2
 www.vertica.com | 2012-01-01 |   343435 |       3
(7 rows)

As with the built-in RANK analytic function, rows that have the same value for the ORDER BY column (num_hits in this example) have the same rank, but the rank continues to increase, so that the next row that has a different ORDER BY key gets a rank value based on the number of rows that preceded it.

AnalyticFunction Implementation

The following code defines an AnalyticFunction subclass named Rank. The code can be found in the SDK examples directory.

/**
 * User-defined analytic function: Rank - works mostly the same as SQL-99 rank
 * with the ability to define as many order by columns as desired
 *
 */
public class Rank extends AnalyticFunction {
  private int rank, numRowsWithSameOrder;

  @Override
    public void processPartition(ServerInterface srvInterface,
                                 AnalyticPartitionReader inputReader, 
                                 AnalyticPartitionWriter outputWriter)
    throws UdfException, DestroyInvocation {
    rank = 0;
    numRowsWithSameOrder = 1;
    do{
      if(!inputReader.isNewOrderByKey()){
        ++numRowsWithSameOrder;
      }else {
        rank += numRowsWithSameOrder;
        numRowsWithSameOrder = 1;
      }
      outputWriter.setLong(0, rank);
      outputWriter.next();
    }while(inputReader.next());
  }
}

In this example, the processPartition() method does not actually read any of the data from the input row; it just advances through the rows. It just needs to count the number of rows that have been read and determine whether those rows have the same ORDER BY key as the previous row. If the current row is a new ORDER BY key, then the rank is set to the total number of rows that have been processed. If the current row has the same ORDER BY value as the previous row, then the rank remains the same.

AnalyticFunctionFactory Implementation

The following code defines the AnalyticFunctionFactory that corresponds with the Rank analytic function.

public class RankFactory extends AnalyticFunctionFactory {

  @Override
    public void getPrototype(ServerInterface srvInterface, 
                             ColumnTypes argTypes,
                             ColumnTypes returnType) {
    returnType.addInt();
  }

  @Override
    public void getReturnType(ServerInterface srvInterface, 
                              SizedColumnTypes argTypes,
                              SizedColumnTypes returnType) throws UdfException {
    returnType.addInt();
  }

  @Override
    public AnalyticFunction createAnalyticFunction(ServerInterface srvInterface) {
    return new Rank();
  }
}

The first method defined by the RankFactory subclass, getPrototype(), sets the data type of the return value. Because the Rank UDAnF does not read input, it does not define any arguments by calling methods on the ColumnTypes object passed in the argTypes parameter.

The next method is getReturnType(). If your analytic function returns a data type that needs to define a width or precision, your implementation of the getReturnType function calls a method on the SizedColumnType parameter to tell Vertica the width or precision. Rank returns a fixed-width data type (an INTEGER) so it does not need to set the precision or width of its output; it just calls addInt() to report its output data type.

Finally, RankFactory defines the createAnalyticFunction() method that returns an instance of the AnalyticFunction class that Vertica can call.