Java Example: Rank
The Rank
analytic function ranks rows based on how they are ordered.
Loading and Using the Example
The following example shows how to load the function into Vertica. It assumes that the AnalyticFunctions.jar
library that contains the function has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFunctions.jar'; CREATE LIBRARY => CREATE ANALYTIC FUNCTION an_rank AS LANGUAGE 'Java' NAME 'RankFactory' LIBRARY AnalyticFunctions; CREATE ANALYTIC FUNCTION
An example of running this rank function, named an_rank
, is:
=> SELECT * FROM hits; site | date | num_hits -----------------+------------+---------- www.example.com | 2012-01-02 | 97 www.vertica.com | 2012-01-01 | 343435 www.example.com | 2012-01-01 | 123 www.example.com | 2012-01-04 | 112 www.vertica.com | 2012-01-02 | 503695 www.vertica.com | 2012-01-03 | 490387 www.example.com | 2012-01-03 | 123 (7 rows) => SELECT site,date,num_hits,an_rank() OVER (PARTITION BY site ORDER BY num_hits DESC) AS an_rank FROM hits; site | date | num_hits | an_rank -----------------+------------+----------+--------- www.example.com | 2012-01-03 | 123 | 1 www.example.com | 2012-01-01 | 123 | 1 www.example.com | 2012-01-04 | 112 | 3 www.example.com | 2012-01-02 | 97 | 4 www.vertica.com | 2012-01-02 | 503695 | 1 www.vertica.com | 2012-01-03 | 490387 | 2 www.vertica.com | 2012-01-01 | 343435 | 3 (7 rows)
As with the built-in RANK analytic function, rows that have the same value for the ORDER BY column (num_hits in this example) have the same rank, but the rank continues to increase, so that the next row that has a different ORDER BY key gets a rank value based on the number of rows that preceded it.
AnalyticFunction Implementation
The following code defines an AnalyticFunction
subclass named Rank
. The code can be found in the SDK examples directory.
/** * User-defined analytic function: Rank - works mostly the same as SQL-99 rank * with the ability to define as many order by columns as desired * */ public class Rank extends AnalyticFunction { private int rank, numRowsWithSameOrder; @Override public void processPartition(ServerInterface srvInterface, AnalyticPartitionReader inputReader, AnalyticPartitionWriter outputWriter) throws UdfException, DestroyInvocation { rank = 0; numRowsWithSameOrder = 1; do{ if(!inputReader.isNewOrderByKey()){ ++numRowsWithSameOrder; }else { rank += numRowsWithSameOrder; numRowsWithSameOrder = 1; } outputWriter.setLong(0, rank); outputWriter.next(); }while(inputReader.next()); } }
In this example, the processPartition()
method does not actually read any of the data from the input row; it just advances through the rows. It just needs to count the number of rows that have been read and determine whether those rows have the same ORDER BY key as the previous row. If the current row is a new ORDER BY key, then the rank is set to the total number of rows that have been processed. If the current row has the same ORDER BY value as the previous row, then the rank remains the same.
AnalyticFunctionFactory Implementation
The following code defines the AnalyticFunctionFactory
that corresponds with the Rank
analytic function.
public class RankFactory extends AnalyticFunctionFactory { @Override public void getPrototype(ServerInterface srvInterface, ColumnTypes argTypes, ColumnTypes returnType) { returnType.addInt(); } @Override public void getReturnType(ServerInterface srvInterface, SizedColumnTypes argTypes, SizedColumnTypes returnType) throws UdfException { returnType.addInt(); } @Override public AnalyticFunction createAnalyticFunction(ServerInterface srvInterface) { return new Rank(); } }
The first method defined by the RankFactory
subclass, getPrototype()
, sets the data type of the return value. Because the Rank UDAnF does not read input, it does not define any arguments by calling methods on the ColumnTypes
object passed in the argTypes
parameter.
The next method is getReturnType()
. If your analytic function returns a data type that needs to define a width or precision, your implementation of the getReturnType
function calls a method on the SizedColumnType
parameter to tell Vertica the width or precision. Rank
returns a fixed-width data type (an INTEGER) so it does not need to set the precision or width of its output; it just calls addInt()
to report its output data type.
Finally, RankFactory
defines the createAnalyticFunction()
method that returns an instance of the AnalyticFunction
class that Vertica can call.