UDTF Class Overview
You create your UDTF by subclassing two classes defined by the Vertica SDK: TransformFunction
and TransformFunctionFactory
.
The TransformFunctionFactory
performs two roles:
- It provides the number of parameters and their and data types accepted by the UDTF and the number of output columns and their data types UDTF's output. Vertica uses this data when you call the CREATE TRANSFORM FUNCTION SQL statement to add the function to the database catalog.
- It returns an instance of the UDTF function's
TransformFunction
subclass that Vertica can call to process data.
TransformFunction
The TransformFunction
class is where you perform the data-processing, transforming input rows into output rows. Your subclass must define the processPartition()
method. It may define methods to set up and tear down the function.
Performing the Transformation
The processPartition()
method carries out all of the processing that you want your UDTF to perform. When a user calls your function in a SQL statement, Vertica bundles together the data from the function parameters and passes it to processPartition()
.
The input and output of the processPartition()
method are supplied by objects of the BlockReader
and BlockWriter
classes. They define methods that you use to read the input data and write the output data for your UDTF.
Your processPartition()
method should follow this basic pattern:
- Extract the input parameters by calling a data-type-specific function in the
PartitionReader
object to extract each input parameter. All of these functions take a single parameter: the column number in the input row that you want to read. Your function might need to handle NULL values. - Perform the actual transformation.
- Write the output, if any. Unlike a UDSF, outputting data is optional for a UDTF. However, if it does write output, it must supply values for all of the output columns you defined in your factory. Similarly to reading input columns, the
PartitionWriter
object has functions for writing each type of data to the output row. - Advance to the next row by calling
ProcessReader.next()
. This function returns true if there is another row of input data to process and false if all the data in the partition has been read. Once the input rows are exhausted, your UDTF exits so its results are returned back to Vertica
Note: In some cases, you may want to determine the number and types of parameters using PartitionReader
's getNumCols()
and getTypeMetaData()
functions, instead of just hard-coding the data types of the columns in the input row. This is useful if you want your TransformFunction
to be able to process input tables with different schemas. You can then use different TransformFunctionFactory
classes to define multiple function signatures that call the same TransformFunction
class. See Handling Different Numbers and Types of Arguments for more information.
Setting Up and Tearing Down
The TransformFunction
class defines two additional methods that you can optionally implement to allocate and free resources: setup()
and destroy()
. You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating Resources for UDxs for details).
TransformFunctionFactory
The TransformFunctionFactory
class tells Vertica metadata about your UDTF: its number of parameters and their data types, as well as the data type of its return value. It also instantiates a subclass of TransformFunction
.
You must implement the following methods in your TransformFunctionFactory
:
getPrototype()
returns twoColumnTypes
objects that describe the columns your UDTF takes as input and returns as output.createTransformFunction()
instantiates yourTransformFunction
subclass.getReturnType()
tells Vertica details about the output values: the width of variable-sized data types (such as VARCHAR) and the precision of data types that have settable precision (such as TIMESTAMP). You can also set the names of the output columns using this function.
Note: The getReturnType()
function is required for UDTFs. It is optional for UDxs that return single values, such as User-Defined Scalar Functions.