UDSF Class Overview
You create your UDSF by subclassing two classes defined by the Vertica SDK: ScalarFunction and ScalarFunctionFactory.
ScalarFunction
The ScalarFunction class is the heart of a UDSF. Your subclass must define the processBlock() method to perform the scalar operation. It may also define methods to set up and tear down the function.
Performing the Operation
The processBlock() method carries out all of the processing that you want your UDSF to perform. When a user calls your function in a SQL statement, Vertica bundles together the data from the function parameters and passes it to processBlock().
The input and output of the processBlock() method are supplied by objects of the BlockReader and BlockWriter classes. They define methods that you use to read the input data and write the output data for your UDSF.
Most of the work in developing a UDSF goes into writing processBlock(). Your UDSF should follow this basic pattern:
- Read in a set of parameters from the BlockReader object using data-type-specific methods.
- Process the data in some manner.
- Output the resulting value using one of the BlockWriter class's data-type-specific methods.
- Advance to the next row of output and input by calling BlockWriter.next() and BlockReader.next().
This process continues until there are no more rows of data to be read (BlockReader.next() returns false).
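The pattern above can be sketched in C++ as follows. To keep the example self-contained, it uses simplified stand-in classes in a hypothetical namespace, sketch_block, rather than the real SDK headers. The accessor names getInt() and setInt(), the constructors, and the two-argument processBlock() are assumptions made for illustration and may not match the SDK's actual signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only. The real BlockReader and
// BlockWriter classes come from the Vertica SDK and have richer interfaces.
namespace sketch_block {

class BlockReader {
public:
    explicit BlockReader(std::vector<std::vector<long> > rows)
        : rows_(std::move(rows)) {}
    // Data-type-specific accessor: integer argument `col` of the current row.
    long getInt(std::size_t col) const { return rows_.at(row_).at(col); }
    // Advance to the next input row; returns false when no rows remain.
    bool next() { return ++row_ < rows_.size(); }
private:
    std::vector<std::vector<long> > rows_;
    std::size_t row_ = 0;
};

class BlockWriter {
public:
    explicit BlockWriter(std::size_t rowCount) : out_(rowCount) {}
    // Data-type-specific setter: store the value for the current output row.
    // at() throws if more rows are written than the block can hold.
    void setInt(long value) { out_.at(row_) = value; }
    // Advance to the next output row.
    void next() { ++row_; }
    const std::vector<long>& values() const { return out_; }
    std::size_t rowsWritten() const { return row_; }
private:
    std::vector<long> out_;
    std::size_t row_ = 0;
};

// Adds two integer arguments per row.
// Assumes the block holds at least one row, so the loop is a do/while.
void processBlock(BlockReader& reader, BlockWriter& writer) {
    do {
        const long a = reader.getInt(0);   // 1. read the parameters
        const long b = reader.getInt(1);
        const long sum = a + b;            // 2. process the data
        writer.setInt(sum);                // 3. output the result
        writer.next();                     // 4. advance the output row...
    } while (reader.next());               //    ...and the input row
}

}  // namespace sketch_block
```

Because the loop advances the writer and the reader once per iteration, it emits exactly one output value for every input row.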
You must make sure that processBlock() reads all of the rows in its input and outputs a single value for each row. Failure to do so can corrupt the data structures that Vertica reads to get the output of your UDSF. The only exception to this rule is if your processBlock() function reports an error back to Vertica (see Handling Errors). In that case, Vertica does not attempt to read the incomplete result set generated by the UDSF.
Setting Up and Tearing Down
The ScalarFunction class defines two additional methods that you can optionally implement to allocate and free resources: setup() and destroy(). You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating Resources for UDxs for details).
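As a rough illustration of this lifecycle, the following C++ sketch allocates a lookup table in setup() and releases it in destroy(). The ScalarFunction base class here is a simplified stand-in in a hypothetical namespace, sketch_lifecycle, and the SquareLookup class with its apply() and ready() helpers is invented for the example. The real SDK methods receive arguments that this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for illustration only. The real ScalarFunction class
// is defined by the Vertica SDK, and its methods take SDK arguments.
namespace sketch_lifecycle {

class ScalarFunction {
public:
    virtual ~ScalarFunction() {}
    virtual void setup() {}    // optional: acquire resources
    virtual void destroy() {}  // optional: release resources
};

// Invented example class: keeps a table of squares that is allocated
// outside of any API-managed memory.
class SquareLookup : public ScalarFunction {
public:
    void setup() override {
        // Allocate the resource once, before any rows are processed.
        table_.reset(new std::vector<long>());
        for (long i = 0; i < 10; ++i) {
            table_->push_back(i * i);
        }
    }

    void destroy() override {
        // Free the resource once processing is finished.
        table_.reset();
    }

    // Helper that row-processing code could call between setup() and destroy().
    long apply(long x) const { return table_->at(static_cast<std::size_t>(x)); }

    bool ready() const { return table_.get() != nullptr; }

private:
    std::unique_ptr<std::vector<long> > table_;
};

}  // namespace sketch_lifecycle
```

Pairing every allocation in setup() with a release in destroy() keeps the resource's lifetime tied to the function instance rather than to any single block of rows.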
Notes
- While the name you choose for your ScalarFunction subclass does not have to match the name of the SQL function you will later assign to it, Vertica considers making the names the same a best practice.
- Do not assume that your function will be called from the same thread that instantiated it.
- The same instance of your ScalarFunction subclass can be called on to process multiple blocks of data.
- The rows of input sent to processBlock() are not guaranteed to be in any particular order.
- Writing too many output rows can cause Vertica to emit an out-of-bounds error.
ScalarFunctionFactory
The ScalarFunctionFactory class provides Vertica with metadata about your UDSF: its number of parameters and their data types, as well as the data type of its return value. It also instantiates a subclass of ScalarFunction.
Methods
You must implement the following methods in your ScalarFunctionFactory subclass:
- createScalarFunction() instantiates a ScalarFunction subclass. If writing in C++, you can call the vt_createFuncObj macro with the name of the ScalarFunction subclass. This macro takes care of allocating and instantiating the class for you.
- getPrototype() tells Vertica about the parameters and return type for your UDSF. In addition to a ServerInterface object, this method gets two ColumnTypes objects. All you need to do in this function is to call class functions on these two objects to build the list of parameters and the single return value type.
After defining your factory class, you need to call the RegisterFactory macro. This macro instantiates an instance of your factory class, so Vertica can interact with it and extract the metadata it contains about your UDSF.
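The two required factory methods might look roughly like the following for a UDSF that takes two integers and returns one. This is a self-contained sketch using stand-in classes in a hypothetical namespace, sketch_factory. The Type enum, the addInt() method name, the Add2Ints classes, and the use of std::unique_ptr are assumptions for illustration. The ServerInterface argument is omitted, and in real code the SDK allocation macro and RegisterFactory would take the place of the plain new expression shown here.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only. The real classes are
// defined by the Vertica SDK and take additional arguments.
namespace sketch_factory {

enum class Type { Int, Varchar };

// Collects an ordered list of column types.
class ColumnTypes {
public:
    void addInt() { types_.push_back(Type::Int); }
    void addVarchar() { types_.push_back(Type::Varchar); }
    const std::vector<Type>& types() const { return types_; }
private:
    std::vector<Type> types_;
};

class ScalarFunction {
public:
    virtual ~ScalarFunction() {}
};

// Invented example function class; its row processing is not shown.
class Add2Ints : public ScalarFunction {};

class ScalarFunctionFactory {
public:
    virtual ~ScalarFunctionFactory() {}
    virtual std::unique_ptr<ScalarFunction> createScalarFunction() = 0;
    virtual void getPrototype(ColumnTypes& argTypes,
                              ColumnTypes& returnType) = 0;
};

class Add2IntsFactory : public ScalarFunctionFactory {
public:
    // Instantiate the ScalarFunction subclass that does the work.
    std::unique_ptr<ScalarFunction> createScalarFunction() override {
        return std::unique_ptr<ScalarFunction>(new Add2Ints());
    }

    // Describe the signature: two integer parameters, one integer result.
    void getPrototype(ColumnTypes& argTypes,
                      ColumnTypes& returnType) override {
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
};

}  // namespace sketch_factory
```

The factory holds no per-row state. It only describes the function's signature and hands out function objects.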
Declaring Return Values
If your function returns a sized column (a return data type whose length can vary, such as a VARCHAR) or a value that requires precision, you must implement getReturnType(). This method is called by Vertica to find the length or precision of the data being returned in each row of the results. The return value of this method depends on the data type your processBlock() method returns:
- CHAR or VARCHAR return the maximum length of the string.
- NUMERIC types specify the precision and scale.
- TIME and TIMESTAMP values (with or without timezone) specify precision.
- INTERVAL YEAR TO MONTH specifies range.
- INTERVAL DAY TO SECOND specifies precision and range.
If your UDSF does not return one of these data types, it does not need a getReturnType() method.
The input to the getReturnType() method is a SizedColumnTypes object that contains the input argument types along with their lengths. This object will be passed to an instance of your processBlock() function. Your implementation of getReturnType() must extract the data types and lengths from this input and determine the length or precision of the output rows. It then saves this information in another instance of the SizedColumnTypes class.
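As an illustration, consider a UDSF that concatenates its VARCHAR arguments: the longest possible result is the sum of the arguments' maximum lengths. The sketch below computes that value with a simplified stand-in for SizedColumnTypes in a hypothetical namespace, sketch_rettype. The method names addVarchar(), getColumnCount(), and getStringLength(), and the two-argument getReturnType(), are assumptions for this example and may not match the SDK's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration only. The real SizedColumnTypes
// class is defined by the Vertica SDK and tracks full type information.
namespace sketch_rettype {

// Tracks only the maximum length of each VARCHAR column.
class SizedColumnTypes {
public:
    void addVarchar(std::size_t maxLength) { lengths_.push_back(maxLength); }
    std::size_t getColumnCount() const { return lengths_.size(); }
    std::size_t getStringLength(std::size_t col) const {
        return lengths_.at(col);
    }
private:
    std::vector<std::size_t> lengths_;
};

// Return type for a concatenating UDSF: a VARCHAR whose maximum length is
// the sum of the maximum lengths of all input arguments.
void getReturnType(const SizedColumnTypes& argTypes,
                   SizedColumnTypes& returnType) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < argTypes.getColumnCount(); ++i) {
        total += argTypes.getStringLength(i);  // extract each input length
    }
    returnType.addVarchar(total);  // save the computed output length
}

}  // namespace sketch_rettype
```

With VARCHAR(10) and VARCHAR(25) arguments, this sketch declares a VARCHAR(35) return column.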