UDSF Class Overview

You create your UDSF by subclassing two classes defined by the Vertica SDK: ScalarFunction and ScalarFunctionFactory.

ScalarFunction

The ScalarFunction class is the heart of a UDSF. Your subclass must define the processBlock() method to perform the scalar operation. It may define methods to set up and tear down the function.

Performing the Operation

The processBlock() method carries out all of the processing that you want your UDSF to perform. When a user calls your function in a SQL statement, Vertica bundles together the data from the function parameters and passes it to processBlock().

The input and output of the processBlock() method are supplied by objects of the BlockReader and BlockWriter classes. They define methods that you use to read the input data and write the output data for your UDSF.

Most of the work in developing a UDSF is writing processBlock(). Your implementation should follow this basic pattern:

  • Read in a set of parameters from the BlockReader object using data-type-specific methods.
  • Process the data in some manner.
  • Output the resulting value using one of the BlockWriter class's data-type-specific methods.
  • Advance to the next row of output and input by calling BlockWriter.next() and BlockReader.next().

This process continues until there are no more rows of data to be read (BlockReader.next() returns false).
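
The pattern above can be sketched in C++ as follows. This is a minimal illustration, not SDK code: MockBlockReader and MockBlockWriter are hypothetical stand-ins that only imitate the read, write, and next() protocol, and their getInt() and setInt() accessors are assumed names for the data-type-specific methods. The sketch also assumes that every block contains at least one row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SDK's BlockReader (NOT the real class).
struct MockBlockReader {
    std::vector<std::vector<std::int64_t>> rows;  // one inner vector per input row
    std::size_t pos = 0;                          // index of the current row

    // Read the integer argument in column `col` of the current row.
    std::int64_t getInt(std::size_t col) const { return rows[pos][col]; }

    // Advance to the next input row; returns false once the block is exhausted.
    bool next() { return ++pos < rows.size(); }
};

// Hypothetical stand-in for the SDK's BlockWriter (NOT the real class).
struct MockBlockWriter {
    std::vector<std::int64_t> out;  // exactly one output slot per input row
    std::size_t pos = 0;            // index of the current output row

    explicit MockBlockWriter(std::size_t rowCount) : out(rowCount) {}

    // Write the result for the current row; writing past the end is a bug.
    void setInt(std::int64_t v) {
        assert(pos < out.size());
        out[pos] = v;
    }

    // Advance to the next output row.
    void next() { ++pos; }
};

// Adds the two integer arguments of every row in the block.
void processBlock(MockBlockReader &reader, MockBlockWriter &writer) {
    assert(!reader.rows.empty());  // assumption: blocks are never empty
    do {
        const std::int64_t a = reader.getInt(0);  // read the parameters
        const std::int64_t b = reader.getInt(1);
        writer.setInt(a + b);                     // process and write the result
        writer.next();                            // advance the output row
    } while (reader.next());                      // advance the input row
}
```

Calling processBlock() with a reader that holds the rows (1, 2) and (3, 4) leaves 3 and 7 in the writer's output: one value per input row, written in the order the rows were read.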

You must make sure that processBlock() reads all of the rows in its input and outputs a single value for each row. Failure to do so can corrupt the data structures that Vertica reads to get the output of your UDSF. The only exception to this rule is if your processBlock() method reports an error back to Vertica (see Handling Errors). In that case, Vertica does not attempt to read the incomplete result set generated by the UDSF.

Setting Up and Tearing Down

The ScalarFunction class defines two additional methods that you can optionally implement to allocate and free resources: setup() and destroy(). You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating Resources for UDxs for details).
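
As an illustration of that lifecycle, the sketch below pairs an allocation in setup() with its release in destroy(). SquareLookup is a hypothetical class that does not derive from the real ScalarFunction, and its parameterless setup() and destroy() are simplifications; the SDK versions may take arguments that are not shown here. The lookup table stands in for any resource that is not allocated through the UDx API.

```cpp
#include <cassert>
#include <cstddef>

// Size of the hypothetical lookup table.
constexpr std::size_t kTableSize = 256;

// Hypothetical UDSF-like class (NOT a real ScalarFunction subclass).
class SquareLookup {
public:
    // Allocate and fill a resource that the UDx API does not manage.
    void setup() {
        assert(table_ == nullptr);  // setup() is expected to run once
        table_ = new long[kTableSize];
        for (std::size_t i = 0; i < kTableSize; ++i) {
            table_[i] = static_cast<long>(i * i);
        }
    }

    // Use the resource while processing; requires a prior setup() call.
    long process(std::size_t v) const {
        assert(table_ != nullptr && v < kTableSize);
        return table_[v];
    }

    // Release everything that setup() acquired.
    void destroy() {
        delete[] table_;
        table_ = nullptr;
    }

    bool isReady() const { return table_ != nullptr; }

private:
    long *table_ = nullptr;
};
```

Keeping acquisition and release in this matched pair means the resource lives exactly as long as the function instance is in use, however many blocks that instance processes in between.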

Notes

  • While the name you choose for your ScalarFunction subclass does not have to match the name of the SQL function you will later assign to it, Vertica considers making the names the same a best practice.
  • Do not assume that your function will be called from the same thread that instantiated it.
  • The same instance of your ScalarFunction subclass can be called on to process multiple blocks of data.
  • The rows of input sent to processBlock() are not guaranteed to be in any particular order.
  • Writing too many output rows can cause Vertica to emit an out-of-bounds error.

ScalarFunctionFactory

The ScalarFunctionFactory class provides Vertica with metadata about your UDSF: the number of parameters and their data types, as well as the data type of its return value. It also instantiates a subclass of ScalarFunction.

Methods

You must implement the following methods in your ScalarFunctionFactory subclass:

  • createScalarFunction() instantiates a ScalarFunction subclass. If writing in C++, you can call the vt_createFuncObj macro with the name of the ScalarFunction subclass. This macro takes care of allocating and instantiating the class for you.
  • getPrototype() tells Vertica about the parameters and return type(s) for your UDSF. In addition to a ServerInterface object, this method gets two ColumnTypes objects. All you need to do in this method is call methods on these two objects to build the list of parameters and the return value type(s). If you return more than one value, the results are packaged into a ROW type.

After defining your factory class, you need to call the RegisterFactory macro. This macro creates an instance of your factory class, so Vertica can interact with it and extract the metadata it contains about your UDSF.
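
The shape of such a factory can be sketched as follows. The sketch does not use the SDK: MockColumnTypes is a hypothetical stand-in for ColumnTypes that simply records type names, Add2Ints and Add2IntsFactory are illustrative names, and the ServerInterface argument, the allocation macro, and the RegisterFactory call are left out because they exist only in the SDK.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK's ColumnTypes (NOT the real class).
struct MockColumnTypes {
    std::vector<std::string> types;  // recorded in declaration order
    void addInt() { types.push_back("INTEGER"); }
};

// Hypothetical function class; stands in for a ScalarFunction subclass.
struct Add2Ints {
    long add(long a, long b) const { return a + b; }
};

// Hypothetical factory; stands in for a ScalarFunctionFactory subclass.
struct Add2IntsFactory {
    // Instantiate the function class that does the actual work.
    std::unique_ptr<Add2Ints> createScalarFunction() const {
        return std::unique_ptr<Add2Ints>(new Add2Ints());
    }

    // Declare two INTEGER parameters and one INTEGER return value.
    void getPrototype(MockColumnTypes &argTypes,
                      MockColumnTypes &returnType) const {
        // Both lists are expected to arrive empty.
        assert(argTypes.types.empty() && returnType.types.empty());
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
};
```

The factory holds no per-query state in this sketch: it only describes the function's signature and hands out fresh function objects.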

Declaring Return Values

If your function returns a sized column (a return data type whose length can vary, such as a VARCHAR), a value that requires precision, or more than one value, you must implement getReturnType(). This method is called by Vertica to find the length or precision of the data being returned in each row of the results. The return value of this method depends on the data type your processBlock() method returns:

  • CHAR and VARCHAR specify the maximum length of the string.
  • NUMERIC types specify the precision and scale.
  • TIME and TIMESTAMP values (with or without timezone) specify precision.
  • INTERVAL YEAR TO MONTH specifies range.
  • INTERVAL DAY TO SECOND specifies precision and range.

If your UDSF does not return one of these data types and returns a single value, it does not need a getReturnType() method.

The input to the getReturnType() method is a SizedColumnTypes object that contains the input argument types along with their lengths. This object will also be passed to your processBlock() method. Your implementation of getReturnType() must extract the data types and lengths from this input and determine the length or precision of the output rows. It then saves this information in another instance of the SizedColumnTypes class.
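
To make the length calculation concrete, the sketch below works through a hypothetical UDSF that concatenates its VARCHAR arguments, so the maximum output length is the sum of the input lengths. MockSizedColumnTypes is a hypothetical stand-in for SizedColumnTypes, and its addVarchar(), getColumnCount(), and getStringLength() methods are assumed names chosen for illustration rather than confirmed SDK signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One column's type name and declared maximum length.
struct MockSizedColumn {
    std::string type;
    int length;
};

// Hypothetical stand-in for the SDK's SizedColumnTypes (NOT the real class).
struct MockSizedColumnTypes {
    std::vector<MockSizedColumn> cols;

    void addVarchar(int len) { cols.push_back({"VARCHAR", len}); }
    std::size_t getColumnCount() const { return cols.size(); }
    int getStringLength(std::size_t i) const { return cols[i].length; }
};

// Computes the return type of a hypothetical string-concatenation UDSF:
// the longest possible result is all arguments at their maximum length.
void getReturnType(const MockSizedColumnTypes &argTypes,
                   MockSizedColumnTypes &returnType) {
    assert(argTypes.getColumnCount() > 0);  // assumption: at least one argument
    int total = 0;
    for (std::size_t i = 0; i < argTypes.getColumnCount(); ++i) {
        total += argTypes.getStringLength(i);  // read each input length
    }
    returnType.addVarchar(total);              // save the output length
}
```

With arguments declared as VARCHAR(10) and VARCHAR(25), this sketch records a single VARCHAR(35) return column.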