ScalarFunction Class

The ScalarFunction class is the heart of a UDSF. Your subclass must define the processBlock() method to perform the scalar operation. It may define methods to set up and tear down the function.

For scalar functions written in C++, you can provide information that can help with query optimization. See Improving Query Performance (C++ Only).

Performing the Operation

The processBlock() method carries out all of the processing that you want your UDSF to perform. When a user calls your function in a SQL statement, Vertica bundles together the data from the function parameters and passes it to processBlock().

The input and output of the processBlock() method are supplied by objects of the BlockReader and BlockWriter classes. They define methods that you use to read the input data and write the output data for your UDSF.

The majority of the work in developing a UDSF is writing processBlock(). Your implementation should follow this basic pattern:

  • Read in a set of arguments from the BlockReader object using data-type-specific methods.
  • Process the data in some manner.
  • Output the resulting value using one of the BlockWriter class's data-type-specific methods.
  • Advance to the next row of output and input by calling BlockWriter.next() and BlockReader.next().

This process continues until there are no more rows of data to be read (BlockReader.next() returns false).
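
The pattern above can be sketched in Python. This is a minimal illustration, not the SDK's own code: FakeBlockReader and FakeBlockWriter are hypothetical stand-ins that mimic only the getInt(), setInt(), and next() calls so that the example runs outside Vertica, and Add2Ints is an assumed example function. Inside Vertica, the class would subclass vertica_sdk.ScalarFunction and the server would supply the real reader and writer objects.

```python
class FakeBlockReader:
    """Hypothetical stand-in for the SDK's BlockReader (illustration only)."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    def getInt(self, col):
        # Return the integer in column `col` of the current row.
        return self._rows[self._pos][col]

    def next(self):
        # Advance to the next input row; return False when none remain.
        self._pos += 1
        return self._pos < len(self._rows)


class FakeBlockWriter:
    """Hypothetical stand-in for the SDK's BlockWriter (illustration only)."""

    def __init__(self):
        self.results = []
        self._current = None

    def setInt(self, value):
        # Stage the output value for the current row.
        self._current = value

    def next(self):
        # Commit the staged value and advance to the next output row.
        self.results.append(self._current)
        self._current = None


class Add2Ints:
    """Assumed example UDSF; in Vertica it would subclass vertica_sdk.ScalarFunction."""

    def processBlock(self, server_interface, block_reader, block_writer):
        # This sketch assumes the block holds at least one row.
        while True:
            a = block_reader.getInt(0)      # 1. read the arguments
            b = block_reader.getInt(1)
            block_writer.setInt(a + b)      # 2-3. process and write one value
            block_writer.next()             # 4. advance output ...
            if not block_reader.next():     #    ... and input; stop when done
                break


reader = FakeBlockReader([(1, 2), (3, 4), (10, -10)])
writer = FakeBlockWriter()
Add2Ints().processBlock(None, reader, writer)
print(writer.results)
```

The loop writes exactly one value per input row and stops only when the reader reports that no rows remain.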

You must make sure that processBlock() reads all of the rows in its input and outputs a single value for each row. Failure to do so can corrupt the data structures that Vertica reads to get the output of your UDSF. The only exception to this rule is if your processBlock() function reports an error back to Vertica (see Handling Errors). In that case, Vertica does not attempt to read the incomplete result set generated by the UDSF.

Setting Up and Tearing Down

The ScalarFunction class defines two additional methods that you can optionally implement to allocate and free resources: setup() and destroy(). You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating Resources for UDxs for details).
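
As a rough sketch of this lifecycle, the following Python example allocates a cache in setup(), reuses it across two blocks in processBlock(), and releases it in destroy(). The CachedSquare class, the cache, and the FakeBlockReader and FakeBlockWriter stand-ins are all hypothetical and exist only so the example runs outside Vertica. The driver code at the bottom imitates the order in which the methods are assumed to be invoked.

```python
class FakeBlockReader:
    """Hypothetical stand-in for the SDK's BlockReader (illustration only)."""

    def __init__(self, rows):
        self._rows, self._pos = rows, 0

    def getInt(self, col):
        return self._rows[self._pos][col]

    def next(self):
        self._pos += 1
        return self._pos < len(self._rows)


class FakeBlockWriter:
    """Hypothetical stand-in for the SDK's BlockWriter (illustration only)."""

    def __init__(self):
        self.results, self._current = [], None

    def setInt(self, value):
        self._current = value

    def next(self):
        self.results.append(self._current)


class CachedSquare:
    """Assumed example UDSF; in Vertica it would subclass vertica_sdk.ScalarFunction."""

    def setup(self, server_interface, col_types):
        # Allocate a resource that is not managed by the UDx API.
        self._cache = {}

    def processBlock(self, server_interface, block_reader, block_writer):
        # This sketch assumes the block holds at least one row.
        while True:
            n = block_reader.getInt(0)
            if n not in self._cache:
                self._cache[n] = n * n
            block_writer.setInt(self._cache[n])
            block_writer.next()
            if not block_reader.next():
                break

    def destroy(self, server_interface, col_types):
        # Release whatever setup() allocated.
        self._cache = None


func = CachedSquare()
func.setup(None, None)
outputs = []
# The same instance handles more than one block of rows.
for block in ([(2,), (3,)], [(3,), (4,)]):
    writer = FakeBlockWriter()
    func.processBlock(None, FakeBlockReader(block), writer)
    outputs.append(writer.results)
func.destroy(None, None)
print(outputs)
```

Keeping allocation in setup() and release in destroy() means that processBlock() can be called repeatedly without rebuilding the resource for each block.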

Notes

  • The name of your ScalarFunction subclass does not have to match the name of the SQL function you later assign to it, but Vertica considers matching names a best practice.
  • Do not assume that your function will be called from the same thread that instantiated it.
  • The same instance of your ScalarFunction subclass can be called on to process multiple blocks of data.
  • The rows of input sent to processBlock() are not guaranteed to be in any particular order.
  • Writing too many output rows can cause Vertica to emit an out-of-bounds error.

API

In C++, the ScalarFunction API provides the following methods for extension by subclasses:

virtual void setup(ServerInterface &srvInterface, 
		const SizedColumnTypes &argTypes);
				
virtual void processBlock(ServerInterface &srvInterface, 
		BlockReader &arg_reader, BlockWriter &res_writer)=0;

virtual void getOutputRange(ServerInterface &srvInterface, 
		ValueRangeReader &inRange, ValueRangeWriter &outRange);
				
virtual void cancel(ServerInterface &srvInterface);
				
virtual void destroy(ServerInterface &srvInterface, const SizedColumnTypes &argTypes);
		

In Java, the ScalarFunction API provides the following methods for extension by subclasses:

public void setup(ServerInterface srvInterface, SizedColumnTypes argTypes);

public abstract void processBlock(ServerInterface srvInterface, BlockReader arg_reader, 
		BlockWriter res_writer) throws UdfException, DestroyInvocation;
				
protected void cancel(ServerInterface srvInterface);
				
public void destroy(ServerInterface srvInterface, SizedColumnTypes argTypes);
		

In Python, the ScalarFunction API provides the following methods for extension by subclasses:

def setup(self, server_interface, col_types)

def processBlock(self, server_interface, block_reader, block_writer)

def destroy(self, server_interface, col_types)

In R, implement the Main Function API to define a scalar function:

FunctionName <- function(input.data.frame, parameters.data.frame) {
  # Computations

  # The function must return a data frame.
  return(output.data.frame)
}