ParserFactory Class

If you write a parser, you must also write a factory to produce parser instances. To do so, subclass the ParserFactory class.

Parser factories are singletons. Your subclass must be stateless, with no fields containing data. Your subclass also must not modify any global variables.

The ParserFactory class defines the following methods. Your subclass must override the prepare() method. It may override the other methods.

Setting Up

Vertica calls plan() once on the initiator node to perform the following tasks:

  • Check any parameters that have been passed from the function call in the COPY statement and error messages if there are any issues. You read the parameters by getting a ParamReader object from the instance of ServerInterface passed into your plan() method.
  • Store any information that the individual hosts need in order to parse the data. For example, you could store parameters in the PlanContext instance passed in through the planCtxt parameter. The plan() method runs only on the initiator node, and the prepareUDSources() method runs on each host reading from a data source. Therefore, this object is the only means of communication between them.

    You store data in the PlanContext by getting a ParamWriter object from the getWriter() method. You then write parameters by calling methods on the ParamWriter such as setString.

    ParamWriter offers only the ability to store simple data types. For complex types, you need to serialize the data in some manner and store it as a string or long string.

Creating Parsers

Vertica calls prepare() on each node to create and initialize your parser, using data stored by the plan() method.

Defining Parameters

Implement getParameterTypes() to define the names and types of parameters that your parser uses. Vertica uses this information to warn callers about unknown or missing parameters. Vertica ignores unknown parameters and uses default values for missing parameters. While you should define the types and parameters for your function, you are not required to override this method.

Defining Parser Outputs

Implement getParserReturnType() to define the data types  of the table columns that the parser outputs. If applicable, getParserReturnType() also defines the size, precision, or scale of the data types. Usually, this method reads data types of the output table from the argType and perColumnParamReader arguments and verifies that it can output the appropriate data types. If getParserReturnType() is prepared to output the data types, it calls methods on the SizedColumnTypes object passed in the returnType argument. In addition to the data type of the output column, your method should also specify any additional information about the column's data type:

  • For binary and string data types (such as CHAR, VARCHAR, and LONG VARBINARY), specify its maximum length.
  • For NUMERIC types, specify its precision and scale.
  • For Time/Timestamp types (with or without time zone), specify its precision (-1 means unspecified).
  • For all other types, no length or precision specification is required.

Supporting Cooperative Parse

To support Cooperative Parse, implement prepareChunker() and return an instance of your UDChunker subclass. If isChunkerApportionable() returns true, then it is an error for this method to return null.

Cooperative parse is currently supported only in the C++ API.

Supporting Apportioned Load

To support Apportioned Load, your parser, chunker, or both must support apportioning. To indicate that the parser can apportion a load, implement isParserApportionable() and return true. To indicate that the chunker can apportion a load, implement isChunkerApportionable() and return true.

The isChunkerApportionable() method takes a ServerInterface as an argument, so you have access to the parameters supplied in the COPY statement. You might need this information if the user can specify a record delimiter, for example. Return true from this method if and only if the factory can create a chunker for this input.