ParserFactory Class
If you write a parser, you must also write a factory to produce parser instances. To do so, subclass the ParserFactory class.
Parser factories are singletons. Your subclass must be stateless, with no fields containing data. Your subclass also must not modify any global variables.
The ParserFactory class defines the following methods. Your subclass must override the prepare() method. It may override the other methods.
Setting Up
Vertica calls plan() once on the initiator node to perform the following tasks:
- Check any parameters that have been passed from the function call in the COPY statement and provide error messages if there are any issues. You read the parameters by getting a ParamReader object from the instance of ServerInterface passed into your plan() method.
- Store any information that the individual hosts need in order to parse the data. For example, you could store parameters in the PlanContext instance passed in through the planCtxt parameter. The plan() method runs only on the initiator node, and the prepare() method runs on each host reading from a data source. Therefore, this object is the only means of communication between them.
  You store data in the PlanContext by getting a ParamWriter object from the getWriter() method. You then write parameters by calling methods on the ParamWriter such as setString.
  ParamWriter offers only the ability to store simple data types. For complex types, you need to serialize the data in some manner and store it as a string or long string.
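Because only simple types can be stored, a complex value such as a map of per-column options has to be flattened to a string on the initiator and rebuilt on each node. The following is a minimal sketch of that round trip, assuming JSON as the serialization format (any string encoding would do). FakeParamWriter is a hypothetical stand-in for the SDK's ParamWriter: it mimics only the setString call named above, and its get_string helper and the parameter name column_options are illustrative assumptions, not part of the SDK.

```python
import json


class FakeParamWriter:
    """Hypothetical stand-in for the SDK's ParamWriter; not the real class."""

    def __init__(self):
        self._values = {}

    def setString(self, name, value):
        # Like the real writer, accept only a simple value (here: a string).
        if not isinstance(value, str):
            raise TypeError("only simple string values can be stored")
        self._values[name] = value

    def get_string(self, name):
        # Illustrative read-back helper (assumption; the SDK reads through
        # a separate reader object).
        return self._values[name]


def store_column_options(writer, options):
    """Initiator side: serialize a nested dict into one string parameter."""
    writer.setString("column_options", json.dumps(options, sort_keys=True))


def load_column_options(writer):
    """Node side: rebuild the nested dict from the stored string."""
    return json.loads(writer.get_string("column_options"))


writer = FakeParamWriter()
column_options = {
    "price": {"format": "decimal", "scale": 2},
    "ts": {"format": "iso8601"},
}
store_column_options(writer, column_options)
restored = load_column_options(writer)
```

After the round trip, restored holds the same nested structure as column_options, even though the writer itself only ever held a string.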
Creating Parsers
Vertica calls prepare() on each node to create and initialize your parser, using data stored by the plan() method.
Defining Parameters
Implement getParameterTypes() to define the names and types of parameters that your parser uses. Vertica uses this information to warn callers about unknown or missing parameters. Vertica ignores unknown parameters and uses default values for missing parameters. While you should define the types and parameters for your function, you are not required to override this method.
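The effect of declaring parameters can be pictured with a small model. The sketch below is not Vertica's implementation. It only mirrors the behavior described above: unknown parameters are ignored with a warning, and missing ones fall back to a default. The function name resolve_parameters, the example parameter names, and the idea of attaching a default to each declared name are all assumptions made for illustration.

```python
def resolve_parameters(declared, supplied):
    """Model the described behavior for declared versus supplied parameters.

    declared: dict mapping each declared parameter name to a default value
              (hypothetical representation).
    supplied: dict of parameters given by the caller.
    Returns (values, warnings).
    """
    values = {}
    warnings = []

    for name, default in declared.items():
        if name in supplied:
            values[name] = supplied[name]
        else:
            # Missing parameter: warn and use the default.
            values[name] = default
            warnings.append(f"missing parameter '{name}', using default")

    for name in supplied:
        if name not in declared:
            # Unknown parameter: warn and ignore it.
            warnings.append(f"unknown parameter '{name}' ignored")

    return values, warnings


values, warnings = resolve_parameters(
    declared={"delimiter": ",", "skip_header": False},
    supplied={"delimiter": "|", "colour": "red"},
)
```

Here the caller's delimiter is kept, skip_header falls back to its default, and the undeclared colour parameter is dropped.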
Defining Parser Outputs
Implement getParserReturnType() to define the data types of the table columns that the parser outputs. If applicable, getParserReturnType() also defines the size, precision, or scale of the data types. Usually, this method reads data types of the output table from the argType and perColumnParamReader arguments and verifies that it can output the appropriate data types. If getParserReturnType() is prepared to output the data types, it calls methods on the SizedColumnTypes object passed in the returnType argument. In addition to the data type of the output column, your method should also specify any additional information about the column's data type:
- For binary and string data types (such as CHAR, VARCHAR, and LONG VARBINARY), specify its maximum length.
- For NUMERIC types, specify its precision and scale.
- For Time/Timestamp types (with or without time zone), specify its precision (-1 means unspecified).
- For all other types, no length or precision specification is required.
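The rules in this list can be written down as a small lookup, which is handy when checking that every output column has been fully described. This is a sketch only: it does not call the SDK, the helper names required_details and check_column are hypothetical, and the sets of type names are illustrative rather than exhaustive.

```python
# Illustrative groupings based on the list above (not exhaustive).
STRING_AND_BINARY_TYPES = {
    "CHAR", "VARCHAR", "LONG VARCHAR",
    "BINARY", "VARBINARY", "LONG VARBINARY",
}
TIME_TYPES = {"TIME", "TIMETZ", "TIMESTAMP", "TIMESTAMPTZ"}


def required_details(type_name):
    """Return the extra details a column of this type needs, per the list."""
    normalized = " ".join(type_name.upper().split())
    if normalized in STRING_AND_BINARY_TYPES:
        return ("max_length",)
    if normalized == "NUMERIC":
        return ("precision", "scale")
    if normalized in TIME_TYPES:
        # A precision of -1 means unspecified.
        return ("precision",)
    return ()


def check_column(type_name, **details):
    """Return the names of required details that were not supplied."""
    return [d for d in required_details(type_name) if d not in details]


missing_for_varchar = check_column("varchar")
missing_for_numeric = check_column("NUMERIC", precision=10, scale=2)
time_details = required_details("TIMESTAMPTZ")
```

A VARCHAR column declared without a length is reported as missing max_length, while the NUMERIC column with both precision and scale passes the check.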
Supporting Cooperative Parse
To support Cooperative Parse, implement prepareChunker() and return an instance of your UDChunker subclass. If isChunkerApportionable() returns true, then it is an error for this method to return null.
Cooperative parse is currently supported only in the C++ API.
Supporting Apportioned Load
To support Apportioned Load, your parser, chunker, or both must support apportioning. To indicate that the parser can apportion a load, implement isParserApportionable() and return true. To indicate that the chunker can apportion a load, implement isChunkerApportionable() and return true.
The isChunkerApportionable() method takes a ServerInterface as an argument, so you have access to the parameters supplied in the COPY statement. You might need this information if the user can specify a record delimiter, for example. Return true from this method if and only if the factory can create a chunker for this input.
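One way to keep the "if and only if" rule honest is to route both decisions through a single check, so the apportionable answer and the chunker factory can never disagree. The sketch below shows that logic in plain Python, with a dict standing in for the parameters read through ServerInterface. The parameter name record_terminator, the restriction to single-byte delimiters, and all function names are assumptions made for illustration. The real methods are overrides in the SDK.

```python
def can_create_chunker(params):
    """Single source of truth: can a chunker handle this input?

    Assumption for this sketch: the hypothetical chunker supports only
    single-byte record delimiters.
    """
    delimiter = params.get("record_terminator", "\n")
    return len(delimiter.encode("utf-8")) == 1


def is_chunker_apportionable(params):
    # True if and only if a chunker can be created for this input.
    return can_create_chunker(params)


def prepare_chunker(params):
    # Must not return None when is_chunker_apportionable() is true.
    if not can_create_chunker(params):
        return None
    return {"delimiter": params.get("record_terminator", "\n")}


default_params = {}
crlf_params = {"record_terminator": "\r\n"}

default_ok = is_chunker_apportionable(default_params)
crlf_ok = is_chunker_apportionable(crlf_params)
```

With the default newline delimiter the factory reports that it can apportion and returns a chunker. With a two-byte delimiter it reports false and returns nothing, so the two answers stay consistent.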