User-Defined Parser

A parser takes a stream of bytes and passes a corresponding sequence of tuples to the Vertica load process. You can use User-Defined Parser functions to parse:

  • Data in formats not understood by the Vertica built-in parser.
  • Data that requires more specific control than the built-in parser supplies.

For example, you could load a CSV file using a specific CSV library. See the Vertica SDK for two CSV examples.
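To illustrate the bytes-in, tuples-out contract, the following is a minimal standalone sketch in Python. It does not use the Vertica SDK. The function name `parse_stream` is hypothetical, and the sketch assumes UTF-8 input with newline-terminated records that contain no embedded newlines. It buffers input until a full record is available, then hands complete lines to Python's standard `csv` module.

```python
import csv
from typing import Iterable, Iterator, Tuple


def parse_stream(chunks: Iterable[bytes], delimiter: str = ",") -> Iterator[Tuple[str, ...]]:
    """Yield one tuple per complete record found in a stream of byte chunks.

    Hypothetical helper for illustration only; not part of the Vertica SDK.
    Assumes UTF-8 text and records that never contain an embedded newline.
    """
    pending = b""  # bytes received so far that do not yet end in a newline
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest must wait.
        complete, sep, rest = pending.rpartition(b"\n")
        if not sep:
            continue  # no record boundary yet, keep buffering
        pending = rest
        lines = complete.decode("utf-8").split("\n")
        for row in csv.reader(lines, delimiter=delimiter):
            if row:  # skip blank lines
                yield tuple(row)
    # End of stream: whatever remains is a final record without a newline.
    if pending:
        for row in csv.reader([pending.decode("utf-8")], delimiter=delimiter):
            if row:
                yield tuple(row)
```

A record may be split across two chunks at any byte, which is why the sketch emits only the portion of the buffer that ends on a record boundary and carries the remainder forward.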

A COPY statement supports a single User-Defined Parser, which you can combine with a user-defined source (UDSource) and zero or more user-defined filters (UDFilter).

Sometimes you can improve the performance of your parser by adding a chunker. A chunker divides up the input and uses multiple threads to parse it. See Cooperative Parse. Chunkers are available only in the C++ API.
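The idea behind a chunker can be sketched outside the SDK. The Python below is not the UDChunker API, which is C++ only. It is a standalone illustration with hypothetical names (`split_at_records`, `parse_chunk`, `parallel_parse`), and it assumes newline-terminated, comma-separated records with no quoting. The input is cut on record boundaries so that each chunk can be parsed independently, and the chunks are then parsed on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_at_records(data: bytes, target_size: int) -> List[bytes]:
    """Cut data into chunks of at least target_size bytes, each ending on a newline.

    The last chunk may be shorter. target_size must be at least 1.
    """
    chunks = []
    start = 0
    while start < len(data):
        # Find the first record boundary at or after the target size.
        end = data.find(b"\n", start + target_size - 1)
        end = len(data) if end == -1 else end + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def parse_chunk(chunk: bytes) -> List[Tuple[str, ...]]:
    """Parse one self-contained chunk of simple comma-separated lines."""
    lines = chunk.decode("utf-8").split("\n")
    return [tuple(line.split(",")) for line in lines if line]


def parallel_parse(data: bytes, target_size: int = 1 << 20, workers: int = 4) -> List[Tuple[str, ...]]:
    """Parse chunks on a thread pool and return the rows in input order."""
    chunks = split_at_records(data, target_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of its inputs.
        parsed = pool.map(parse_chunk, chunks)
        return [row for rows in parsed for row in rows]
```

The key design point is that the split happens only at record boundaries. If a chunk ended in the middle of a record, no thread could parse it without coordinating with its neighbor.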

Under special circumstances you can further improve performance by using apportioned load, an approach where multiple Vertica nodes parse the input. See Apportioned Load.

If you implement a UDParser, you must also implement a corresponding ParserFactory.

See UDParser Class, ParserFactory Class, and UDChunker Class for API details.