By default, Vertica parses a data source in a single thread on one database node. You can optionally use cooperative parse to parse a source using multiple threads on a node. With cooperative parse, data from a source first passes through a chunker, which groups blocks from the source stream into logical units called chunks. Because each chunk can be parsed on its own, several parsers can then process chunks concurrently. Cooperative parse is available only for unfenced UDxs. (See Fenced and Unfenced Modes.)
To use cooperative parse, a chunker must be able to locate end-of-record markers in the input. Locating these markers might not be possible in all input formats.
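The requirement above can be illustrated with a small C++ sketch that computes split points for newline-terminated records. The helper `findChunkBoundaries` and its `targetSize` parameter are hypothetical and not part of the Vertica SDK. The sketch assumes that the terminator never appears inside a record; a format that allows, for example, quoted fields with embedded newlines would break this assumption, which is why some formats cannot be chunked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Vertica SDK function): returns the end offsets of
// chunks of `data` so that every chunk ends right after an end-of-record marker.
// Each chunk is at least `targetSize` bytes long, except possibly the last one.
// Assumes `recordTerminator` never occurs inside a record.
std::vector<std::size_t> findChunkBoundaries(const std::string &data,
                                             std::size_t targetSize,
                                             char recordTerminator = '\n') {
    assert(targetSize > 0);
    std::vector<std::size_t> boundaries;
    std::size_t start = 0;
    while (start < data.size()) {
        // Skip ahead roughly one chunk, then scan forward to the next marker.
        std::size_t probe = start + targetSize - 1;
        if (probe >= data.size()) {
            boundaries.push_back(data.size());  // remainder forms the last chunk
            break;
        }
        std::size_t marker = data.find(recordTerminator, probe);
        std::size_t end = (marker == std::string::npos) ? data.size() : marker + 1;
        boundaries.push_back(end);
        start = end;
    }
    return boundaries;
}
```

For the input `"a,1\nb,2\nc,3\n"` with a target size of 5 bytes, the boundaries are 8 and 12, so the chunks are `"a,1\nb,2\n"` and `"c,3\n"`; no record is split between chunks.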
Chunkers are created by parser factories. At load time, Vertica first calls the UDChunker to divide the input into chunks and then calls the UDParser to parse each chunk.
You can use cooperative parse and apportioned load independently or together. See Combining Cooperative Parse and Apportioned Load.
When Vertica receives data from a source, it calls the chunker's process() method repeatedly. A chunker is, essentially, a lightweight parser; instead of parsing, the process() method divides the input into chunks.
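The repeated calls to process() can be pictured with the following C++ sketch. The class `LineChunker` and its `process(block, endOfInput)` signature are hypothetical simplifications, not the Vertica UDChunker API. The point it shows is that the chunker does no parsing: it only cuts the stream at end-of-record markers and holds back a record that is split across two incoming blocks until the rest of it arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified chunker (not the Vertica UDChunker API).
// The caller invokes process() once per block read from the source stream.
class LineChunker {
public:
    // Appends one block and returns every chunk that is now complete.
    // Bytes after the last '\n' are kept until a later block completes them.
    std::vector<std::string> process(const std::string &block, bool endOfInput) {
        pending_ += block;
        std::vector<std::string> chunks;
        std::size_t lastMarker = pending_.rfind('\n');
        if (lastMarker != std::string::npos) {
            chunks.push_back(pending_.substr(0, lastMarker + 1));
            pending_.erase(0, lastMarker + 1);
        }
        if (endOfInput && !pending_.empty()) {
            // Final record that has no trailing terminator.
            chunks.push_back(pending_);
            pending_.clear();
        }
        assert(!endOfInput || pending_.empty());
        return chunks;
    }

private:
    std::string pending_;  // partial record carried over between calls
};
```

Feeding the blocks `"a,1\nb,"`, `"2\nc,3"`, and finally an empty block marked as the end of input yields the chunks `"a,1\n"`, `"b,2\n"`, and `"c,3"`, even though the record `b,2` arrived in two pieces.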
After the chunker has finished dividing the input into chunks, Vertica sends those chunks to as many parsers as are available, calling the process() method on each parser.
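Because every chunk starts and ends on a record boundary, parsers can work on different chunks without sharing state. The following C++ sketch shows that idea with `std::async`. The functions `parseChunk` and `parseChunksInParallel` are hypothetical; `parseChunk` only counts records instead of producing rows, and the sketch launches one task per chunk, whereas Vertica distributes chunks among the parsers that are available.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-chunk parser: counts the non-empty records in a chunk.
// A real parser would emit rows; counting keeps the sketch self-contained.
std::size_t parseChunk(const std::string &chunk) {
    std::istringstream in(chunk);
    std::string line;
    std::size_t records = 0;
    while (std::getline(in, line)) {
        if (!line.empty()) ++records;
    }
    return records;
}

// Parses all chunks concurrently (one asynchronous task per chunk) and
// returns the total record count. No task needs data from another chunk.
std::size_t parseChunksInParallel(const std::vector<std::string> &chunks) {
    std::vector<std::future<std::size_t>> results;
    results.reserve(chunks.size());
    for (const std::string &chunk : chunks) {
        results.push_back(std::async(std::launch::async,
                                     [&chunk] { return parseChunk(chunk); }));
    }
    assert(results.size() == chunks.size());
    std::size_t total = 0;
    for (auto &result : results) total += result.get();  // waits for each task
    return total;
}
```

On some toolchains, code that uses `std::async` must be compiled with `-pthread`.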
Implementing Cooperative Parse
To implement cooperative parse, perform the following actions:
- Subclass UDChunker and implement its process() method.
- In your ParserFactory, implement prepareChunker() to return a UDChunker.
See C++ Example: Delimited Parser and Chunker for a UDChunker that also supports apportioned load.
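The division of labor between the factory, the chunker, and the parser can be sketched in C++ with stand-in interfaces. `Chunker`, `Parser`, `Factory`, `NewlineChunker`, `LineParser`, and `load` are all hypothetical and only mirror the roles described in this section; they are not the Vertica SDK classes, whose signatures differ. In this sketch, a factory that returns no chunker causes the whole source to be parsed as a single unit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the chunker, parser, and factory roles.
struct Chunker {
    virtual ~Chunker() = default;
    virtual std::vector<std::string> process(const std::string &input) = 0;
};
struct Parser {
    virtual ~Parser() = default;
    virtual std::vector<std::string> process(const std::string &chunk) = 0;  // one string per record
};
struct Factory {
    virtual ~Factory() = default;
    virtual std::unique_ptr<Parser> prepare() = 0;
    // In this sketch, a null chunker means "no cooperative parse".
    virtual std::unique_ptr<Chunker> prepareChunker() { return nullptr; }
};

// Groups newline-terminated records into chunks of `recordsPerChunk` records.
struct NewlineChunker : Chunker {
    explicit NewlineChunker(std::size_t recordsPerChunk) : n_(recordsPerChunk) { assert(n_ > 0); }
    std::vector<std::string> process(const std::string &input) override {
        std::vector<std::string> chunks;
        std::string current, line;
        std::size_t count = 0;
        std::istringstream in(input);
        while (std::getline(in, line)) {
            current += line + '\n';
            if (++count == n_) { chunks.push_back(current); current.clear(); count = 0; }
        }
        if (!current.empty()) chunks.push_back(current);
        return chunks;
    }
private:
    std::size_t n_;
};

// Turns each non-empty line of a chunk into one record.
struct LineParser : Parser {
    std::vector<std::string> process(const std::string &chunk) override {
        std::vector<std::string> rows;
        std::istringstream in(chunk);
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty()) rows.push_back(line);
        }
        return rows;
    }
};

struct DelimitedFactory : Factory {  // supports cooperative parse
    std::unique_ptr<Parser> prepare() override { return std::make_unique<LineParser>(); }
    std::unique_ptr<Chunker> prepareChunker() override { return std::make_unique<NewlineChunker>(2); }
};
struct PlainFactory : Factory {  // no chunker: single-unit parse
    std::unique_ptr<Parser> prepare() override { return std::make_unique<LineParser>(); }
};

// Mimics the load order: chunk first if a chunker exists, then parse each chunk.
std::vector<std::vector<std::string>> load(Factory &factory, const std::string &input) {
    std::vector<std::string> chunks;
    if (std::unique_ptr<Chunker> chunker = factory.prepareChunker()) {
        chunks = chunker->process(input);
    } else {
        chunks.push_back(input);
    }
    std::vector<std::vector<std::string>> results;
    for (const std::string &chunk : chunks) {
        results.push_back(factory.prepare()->process(chunk));  // one parser per chunk
    }
    return results;
}
```

For brevity, `load` parses the chunks one after another; since each chunk gets its own parser instance, the loop could be distributed across threads as in cooperative parse.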