The COPY statement offers extensive options and settings to control how it loads data. However, you may find that these options do not suit the type of data load that you want to perform. The User-Defined Load (UDL) feature lets you develop one or more functions that change how the COPY statement operates. You can create custom libraries using the Vertica SDK to handle various steps in the loading process.
You use three types of UDL functions during development, one for each stage of the data-load process:
- User-Defined Source (UDSource): C++ and Java - Controls how the COPY statement obtains the data it loads into the database. For example, COPY might obtain data by fetching it through HTTP or cURL. COPY invokes at most one UDSource, which reads data from a file or input stream; that single UDSource can itself read from more than one source. The Vertica Python API does not support User-Defined Sources.
- User-Defined Filter (UDFilter): C++, Java, and Python - Preprocesses the data. For example, a filter might unzip a file or convert UTF-16 to UTF-8. You can chain multiple User-Defined Filters together, for example, unzipping a file and then converting its encoding.
- User-Defined Parser (UDParser): C++, Java, and Python - Parses the data into tuples that are ready to be inserted into a table. COPY uses at most one parser. For example, a parser could extract data from an XML-like format. You can optionally define a User-Defined Chunker (UDChunker, C++ only) to have the parser perform parallel parsing.
After the final step, COPY either inserts the data into a table or rejects it if it is not in the correct format.
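The three stages described above can be pictured as a simple pipeline. The following Python sketch is only a conceptual illustration and does not use the Vertica SDK: the function names (`source`, `gunzip_filter`, `utf16_to_utf8_filter`, `parser`, `load`), the sample data, and the two-column row format are all hypothetical. It shows one source feeding a chain of two filters, followed by a single parser that separates well-formed rows from rejected ones.

```python
import gzip
from typing import Callable, List, Tuple

# Conceptual stand-ins for the three UDL stages.
# None of these names come from the Vertica SDK.

Row = Tuple[int, str]


def source() -> bytes:
    """Source stage: obtain raw bytes (here, gzip-compressed UTF-16 text)."""
    text = "1,apple\n2,banana\nbad-row\n"
    return gzip.compress(text.encode("utf-16"))


def gunzip_filter(data: bytes) -> bytes:
    """Filter stage 1: decompress the input."""
    return gzip.decompress(data)


def utf16_to_utf8_filter(data: bytes) -> bytes:
    """Filter stage 2: convert UTF-16 to UTF-8."""
    return data.decode("utf-16").encode("utf-8")


def parser(data: bytes) -> Tuple[List[Row], List[str]]:
    """Parser stage: turn bytes into tuples; collect malformed lines as rejects."""
    accepted: List[Row] = []
    rejected: List[str] = []
    for line in data.decode("utf-8").splitlines():
        fields = line.split(",")
        if len(fields) == 2 and fields[0].isdigit():
            accepted.append((int(fields[0]), fields[1]))
        else:
            rejected.append(line)
    return accepted, rejected


def load(
    source_fn: Callable[[], bytes],
    filters: List[Callable[[bytes], bytes]],
    parser_fn: Callable[[bytes], Tuple[List[Row], List[str]]],
) -> Tuple[List[Row], List[str]]:
    """Run one source, zero or more chained filters, then one parser."""
    data = source_fn()
    for apply_filter in filters:
        data = apply_filter(data)
    return parser_fn(data)


rows, rejects = load(source, [gunzip_filter, utf16_to_utf8_filter], parser)
print(rows)     # rows ready to insert
print(rejects)  # lines that did not match the expected format
```

The order of the filter list matters: the data must be decompressed before its encoding can be converted, which mirrors the unzip-then-convert chaining mentioned for User-Defined Filters.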