Parsing Custom Formats

Vertica supports user-defined filters to manipulate data arriving from your streaming message bus. You can apply these filters to the data before you parse it. By default, data that flows from the source contains no message boundaries. The default Kafka parsers can parse this format, but other Vertica and user-defined parsers require additional message processing. Filters provide the ability to prepare the data stream for those parsers.

Filters for Use with Kafka Data

Vertica includes the following filters:

* KafkaInsertDelimiters — inserts a user-specified delimiter between messages in the stream.
* KafkaInsertLengths — inserts the length of each message, in bytes, before the message.

Note: The Vertica-provided filters are mutually exclusive. You cannot use both to process the same Kafka data stream.

Vertica also supports the use of additional Vertica and user-defined filters. If you use a Vertica filter, it must appear first in the filter list. Use a comma to delimit multiple filters. If you use a non-Kafka parser, you must use at least one filter to prepare your content for that parser. If you do not provide a filter, the parser fails with the message:

Input is not from Kafka source.

Examples

The following example shows how you can insert the delimiter \n between messages streamed from two Kafka brokers. A CSV parser can then split the content on that delimiter to parse each record.

=> COPY stream_data.target_table SOURCE KafkaSource(stream='source1|1|1,source2|2|2',
                                                    brokers='host1:9092,host2:9092',
                                                    duration=INTERVAL 'timeslice')
       FILTER KafkaInsertDelimiters(delimiter = '\n')
       PARSER MyCsvParser(recordTerminator = '\n');
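To make the filter's effect concrete, here is a minimal Python sketch (not Vertica code) of what a delimiter-injecting filter does to boundary-less Kafka messages. The message payloads and the helper names are hypothetical; only the technique (joining records with a delimiter so a CSV parser can split on it) reflects the filter's behavior.

```python
def insert_delimiters(messages, delimiter=b"\n"):
    """Join raw, boundary-less messages into one stream,
    terminating each message with `delimiter`."""
    return delimiter.join(messages) + delimiter

# Three CSV messages as they might arrive from the bus, with no boundaries:
messages = [b"1,alice", b"2,bob", b"3,carol"]
stream = insert_delimiters(messages)

# A CSV parser configured with recordTerminator='\n' can now
# recover the individual records by splitting on the delimiter:
records = stream.rstrip(b"\n").split(b"\n")
```

Note that this approach only works when the delimiter cannot appear inside a message; for binary payloads, length framing (shown in the next example) is the safer choice.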

The following example shows how a Vertica filter and a user-defined decryption filter can together process a single Kafka data stream. Using the length information that the KafkaInsertLengths filter injects, the parser can identify each record and parse it individually.

=> COPY stream_data.target_table SOURCE KafkaSource(stream='source1|1|1', brokers='host1:9092')
       FILTER KafkaInsertLengths() DecryptFilter(parameter=Key)
       PARSER ComplexParser(parameter = 'value');
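The following Python sketch illustrates the length-framing technique behind this example. It assumes each message is prefixed with its byte length as a 32-bit integer in network byte order; the function names and sample payloads are hypothetical, not Vertica internals.

```python
import struct

def insert_lengths(messages):
    """Prefix each message with its byte length as a 32-bit
    network-byte-order (big-endian) integer."""
    out = bytearray()
    for m in messages:
        out += struct.pack("!I", len(m)) + m
    return bytes(out)

def read_framed(stream):
    """What a downstream parser does: read each 4-byte length
    prefix, then consume exactly that many bytes as one record."""
    records, pos = [], 0
    while pos < len(stream):
        (n,) = struct.unpack_from("!I", stream, pos)
        pos += 4
        records.append(stream[pos:pos + n])
        pos += n
    return records

# Two variable-length payloads, such as encrypted or binary records:
messages = [b'{"id": 1}', b'{"id": 2, "note": "encrypted payload"}']
framed = insert_lengths(messages)
```

Because record boundaries come from the length prefixes rather than a delimiter character, this framing is safe even when the payload bytes are arbitrary, which is why it pairs well with a decryption filter.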