Apache Kafka Integration

New Supported Versions of Kafka

The Vertica integration for Apache Kafka supports the Kafka versions supported by the Apache organization at the time this version of Vertica was developed. For Vertica Version 9.1.1, the supported versions of Kafka are 0.11, 1.0, and 1.1.

Vertica can work with older versions of Kafka. However, versions 0.9 and earlier use an older revision of the Kafka protocol. To connect Vertica 9.1.1 or later to a Kafka cluster running version 0.9 or earlier, you must change a setting in the rdkafka library that Vertica uses to communicate with Kafka. See Configuring Vertica for Apache Kafka Version 0.9 and Earlier.

The meaning of Kafka's message.max.bytes setting changed between Kafka versions 0.10 and 0.11. This change can cause performance issues when you load data using a streaming job scheduler that was created with Vertica version 9.1.0 or earlier. See Changes to the message.max.bytes Setting in Kafka Version 0.11 and Later for more information.

Support for Kafka Consumer Groups

Vertica now supports the Kafka consumer group feature. In Kafka, you usually use this feature to balance the load of reading messages across consumers and to ensure that consumers read each message no more than once. The Vertica streaming job scheduler already prevents re-reading messages by managing message offsets on its own, and it spreads the load across the entire Vertica cluster. The main use case for consumer groups with Vertica is therefore to allow third-party applications to monitor Vertica's progress as it consumes messages. See Monitoring Vertica Message Consumption with Consumer Groups for more information.
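
The division of responsibilities described above can be illustrated with a toy model. The following Python sketch is not Vertica's implementation and does not talk to Kafka. The ToyLoader class, its fields, and the lag function are hypothetical names invented for this example. It only shows a loader that avoids re-reading messages by tracking its own offsets, while the offsets it reports to a consumer group serve outside observers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLoader:
    """Hypothetical stand-in for a loader that manages message offsets itself."""
    next_offset: dict = field(default_factory=dict)    # partition -> next offset to load
    group_commits: dict = field(default_factory=dict)  # partition -> offset reported to the group
    loaded: list = field(default_factory=list)         # values loaded so far

    def load(self, partition, messages):
        """Load (offset, value) pairs, skipping any offset that was already loaded."""
        expected = self.next_offset.get(partition, 0)
        for offset, value in messages:
            if offset < expected:
                continue  # the loader's own bookkeeping prevents the re-read
            self.loaded.append(value)
            expected = offset + 1
        self.next_offset[partition] = expected
        # Reported for observers only; load() never reads this value back.
        self.group_commits[partition] = expected


def lag(log_end_offset, committed_offset):
    """What an external monitor could compute from the reported offset."""
    return log_end_offset - committed_offset


loader = ToyLoader()
loader.load(0, [(0, "a"), (1, "b")])
loader.load(0, [(1, "b"), (2, "c")])  # offset 1 is delivered again and skipped
print(loader.loaded)
print(lag(5, loader.group_commits[0]))
```

In this model, a monitoring tool needs only the reported offset and the end offset of the partition to tell how far behind the loader is.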

Directly Changing Kafka Library Settings

This version adds the ability to pass options directly to the rdkafka library that Vertica uses to communicate with Kafka. This feature lets you tweak settings you cannot directly set from within Vertica. You may need to change these settings in unusual cases. See Directly Setting Kafka Library Options for more information.
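
As a minimal sketch of what such a list of library options might look like, the following Python helper joins option names and values into a single string. The properties api.version.request and broker.version.fallback are librdkafka settings commonly used when talking to older brokers. The helper name, the semicolon-delimited name=value format, and the values shown are assumptions made for illustration. See the linked topic for the exact syntax that Vertica expects.

```python
def build_library_options(options):
    """Join {name: value} pairs into 'name=value;name=value'.

    The delimiter format is an assumption for this sketch, not a
    confirmed Vertica syntax.
    """
    parts = []
    for name, value in options.items():
        name, value = str(name), str(value)
        # Reject anything that would be ambiguous in the assumed format.
        if not name or any(ch in name for ch in "=;") or ";" in value:
            raise ValueError(f"cannot encode option {name!r}")
        parts.append(f"{name}={value}")
    return ";".join(parts)


# Example values for a broker that speaks an older protocol revision (illustrative).
legacy_broker_options = build_library_options({
    "api.version.request": "false",
    "broker.version.fallback": "0.9.0",
})
print(legacy_broker_options)
```

Building the string in one place makes it easier to validate option names before they reach the library.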

KafkaSource eof_timeout Setting Deprecated

In previous versions of Vertica, the KafkaSource function used the eof_timeout parameter to determine when it had reached the end of a stream. If no data arrived from Kafka within the period of time set in this parameter, the KafkaSource function would decide the stream had ended and would end the COPY statement that called it. In version 9.1.1, the underlying library that Vertica uses to communicate with Kafka directly indicates when the stream has ended, without having to wait for a timeout period to elapse. Therefore, the use of this parameter is deprecated. It will be removed in a future version.
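
The difference between the two ways of detecting the end of a stream can be sketched with a small simulation. This is a toy model only: the event list, the END_OF_STREAM marker, and both functions are hypothetical and do not reflect the internals of KafkaSource or the rdkafka library.

```python
END_OF_STREAM = object()  # hypothetical marker for an explicit end-of-stream signal


def read_with_timeout(events, eof_timeout_ms):
    """Timeout-based detection: stop after eof_timeout_ms of silence.

    events is a time-ordered list of (arrival_ms, payload) pairs.
    Returns (payloads read, time at which the reader stops).
    """
    read, last_activity = [], 0
    for arrival_ms, payload in events:
        if payload is END_OF_STREAM:
            continue  # this approach receives no explicit signal
        if arrival_ms - last_activity >= eof_timeout_ms:
            break  # the silence exceeded the timeout before this message arrived
        read.append(payload)
        last_activity = arrival_ms
    return read, last_activity + eof_timeout_ms


def read_with_signal(events):
    """Signal-based detection: stop as soon as the end of the stream is indicated."""
    read = []
    for arrival_ms, payload in events:
        if payload is END_OF_STREAM:
            return read, arrival_ms
        read.append(payload)
    return read, None  # no end-of-stream signal seen


events = [(10, "a"), (20, "b"), (25, END_OF_STREAM)]
print(read_with_timeout(events, 1000))  # stops a full timeout after the last message
print(read_with_signal(events))         # stops when the signal arrives
```

In this model the timeout-based reader always waits out the full timeout after the last message, and it can also mistake a slow producer for the end of the stream. The signal-based reader has neither problem.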

Miscellaneous Changes

In prior versions of Vertica, the Apache Kafka integration feature sent log messages to the dbLog file (the same log file where UDx libraries send log messages). Starting in version 9.1.1, the Apache Kafka integration logs messages to the standard vertica.log file. See Monitoring Log Files for more information about viewing log messages.

KafkaAvroParser and KafkaJSONParser now natively support the UUID data type. Previously, these parsers treated UUID values as VARCHAR. They can now parse UUID values into the Vertica UUID data type.
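
To illustrate the general difference between keeping a UUID as text and parsing it into a typed value, the following Python sketch uses the standard json and uuid modules. It does not show how the Vertica parsers work. The parse_record function and the sample message are hypothetical.

```python
import json
import uuid


def parse_record(raw, uuid_fields=()):
    """Parse a JSON message, converting the named fields from text to uuid.UUID."""
    record = json.loads(raw)
    for name in uuid_fields:
        record[name] = uuid.UUID(record[name])
    return record


raw = '{"id": "6BBF0744-74B4-46B9-BB05-53905D4538E7", "name": "sensor"}'

as_text = parse_record(raw)                       # id stays a 36-character string
as_uuid = parse_record(raw, uuid_fields=("id",))  # id becomes a 16-byte typed value

# As text, two spellings of the same UUID compare as different strings.
print(as_text["id"] == as_text["id"].lower())
# As a typed value, they compare as equal.
print(as_uuid["id"] == uuid.UUID(as_text["id"].lower()))
```

A typed value also rejects malformed input at parse time, whereas a text column would accept any string.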