Configuring Kafka for Vertica

The following settings optimize Kafka performance with Vertica. Some settings, such as message.max.bytes, can be configured on multiple Kafka components.

For detailed information on these configuration settings, refer to the Apache Kafka documentation (broker settings) and the librdkafka configuration documentation (producer and consumer settings).

Producer Settings

Kafka producers are the processes that publish messages to Kafka brokers.

Setting Affects Notes
queue.buffering.max.messages Latency Specifies the maximum number of messages the Vertica producer queue can hold. If Vertica generates messages faster than it flushes them, the queue can fill, resulting in dropped messages. Increasing this value consumes more memory but reduces the chance of lost messages.
queue.buffering.max.ms Latency Specifies how often, in milliseconds, Vertica flushes the producer message queue. Lower values decrease latency at the cost of throughput. Higher values increase throughput, but can cause the producer queue (sized by queue.buffering.max.messages) to fill more frequently, resulting in dropped messages.
message.max.bytes Reliability Specifies the maximum size of a Kafka message, measured as the size of the JSON-serialized message. To prevent truncated messages, set this value to the size of the largest possible message. This value should be the same on your producers, brokers, and consumers.
message.send.max.retries Reliability Specifies how many times the producer retries delivering a message to a broker after a failed attempt. Higher values increase the chance of successful delivery.
retry.backoff.ms Reliability Specifies how long, in milliseconds, Vertica waits before resending a failed message.
request.required.acks Reliability Specifies how many broker replica acknowledgments Kafka requires before it considers a message delivered. Requiring more acknowledgments increases latency; requiring fewer increases the risk of message loss.
request.timeout.ms Reliability Specifies how long, in milliseconds, the producer waits for a response from the broker. Broker response time depends on server load and on the number of message acknowledgments you require. Higher values increase latency.
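To see how the queue and retry settings interact, the sketch below puts rough numbers on them. It uses a deliberately simplified model that is an assumption, not documented Vertica or librdkafka behavior: messages arrive at a steady rate, each flush empties the queue, and every delivery attempt waits the full request.timeout.ms before failing. The function names and the sample values are hypothetical.

```python
# Simplified sizing model for the producer settings above. Assumptions (not
# from the Vertica or librdkafka docs): a steady message rate, each flush
# empties the queue, and every failed attempt waits the full request timeout.


def messages_per_flush(rate_per_sec: float, queue_buffering_max_ms: int) -> float:
    """Messages that accumulate in the producer queue between two flushes."""
    return rate_per_sec * queue_buffering_max_ms / 1000.0


def queue_may_fill(rate_per_sec: float,
                   queue_buffering_max_ms: int,
                   queue_buffering_max_messages: int) -> bool:
    """True if, under this model, the queue could fill and drop messages."""
    return (messages_per_flush(rate_per_sec, queue_buffering_max_ms)
            > queue_buffering_max_messages)


def worst_case_delivery_ms(message_send_max_retries: int,
                           retry_backoff_ms: int,
                           request_timeout_ms: int) -> int:
    """Rough upper bound on time spent on one message if every attempt times out.

    Counts one initial attempt plus message_send_max_retries retries, each
    waiting request_timeout_ms, with retry_backoff_ms between attempts.
    Other client timeouts that might end delivery sooner are ignored.
    """
    attempts = message_send_max_retries + 1
    return attempts * request_timeout_ms + message_send_max_retries * retry_backoff_ms


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    print(messages_per_flush(50_000, 100))          # 5000.0
    print(queue_may_fill(50_000, 100, 100_000))     # False
    print(queue_may_fill(50_000, 5_000, 100_000))   # True
    print(worst_case_delivery_ms(2, 100, 5_000))    # 15200
```

In this model, raising queue.buffering.max.ms from 100 to 5000 at 50,000 messages per second lets 250,000 messages pile up between flushes, which overflows a 100,000-message queue. Likewise, each extra retry adds roughly one request timeout plus one backoff to the worst-case delivery time.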

Broker Settings

Kafka brokers receive messages from producers and distribute them among Kafka consumers. Configure these settings on the brokers themselves. These settings function independently of your producer and consumer settings.

Setting Affects Notes
message.max.bytes Reliability Specifies the maximum size of a Kafka message, measured as the size of the JSON-serialized message. To prevent truncated messages, set this value to the size of the largest possible message. This value should be the same on your producers, brokers, and consumers.

num.network.threads Performance Specifies the number of network threads the broker uses to accept network requests. More threads can increase your concurrency.
num.io.threads Performance Specifies the number of I/O threads the broker uses to process requests, which may include disk I/O. More threads can increase your concurrency.
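Kafka brokers read these settings from their server.properties file as key=value lines. The sketch below renders the three broker settings in that format. The numeric values are hypothetical placeholders, not recommendations, and render_broker_properties is an illustrative helper rather than part of any Kafka or Vertica tooling.

```python
# Sketch: render broker-side settings in Kafka's server.properties
# key=value format. The values are hypothetical placeholders; size them
# for your own hardware and workload.


def render_broker_properties(settings: dict) -> str:
    """Return settings as key=value lines, sorted by key for stable output."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


broker_settings = {
    # Set to the size of the largest JSON-serialized message.
    "message.max.bytes": 1_000_000,
    # Threads that accept network requests.
    "num.network.threads": 3,
    # Threads that process requests (possibly including disk I/O).
    "num.io.threads": 8,
}

if __name__ == "__main__":
    print(render_broker_properties(broker_settings), end="")
```

The output is three lines (message.max.bytes=1000000, num.io.threads=8, num.network.threads=3) that you could merge into each broker's server.properties before restarting the broker.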

Consumer Settings

Kafka consumers process the stream of messages distributed by the brokers. You can set these values using the kafka_conf parameter on the KafkaSource UDL or using the --parser-parameters setting in the Load Spec Utility Options.

Setting Affects Notes
message.max.bytes Reliability Specifies the maximum size of a Kafka message, measured as the size of the JSON-serialized message. To prevent truncated messages, set this value to the size of the largest possible message. This value should be the same on your producers, brokers, and consumers.
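Because message.max.bytes should match on producers, brokers, and consumers, it can help to check the values before deploying. The sketch below performs that check and serializes the consumer settings as a compact JSON object. It assumes your Vertica version accepts a JSON object string for the KafkaSource kafka_conf parameter; confirm the expected format in the Vertica documentation for your release. The dictionaries and helper names are hypothetical.

```python
import json

# Hypothetical per-component settings; replace with your real values.
producer_conf = {"message.max.bytes": 1_000_000, "queue.buffering.max.ms": 100}
broker_conf = {"message.max.bytes": 1_000_000, "num.io.threads": 8}
consumer_conf = {"message.max.bytes": 1_000_000}


def check_message_max_bytes(**confs: dict) -> None:
    """Raise ValueError if message.max.bytes differs (or is missing) anywhere."""
    values = {name: conf.get("message.max.bytes") for name, conf in confs.items()}
    if len(set(values.values())) != 1:
        raise ValueError(f"message.max.bytes differs across components: {values}")


def kafka_conf_string(conf: dict) -> str:
    """Serialize settings as a compact JSON object (assumed kafka_conf format)."""
    return json.dumps(conf, separators=(",", ":"), sort_keys=True)


if __name__ == "__main__":
    check_message_max_bytes(producer=producer_conf,
                            broker=broker_conf,
                            consumer=consumer_conf)
    print(kafka_conf_string(consumer_conf))  # {"message.max.bytes":1000000}
```

Running the check as part of deployment catches a mismatched message.max.bytes before it causes truncated or rejected messages at load time.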