Vertica and Apache Kafka

Currently, the only data streaming platform that Vertica supports is Apache Kafka. Kafka is an open-source distributed real-time streaming platform. See Apache's main Kafka page for more information. By integrating Kafka and Vertica, you can load data from any application that produces Kafka messages.

The integration features between Vertica and Kafka consist of:

  • A UDx library containing functions that load and parse data from Kafka topics into Vertica.
  • A job scheduler that uses the UDx library to continuously consume data from Kafka with exactly-once semantics.
  • Push-based notifiers that send Data Collector messages from Vertica to Kafka. See Monitoring Vertica Using Notifiers for more information.
  • A KafkaExport function that sends Vertica data to Kafka.

Consuming Data from Kafka

There are two ways to load data from Kafka:

  • Manually, by directly executing a COPY statement. Use this method to load a finite amount of data. Reasons to load data manually include:

    • Managing a streaming data load with finer control than a scheduler offers. Schedulers can manage data loads in many cases, but they may not give you the level of control a particular load requires.
    • Loading specific chunks of data you want to analyze, rather than constantly streaming data. For example, you may want to load web server logs from a specific time period to perform in-depth analytics.
    • Manually testing your Kafka and Vertica configuration before creating a streaming data load. See Manually Copying Data From Kafka for more information.
  • Automatically, by using job schedulers. The schedulers continuously load data from Kafka and ensure that each Kafka message is loaded exactly once. See Automatically Copying Data From Kafka for more information.
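
As a rough illustration of the manual method, the following sketch loads one partition of a topic with a single COPY statement. The table name (web_table), topic name (web_hits), and broker address are hypothetical placeholders, and the example assumes the topic carries JSON messages. Adjust all of these for your environment.

```sql
-- Minimal sketch of a one-time manual load (names and hosts are placeholders).
-- stream = 'topic|partition|start_offset'; an offset of -2 asks for the
-- earliest messages still available in the partition.
-- stop_on_eof = true ends the load once the end of the partition is reached,
-- which suits a finite load rather than a continuous stream.
COPY web_table
   SOURCE KafkaSource(stream='web_hits|0|-2',
                      brokers='kafka01.example.com:9092',
                      stop_on_eof=true)
   PARSER KafkaJSONParser();
```

A statement like this is also a quick way to confirm that Vertica can reach the Kafka brokers and parse the messages before you set up a scheduler.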

Producing Data for Kafka

Vertica can send data to Kafka for processing by other consumers of Kafka's data streams. There are two ways you can send data to Kafka:

  • Use notifiers to send Vertica health and performance data stored in the Data Collector tables. This feature is useful for streaming data to third-party monitoring tools.
  • Use the KafkaExport function to export Vertica data to Kafka. See Producing Data Using KafkaExport for more information.
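
The two approaches might look like the following sketch. The notifier name, Data Collector component, topic names, table (iot_report), columns, and broker address are all hypothetical placeholders. The KafkaExport call assumes the key and value columns hold string data. Consult the function references for the full set of parameters before relying on either statement.

```sql
-- Sketch 1: stream Data Collector messages to Kafka through a notifier.
-- Create a notifier that points at a Kafka broker (placeholder address).
CREATE NOTIFIER vertica_stats
   ACTION 'kafka://kafka01.example.com:9092'
   MAXMEMORYSIZE '10M';

-- Attach the notifier to one Data Collector component and name the
-- Kafka topic that should receive its messages.
SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures',
                                        'vertica_stats',
                                        'vertica_notifications',
                                        true);

-- Sketch 2: export query results to Kafka with KafkaExport.
-- Arguments are the target partition (NULL lets Kafka choose),
-- the message key, and the message value.
SELECT KafkaExport(NULL, server, message
                   USING PARAMETERS brokers='kafka01.example.com:9092',
                                    topic='iot_results')
   OVER (PARTITION BEST)
   FROM iot_report;
```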