Why is Vertica not Ingesting Data From Kafka?

Posted June 14, 2018 by Soniya Shah, Information Developer

Prerequisite: Verify that Vertica is up and running (a quick check is sketched below). Then, to troubleshoot why Vertica is not ingesting data from Kafka, work through the following checklist. Each step describes a task to perform and the results that tell you where to go next.
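For example, one quick way to confirm that the database is up is to check the cluster state with admintools and run a trivial query (a sketch only; paths and connection options depend on your installation):
# Show the state of each node in the cluster:
$ /opt/vertica/bin/admintools -t view_cluster
# Confirm you can connect and run a query:
$ vsql -c "SELECT version();"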
Step 1: Check whether Kafka is up and running.
Task:
a. Examine the server log files for broker errors:
$ BROKER_ROOT/logs/server.log
If there are errors, consult the Kafka documentation.
b. Examine the ZooKeeper log files for ZooKeeper errors:
$ ZOOKEEPER_ROOT/logs/zookeeper.log
If there are errors, consult the Kafka documentation.
c. Restart Kafka:
$ KAFKA_INSTALL/kafka_server_stop.sh
$ KAFKA_INSTALL/kafka_server_start.sh
$ KAFKA_INSTALL/zookeeper_server_stop.sh
$ KAFKA_INSTALL/zookeeper_server_start.sh
d. Perform a sample produce and consume. On a broker node, use the Kafka CLI producer and consumer to confirm that Kafka is working as expected (a minimal example follows this step).
Results: If Kafka is up and running, go to Step 2. If Kafka is not up and running, start it with the zookeeper_server_start and kafka_server_start scripts.
  • If Vertica starts ingesting data from Kafka, the checklist is complete.
  • If Vertica still does not ingest data from Kafka, go to Step 2.
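For example, the sample produce and consume in step 1d might look like the following (a sketch only; the topic name is a placeholder and KAFKA_INSTALL is your Kafka installation directory):
# Produce one test message, then read it back from the beginning of the topic:
$ echo "hello from kafka" | KAFKA_INSTALL/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test_topic
$ KAFKA_INSTALL/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test_topic --from-beginning --max-messages 1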
Step 2: Check whether Kafka produce and consume work from the Vertica nodes.
Task: To confirm that the Kafka connection settings are correct, run a kafkacat produce and consume from each Vertica node, using exactly the same broker list that you used to configure the scheduler (a minimal example follows this step). This check confirms that the network and firewall are not causing issues.
Results: If Kafka produce and consume work from the Vertica nodes, go to Step 3. If they do not, check your network configuration and consult the Kafka documentation.
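For example, run the following from each Vertica node (a sketch only; the topic name is a placeholder, and the broker list should match the one your scheduler uses):
# Produce a message, then consume from the beginning of the topic and exit at the end:
$ echo "test message" | kafkacat -P -b broker:9092,broker2:9092 -t test_topic
$ kafkacat -C -b broker:9092,broker2:9092 -t test_topic -o beginning -e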
Step 3: Perform an ad-hoc COPY into a dummy table using the KafkaParser.
Task:
=> CREATE TABLE test (x VARCHAR(65000));
=> COPY test SOURCE KafkaSource(brokers='broker:9092,broker2:9092', duration=INTERVAL '10 seconds') PARSER KafkaParser() REJECTED DATA AS TABLE test_rejects;
=> SELECT * FROM test;
This query shows the raw message data.
=> SELECT * FROM test_rejects;
This query should return no rows.
=> SELECT KafkaOffsets() OVER (PARTITION BEST) FROM dual;
This query returns load statistics for each partition that you attempted to load from.
Results: If Vertica can load raw data from Kafka, go to Step 4. If the load fails even though KafkaSource uses the same configuration that worked with kafkacat, then KafkaSource, rather than the parser, is the issue. Please contact Vertica Technical Support.
Step 4: Check whether the parser is configured correctly.
Task:
a. Perform an ad-hoc COPY using a real parser. For JSON data, use the KafkaJSONParser. For Avro data, use the KafkaAvroParser.
b. Use a rejected data table to store any rejected rows (a sketch follows this step).
Results: If the COPY behaves as expected, there are no data formatting problems; go to Step 5. If the query errors out or rejects data, there is a data formatting problem. Please contact Vertica Technical Support.
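For example, a JSON load might look like the following, mirroring the ad-hoc COPY from Step 3 but with a real parser and target table (a sketch only; the table, its columns, and the broker list are placeholders):
=> CREATE TABLE web_events (user_id INT, page VARCHAR(500), event_ts TIMESTAMP);
=> COPY web_events SOURCE KafkaSource(brokers='broker:9092,broker2:9092', duration=INTERVAL '10 seconds') PARSER KafkaJSONParser() REJECTED DATA AS TABLE web_events_rejects;
=> -- Loaded rows should appear here:
=> SELECT * FROM web_events;
=> -- Any rows the parser could not handle land here:
=> SELECT * FROM web_events_rejects;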
Step 5: Check whether the scheduler is configured and running properly.
Task: The scheduler requires a few inputs that are common to every vkconfig subcommand, and it stores its configuration and runtime state in a schema inside the Vertica database. You can check the stream_lock configuration table and the stream_scheduler_history table to see whether the scheduler is running. The scheduler needs the following:
  • JDBC connection string
  • Configuration schema: the Vertica schema that the scheduler uses to store configuration and runtime state
Results: If the scheduler is running properly, go to Step 6. If the scheduler is not running, launch it with vkconfig (a sketch follows this step).
  • If Vertica starts ingesting data from Kafka, the checklist is complete.
  • If Vertica still does not ingest data from Kafka, go to Step 6.
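If you need to start the scheduler, the launch typically looks something like the following (a minimal sketch; the configuration file name, its contents, and the stream_config schema name are assumptions, and the vkconfig path may differ in your installation):
# Example configuration file with the connection and schema settings (values are placeholders):
$ cat scheduler.conf
username=dbadmin
password=<password>
dbhost=vertica01.example.com
dbport=5433
config-schema=stream_config
# Launch the scheduler using that configuration:
$ /opt/vertica/packages/kafka/bin/vkconfig launch --conf scheduler.conf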
Step 6: Check whether the scheduler is making progress.
Task:
a. View the dc_requests_issued table to see the load commands. Query this table to view the COPY statements that the scheduler has launched. If you do not see any progress, there could be an issue with the COPY statement.
b. View the kafka_events table to look for errors. This table acts as an in-database log.
c. View the kafka_offsets table to see the full batch history. If data is continuously loaded, the scheduler is running. If no messages are loaded, the scheduler is not working as expected. (Sample queries follow this step.)
Results: If the scheduler is making progress, the checklist is complete. If the scheduler is not making progress, please contact Vertica Technical Support.
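For example, the following queries show recent activity (a sketch only; the stream_config schema qualifier and the request filter are assumptions, so substitute your scheduler's configuration schema):
=> -- COPY statements that the scheduler has launched:
=> SELECT time, request FROM dc_requests_issued WHERE request ILIKE '%KafkaSource%' ORDER BY time DESC LIMIT 10;
=> -- Recent scheduler errors (the in-database log):
=> SELECT * FROM stream_config.kafka_events LIMIT 10;
=> -- Recent batch history:
=> SELECT * FROM stream_config.kafka_offsets LIMIT 10;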

Learn More

Learn more about Apache Kafka integration in the Vertica documentation. Also, see the Apache Kafka documentation.