Using Streaming Data with Vertica

To begin using streaming data and Vertica, use the vkconfig script to complete the following tasks from your Vertica database:

  1. Create a Cluster
  2. Create and Configure a Scheduler
  3. Create a Data Table
  4. Create a Source
  5. Create a Target
  6. Create a Load-Spec
  7. Create a Microbatch
  8. Launch the Scheduler
Note: You can view help for any of these utilities by passing the --help option, for example: /opt/vertica/packages/kafka/bin/vkconfig source --help

Create a Cluster

You must associate at least one Kafka cluster with your scheduler. Vertica supports connecting multiple clusters to the same Vertica database. When you create a cluster, you identify the name of that cluster and the hosts it contains.

/opt/vertica/packages/kafka/bin/vkconfig cluster --create --config-schema kafka_conversion --cluster kafka --hosts 1kafka01:9092,1kafka02:9092,1kafka03:9092

See Cluster Utility Options for more information.

Create and Configure a Scheduler

Vertica includes a default stream_config scheduler. You can use this scheduler or create a new scheduler using the vkconfig script with the scheduler utility and --create option:

/opt/vertica/packages/kafka/bin/vkconfig scheduler --create

The --create option is all that is required to add a scheduler with default options.

You can use additional configuration parameters to further customize your scheduler.

The following example shows how you can use these parameters to create a scheduler with a custom configuration schema, operator, and frame duration:

/opt/vertica/packages/kafka/bin/vkconfig scheduler --create --config-schema myScheduler --operator kafka_user --frame-duration '00:00:30'

See Scheduler Utility Options for more information.

Create a Data Table

Before configuring a source for streaming, create a target table in your Vertica database to store the data you capture. To load data into a flexible table, you must be using a flex parser. If you are not using flexible tables, you must verify that the data you are streaming matches the columns in your target table.

CREATE FLEX TABLE public.kafka_tgt();
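
If you are loading into a traditional columnar table instead, define columns that match the incoming messages. The following sketch assumes the stream carries a timestamp and a text payload; the column names and types are illustrative only, not part of the example above:

CREATE TABLE public.kafka_tgt(event_time TIMESTAMP, payload VARCHAR(65000));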

You do not need to create a rejection table, which stores rejected messages, because the table is created automatically when you run the Source utility.

Create a Source

To create and associate a source with a configured scheduler, use the source sub-utility.

The following example shows how you can create a source and associate it, along with three of its partitions, with the default "stream_config" scheduler.

/opt/vertica/packages/kafka/bin/vkconfig source --create --config-schema stream_config --cluster StreamCluster1 --source conversion --partitions 3

See Source Utility Options for more information.

Create a Target

Once you have a source, configure a target table to receive data from that source. The following example identifies openx.conversion as the target table.

/opt/vertica/packages/kafka/bin/vkconfig target --create --config-schema kafka_conversion --target-schema openx --target-table conversion

See Target Utility Options for more information.

Create a Load-Spec

A load-spec provides parameters that Vertica uses when loading streaming data. It also identifies any filters that you want to apply to your data.

/opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema kafka_conversion --load-spec kafka
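
You can also specify a parser as part of the load-spec. The following sketch is illustrative; the parser shown is an assumption, so check Load Spec Utility Options for the parsers and filters your version supports:

/opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema kafka_conversion --load-spec kafka_json --parser KafkaJSONParser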

See Load Spec Utility Options for more information.

Create a Microbatch

A microbatch is a COPY statement that streams data to your target table. The microbatch uses the parameters provided by your load-spec. The following example creates a new microbatch and assigns it the source conversion from the cluster kafka.

/opt/vertica/packages/kafka/bin/vkconfig microbatch --create --config-schema kafka_conversion --microbatch conversion --target-schema openx --target-table conversion --rejection-schema openx --rejection-table conversion_rej --load-spec kafka --add-source conversion --add-source-cluster kafka

See Microbatch Utility Options for more information.

Launch the Scheduler

After you create a table and associate a source, you are ready to launch the scheduler and start streaming data. Launch a configured scheduler by using the launch sub-utility.

The following example launches the default scheduler, stream_config, and specifies a properties file, configFile.properties, which contains additional CLI options. To start a different scheduler, use the --config-schema parameter.

/opt/vertica/packages/kafka/bin/vkconfig launch --conf configFile.properties

Important: Open Text Corporation does not recommend specifying a password on the command line. Instead, put the password in a properties file.
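
For example, a minimal configFile.properties might look like the following; the values shown are placeholders, so substitute your own connection settings:

username=dbadmin
password=mypassword
dbhost=localhost
dbport=5433
config-schema=stream_config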

See Launch Utility Options for more information.