Using Streaming Data with Vertica
To begin using streaming data with Vertica, use the vkconfig script to complete the following tasks from your Vertica database:
- Create a Cluster
- Create and Configure a Scheduler
- Create a Data Table
- Create a Source
- Create a Target
- Create a Load-Spec
- Create a Microbatch
- Launch the Scheduler
For help with any of the vkconfig utilities, run the utility with the --help option. For example:
/opt/vertica/packages/kafka/bin/vkconfig source --help
Create a Cluster
You must associate at least one Kafka cluster with your scheduler. Vertica supports connecting multiple clusters to the same Vertica database. When you create a cluster, you identify the name of that cluster and the hosts it contains.
/opt/vertica/packages/kafka/bin/vkconfig cluster --create --config-schema kafka_conversion --cluster kafka --hosts kafka01:9092,kafka02:9092,kafka03:9092
See Cluster Utility Options for more information.
Create and Configure a Scheduler
Vertica includes a default stream_config scheduler. You can use this scheduler or create a new scheduler using the vkconfig script with the scheduler utility and the --create option:
/opt/vertica/packages/kafka/bin/vkconfig scheduler --create
The --create option is all that is required to add a scheduler with default options. You can use additional configuration parameters to further customize your scheduler.
The following example shows how you can use these commands to:
- Create a scheduler called "myScheduler" with the --config-schema option.
- Grant privileges to run the scheduler to kafka_user with the --operator option. The dbadmin user must specify kafka_user's additional privileges separately.
- Specify a frame duration of thirty seconds with the --frame-duration option.
/opt/vertica/packages/kafka/bin/vkconfig scheduler --create --config-schema myScheduler --operator kafka_user --frame-duration '00:00:30'
See Scheduler Utility Options for more information.
Create a Data Table
Before configuring a source for streaming, create a target table in your Vertica database to store the data you capture. To load data into a flex table, you must use a flex parser. If you are not using flex tables, verify that the data you are streaming matches the columns in your target table.
CREATE FLEX TABLE public.kafka_tgt();
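If you are loading into a traditional columnar table rather than a flex table, define columns that match the fields in your streamed messages. As a sketch, assuming hypothetical JSON messages that carry click_id, campaign, and event_time fields:
CREATE TABLE public.kafka_tgt (click_id INTEGER, campaign VARCHAR(64), event_time TIMESTAMP);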
You do not need to create a rejection table, which stores rejected messages, because the table is created automatically when you run the Source utility.
Create a Source
To create and associate a source with a configured scheduler, use the source sub-utility.
The following example shows how you can create a source and associate it, along with three of its partitions, with the default stream_config scheduler.
/opt/vertica/packages/kafka/bin/vkconfig source --create --config-schema stream_config --cluster StreamCluster1 --source conversion --partitions 3
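You can verify the result by reading the scheduler's source configuration back as JSON with the --read option (assuming your vkconfig version supports it):
/opt/vertica/packages/kafka/bin/vkconfig source --read --config-schema stream_config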
See Source Utility Options for more information.
Create a Target
Once you have a source, configure a target table to receive data from that source. The following example identifies a target table of openx.conversion.
/opt/vertica/packages/kafka/bin/vkconfig target --create --config-schema kafka_conversion --target-schema openx --target-table conversion
See Target Utility Options for more information.
Create a Load-Spec
A load-spec provides parameters that Vertica uses when loading streaming data. It also identifies any filters that you want to apply to your data.
/opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema kafka_conversion --load-spec kafka
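For example, you might update the load-spec to parse JSON messages and split the stream on newlines; the parser and filter shown here are illustrative choices, not values required by this configuration:
/opt/vertica/packages/kafka/bin/vkconfig load-spec --update --config-schema kafka_conversion --load-spec kafka --parser KafkaJSONParser --filters "KafkaInsertDelimiters(delimiter=E'\n')"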
See Load Spec Utility Options for more information.
Create a Microbatch
A microbatch is a COPY statement that streams data into your target table. The microbatch uses the parameters provided by your load-spec. The following example creates a new microbatch and assigns the source cluster kafka to it.
/opt/vertica/packages/kafka/bin/vkconfig microbatch --create --config-schema kafka_conversion --microbatch conversion --target-schema openx --target-table conversion --rejection-schema openx --rejection-table conversion_rej --load-spec kafka --add-source conversion --add-source-cluster kafka
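Once the scheduler is running, you can review microbatch activity from the scheduler's configuration schema. As a sketch, assuming the standard stream_microbatch_history table exists in your scheduler schema:
SELECT * FROM kafka_conversion.stream_microbatch_history LIMIT 10;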
See Microbatch Utility Options for more information.
Launch the Scheduler
After you create a table and associate a source, you are ready to launch the scheduler and start streaming data. Launch a configured scheduler by using the launch sub-utility.
The following example launches the default scheduler, stream_config, and specifies a properties file, configFile.properties, which contains additional CLI options. To start a different scheduler, use the --config-schema parameter.
/opt/vertica/packages/kafka/bin/vkconfig launch --conf configFile.properties
Important: Open Text Corporation does not recommend specifying a password on the command line. Instead, put the password in a properties file.
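As a minimal sketch of such a properties file (all values below are placeholders), configFile.properties might contain:
username=dbadmin
password=example_password
dbhost=vertica01.example.com
dbport=5433
config-schema=stream_config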
See Launch Utility Options for more information.