Scheduler Tool Options

The vkconfig script's scheduler tool lets you configure schedulers that continuously load data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, which is identified by the --config-schema option. If you do not specify a scheduler, commands apply to the default stream_config scheduler.

Syntax

vkconfig scheduler {--create | --read | --update | --drop} other_options...

Option Description
--create

Creates a new scheduler. Cannot be used with --drop, --read, or --update.

--read

Outputs the current settings of the scheduler in JSON format. Cannot be used with --create, --drop, or --update.

--update

Updates an existing scheduler. Cannot be used with --create, --drop, or --read.

--drop

Drops the scheduler's schema. Dropping its schema deletes the scheduler. After you drop the scheduler's schema, you cannot recover it.

--add-operator user_name

Grants a Vertica user account or role access to use and alter the scheduler. Requires the --update shared utility option.

--auto-sync {TRUE|FALSE}

When TRUE, Vertica automatically synchronizes scheduler source information at the interval specified in --config-refresh.

Default Value:

TRUE

For more information on synchronization, refer to Automatically Copying Data From Kafka.

--config-refresh HH:MM:SS

The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the --update option).

Default Value:

00:05:00

--consumer-group-id id_name

The name of the Kafka consumer group to which Vertica reports its progress consuming messages. By default, Vertica reports its progress to a group named vertica_database-name. See Monitoring Vertica Message Consumption with Consumer Groups for more information.

Set this value to an empty string ('') to disable progress reports to a Kafka consumer group.
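When you run vkconfig from a shell, the empty string has to be quoted so that it reaches vkconfig as an argument. A minimal sketch, assuming the scheduler lives in the stream_config schema; the VKCONFIG variable and the function name are illustrative and not part of vkconfig, and connection options are omitted:

```shell
# Path to vkconfig; override VKCONFIG if it is installed elsewhere (assumed default path).
VKCONFIG="${VKCONFIG:-/opt/vertica/packages/kafka/bin/vkconfig}"

# Hypothetical helper: turn off progress reports to a Kafka consumer group
# for the scheduler stored in the given configuration schema.
disable_consumer_group_reporting() {
    # The quotes around '' keep the empty string as a real argument.
    "$VKCONFIG" scheduler --update --config-schema "$1" --consumer-group-id ''
}

# Usage: disable_consumer_group_reporting stream_config
```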

--dump

When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
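As an illustration, the sketch below asks vkconfig for the query behind the scheduler settings instead of the settings themselves. The VKCONFIG variable and the function name are assumptions made for this example, and connection options are omitted:

```shell
# Path to vkconfig; override VKCONFIG if it is installed elsewhere (assumed default path).
VKCONFIG="${VKCONFIG:-/opt/vertica/packages/kafka/bin/vkconfig}"

# Hypothetical helper: print the query vkconfig would run to read the
# settings of the scheduler stored in the given configuration schema.
show_scheduler_query() {
    "$VKCONFIG" scheduler --read --dump --config-schema "$1"
}

# Usage: show_scheduler_query stream_config
```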

--eof-timeout-ms number of milliseconds

If a COPY statement does not receive any messages within the --eof-timeout-ms interval, Vertica ends that COPY statement.

Default Value:

1000 (1 second)

See Manually Copying Data From Kafka for more information.

--fix-config

Repairs the configuration and re-creates any missing tables. Valid only with the --update shared configuration option.

--frame-duration HH:MM:SS

The length of time that each frame lasts for this scheduler. Vertica must have enough time to complete COPY tasks within this duration. You can approximate the average available time per COPY using the following equation:

TimePerCopy=(FrameDuration*Parallelism)/Microbatches

This is only a rough estimate, because many factors affect how long each COPY statement can run.

Vertica requires at least 100 milliseconds per COPY to function. You can increase the available time per COPY by increasing your frame duration.

Default Value:

00:00:10
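The TimePerCopy estimate above can be worked through with shell arithmetic. This is a sketch only: the function name is hypothetical, the frame duration is passed in seconds rather than HH:MM:SS, and the workload figures (parallelism of 1, 20 or 200 microbatches) are made-up inputs, not recommendations:

```shell
# Estimate the average time available per COPY, in milliseconds.
# Arguments: frame duration in seconds, parallelism, number of microbatches.
time_per_copy_ms() {
    frame_seconds=$1
    parallelism=$2
    microbatches=$3
    # Convert to milliseconds before dividing so that integer
    # arithmetic does not truncate sub-second results to zero.
    echo $(( frame_seconds * 1000 * parallelism / microbatches ))
}

# Default 10-second frame, parallelism 1, 20 microbatches.
time_per_copy_ms 10 1 20     # prints 500
# Same frame with 200 microbatches: below the 100 ms minimum.
time_per_copy_ms 10 1 200    # prints 50
```

In the second case the estimate drops under 100 milliseconds per COPY, which suggests a longer --frame-duration is needed for that workload.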

--message_max_bytes max_message_size

Specifies the maximum size, in bytes, of a Kafka protocol batch message.

Default Value:

25165824

You may need to manually update this value if you created a scheduler using Vertica 9.1.0 or earlier. The meaning of Kafka's message.max.bytes setting changed between version 0.10 and 0.11. See Changes to the message.max.bytes Setting in Kafka Version 0.11 and Later for more information.

--new-source-policy {FAIR|START|END}

Determines how Vertica allocates resources to newly added sources.

Valid Values:

  • FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
  • START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
  • END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.

Default Value:

FAIR

--operator username

Allows the dbadmin to grant privileges to a previously created Vertica user or role.

This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.

Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.

The dbadmin must separately grant the user write privileges on the target tables.

Requires the --create shared utility option. Use the --add-operator option to grant operator privileges after the scheduler has been created.

To revoke privileges, use the --remove-operator option.
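The operator options fit together as a small lifecycle: --operator at creation time, then --add-operator or --remove-operator with --update afterwards. The following sketch wraps each step in a function. The function names, the VKCONFIG variable, and the schema and user names in the usage comments are all illustrative, and connection options are omitted:

```shell
# Path to vkconfig; override VKCONFIG if it is installed elsewhere (assumed default path).
VKCONFIG="${VKCONFIG:-/opt/vertica/packages/kafka/bin/vkconfig}"

# Create a scheduler and grant operator privileges in the same step.
create_scheduler_with_operator() {
    "$VKCONFIG" scheduler --create --config-schema "$1" --operator "$2"
}

# Grant operator privileges on a scheduler that already exists.
add_scheduler_operator() {
    "$VKCONFIG" scheduler --update --config-schema "$1" --add-operator "$2"
}

# Remove a user's access to an existing scheduler.
remove_scheduler_operator() {
    "$VKCONFIG" scheduler --update --config-schema "$1" --remove-operator "$2"
}

# Usage (hypothetical names):
#   create_scheduler_with_operator weblog_sched Jim
#   remove_scheduler_operator weblog_sched Jim
```

None of these steps grants write access to the target tables; that still has to be granted separately by the dbadmin.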

--remove-operator user_name

Removes access to the scheduler from a Vertica user account. Requires the --update shared utility option.

--resource-pool pool_name

The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance if you are not using the default pool.

Default Value:

stream_default_pool

--upgrade

Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backwards compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the --upgrade-to-schema option. See Updating Schedulers After Vertica Upgrades for more information.

--upgrade-to-schema schema_name

Copies the scheduler's schema to a new schema specified by schema_name and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the --upgrade scheduler utility option.

--validation-type {ERROR|WARN|SKIP}

Specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:

  • ERROR - Cancel configuration or creation if validation fails. If you do not specify a validation type, this value is the default.
  • WARN - Proceed with task if validation fails, but display a warning.
  • SKIP - Perform no validation.

For more information on validation, refer to Automatically Copying Data From Kafka.

Renamed from --skip-validation.

See Common vkconfig Script Options for options that are available in all of the vkconfig tools.

Examples

These examples show how you can use the scheduler utility options.

Give a user, Jim, privileges on the stream_config scheduler. The --config-schema option specifies that the edits apply to the stream_config scheduler:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim

Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000

Upgrade the scheduler named iot_scheduler_8.1 to a new scheduler named iot_scheduler_9.0 that is compatible with the current version of Vertica:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --upgrade --config-schema iot_scheduler_8.1 \
                                           --upgrade-to-schema iot_scheduler_9.0

Drop the schema scheduler219a:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --drop --config-schema scheduler219a --username dbadmin

Read the current settings of the options you can set using the scheduler tool for the scheduler defined in weblog.conf:

$ vkconfig scheduler --read --conf weblog.conf
{"version":"v9.2.0", "frame_duration":"00:00:10", "resource_pool":"weblog_pool",
"config_refresh":"00:05:00", "new_source_policy":"FAIR", 
"pushback_policy":"LINEAR", "pushback_max_count":5, "auto_sync":true, 
"consumer_group_id":null}