Kafka TLS-SSL Example Part 5: Configure the Scheduler

The final piece of the configuration is to set up the scheduler to use SSL when communicating with Kafka (and optionally with Vertica). When the scheduler runs a COPY command to get data from Kafka, it uses its own key and certificate to authenticate with Kafka. If you choose to have the scheduler use TLS/SSL to connect to Vertica, it can re-use the same keystore and truststore to make this connection.

Step 1: Create the Scheduler Keystore

Because the scheduler is a separate component, it must have its own SSL key and certificate. The scheduler runs under Java, and uses the JDBC interface to connect to Vertica. Therefore, you must create a keystore and truststore for it to use when making an SSL-encrypted connection to Vertica.

This process is similar to creating the keystores in the previous parts of this example. The main change in these steps is using the keytool command's -dname option to set the Common Name (CN) for the key to a domain wildcard. Using this setting allows the key and certificate to match any host in the network. This ability is especially useful if you run multiple schedulers on different servers to provide redundancy. The schedulers can use the same key and certificate, no matter which server they are running on in your domain.

The steps to create the scheduler's keystore are:

  1. Initialize the keystore, passing it a wildcard host name as the Common Name. The alias parameter in this command is important, as you use it later to identify the key the scheduler must use when creating SSL connections:

    keytool -keystore scheduler.keystore.jks -alias vsched -validity 365 -genkey \
            -keyalg RSA -dname "CN=*.mycompany.com"

    If you choose to use a file format other than the standard Java Keystore (JKS) format for your keystore or truststore files, you must use the correct file extension in the filename. For example, if you use a keystore and truststore saved in PKCS#12 format, the keystore and truststore file names must end with the .pfx or .p12 extension.
    If the scheduler does not recognize the file's extension (or there is no extension in the file name), it assumes that the file is in JKS format. If the file is not in JKS format, you will see an error message when starting the scheduler, similar to "Failed to create an SSLSocketFactory when setting up TLS: keystore not found."

  2. Import the root CA into the keystore. The alias parameter is important. You will use this value to tell the scheduler which certificate to use when authenticating the Vertica server:

    keytool -keystore scheduler.keystore.jks -alias caroot -import -file root.crt

  3. Export the scheduler's key so you can sign it with the root CA:

    keytool -keystore scheduler.keystore.jks -alias vsched -certreq \
            -file scheduler.unsigned.cert

  4. Sign the scheduler key with the root CA:

    openssl x509 -req -CA root.crt -CAkey root.key -in scheduler.unsigned.cert \
            -out scheduler.signed.cert -days 365 -CAcreateserial

  5. Re-import the scheduler key into the keystore:

    keytool -keystore scheduler.keystore.jks -alias vsched -import -file scheduler.signed.cert
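
Before moving on, it can be worth confirming that the keystore holds the entries the scheduler expects. The following is a minimal sketch and not part of the original procedure. It assumes the aliases vsched and caroot used in the steps above, and the helper names has_entry_type and check_scheduler_keystore are hypothetical. It relies on keytool -list printing one summary line per entry that starts with the alias and names the entry type (PrivateKeyEntry or trustedCertEntry).

```shell
# Succeeds if the `keytool -list` output on stdin contains an entry
# with alias $1 whose type is $2 (for example PrivateKeyEntry).
has_entry_type() {
    grep "^$1," | grep -q "$2"
}

# Checks keystore file $1 for the two entries created in the steps above.
# keytool prompts for the keystore password because -storepass is not given.
check_scheduler_keystore() {
    listing=$(keytool -list -keystore "$1") || return 1
    printf '%s\n' "$listing" | has_entry_type vsched PrivateKeyEntry ||
        { echo "vsched is missing or is not a private key entry" >&2; return 1; }
    printf '%s\n' "$listing" | has_entry_type caroot trustedCertEntry ||
        { echo "caroot is missing or is not a trusted certificate" >&2; return 1; }
    echo "keystore entries look correct"
}
```

To use the sketch, run check_scheduler_keystore scheduler.keystore.jks from the directory that contains the keystore. A vsched entry that shows up as a trusted certificate rather than a private key usually means the signed certificate was imported under a different alias than the key.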

You also need a truststore for your scheduler. If you use the same CA to sign all of your keystores (as this example does), you can re-use the truststore you created for the Kafka cluster. If you did not use the same CA, create a truststore using the same command you used to create the Kafka cluster's truststore, and add to it the CAs you used to sign the keys of the Kafka cluster and the Vertica cluster.

Step 2: Set the VKCONFIG_JVM_OPTS Environment Variable

You must pass several settings to the JDBC interface of the Java Virtual Machine (JVM) that runs the scheduler. These settings tell the JDBC driver where to find the keystore and truststore, as well as the key's password. The easiest way to pass in these settings is to set a Linux environment variable named VKCONFIG_JVM_OPTS. As it starts, the scheduler checks this environment variable and passes any properties defined in it to the JVM.

The properties that you need to set are:

  • javax.net.ssl.keyStore: the absolute path to the keystore file to use.
  • javax.net.ssl.keyStorePassword: the password for the scheduler's key.
  • javax.net.ssl.trustStore: the absolute path to the truststore file.

The Linux command line to set the environment variable is:

export VKCONFIG_JVM_OPTS="$VKCONFIG_JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore \
                          -Djavax.net.ssl.keyStore=/path/to/keystore \
                          -Djavax.net.ssl.keyStorePassword=keystore_password"

The previous command preserves any existing contents of the VKCONFIG_JVM_OPTS variable. If you find the variable has duplicate settings, remove the $VKCONFIG_JVM_OPTS from your statement so you override the existing values in the variable.

For example, suppose the scheduler's truststore and keystore are located in the directory /home/dbadmin/SSL. Then you could use the following command to set the VKCONFIG_JVM_OPTS variable:

$ export VKCONFIG_JVM_OPTS="$VKCONFIG_JVM_OPTS \
                           -Djavax.net.ssl.trustStore=/home/dbadmin/SSL/scheduler.truststore.jks \
                           -Djavax.net.ssl.keyStore=/home/dbadmin/SSL/scheduler.keystore.jks \
                           -Djavax.net.ssl.keyStorePassword=key_password"

The Java property names are case sensitive.

To ensure that this variable is always set, add the command to the ~/.bashrc or other startup file of the user account that runs the scheduler.
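
Because a startup file can be sourced more than once in a session, appending to the variable unconditionally can produce the duplicate settings mentioned earlier. The following is a minimal sketch of one way to guard against that in ~/.bashrc. The helper name add_jvm_opt, the SSL_DIR variable, the file names, and the key_password placeholder are assumptions for illustration. Note the design choice: if a property is already present, the sketch keeps the existing value rather than overriding it.

```shell
# Hypothetical directory holding the scheduler's keystore and truststore.
SSL_DIR="${SSL_DIR:-/home/dbadmin/SSL}"

# Append a -Dname=value option to VKCONFIG_JVM_OPTS unless an option
# with the same name is already present (the existing value is kept).
add_jvm_opt() {
    case " ${VKCONFIG_JVM_OPTS:-} " in
        *" ${1%%=*}="*) ;;   # already set; leave it alone
        *) VKCONFIG_JVM_OPTS="${VKCONFIG_JVM_OPTS:+$VKCONFIG_JVM_OPTS }$1" ;;
    esac
}

add_jvm_opt "-Djavax.net.ssl.trustStore=$SSL_DIR/scheduler.truststore.jks"
add_jvm_opt "-Djavax.net.ssl.keyStore=$SSL_DIR/scheduler.keystore.jks"
add_jvm_opt "-Djavax.net.ssl.keyStorePassword=key_password"   # placeholder password
export VKCONFIG_JVM_OPTS
```

After sourcing the file, echo "$VKCONFIG_JVM_OPTS" shows the resulting value, which should list each property once.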

You must also tell the scheduler to enable the SSL connection to Vertica by adding "ssl=true" to the JDBC URL the scheduler uses. The easiest way to add this string is to use the scheduler's --jdbc-opt option. Assuming you use a configuration file for your scheduler, you can add this line to it:

jdbc-opt=ssl=true

Step 3: Enable SSL in the Scheduler's Configuration

The last step to configure the scheduler is to change its configuration to enable SSL. Every time you run the vkconfig command, you must pass it the following options:

  • --enable-ssl: set to true to enable the scheduler to use SSL when connecting to Kafka.
  • --ssl-ca-alias: the alias for the CA you used to sign your Kafka broker's keys. This must match the value you supplied to the -alias argument of the keytool command to import the CA into the truststore.
  • --ssl-key-alias: the alias you assigned to the scheduler's key. This value must match the value you supplied to the -alias argument of the keytool command when creating the scheduler's keystore.
  • --ssl-key-password: the password for the scheduler's key.

See Common vkconfig Script Options for details of these options. For convenience and security, add these options to a configuration file that you pass to vkconfig. Otherwise, you risk exposing the key password in the process list, which other users on the same system can view. See Configuration File Format for more information on setting up a configuration file.

The following example shows the lines you could add to a scheduler configuration file to enable the use of the keystore and truststore created earlier (plus the option to enable SSL for the connection to Vertica):

enable-ssl=true
ssl-ca-alias=caroot
ssl-key-alias=vsched
ssl-key-password=vertica
jdbc-opt=ssl=true
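
A mismatch between ssl-key-alias and the alias stored in the keystore is easy to introduce, and the configuration file now contains a password. The following is a minimal sketch of a pre-flight check, not a Vertica-supplied tool. The helper names conf_value and check_scheduler_conf are hypothetical, the sketch assumes the simple key=value layout shown above, and it assumes keytool -list -alias exits with a non-zero status when the alias does not exist.

```shell
# Print the value assigned to key $1 in the key=value file $2
# (everything after the first '=' on the first matching line).
conf_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Check that the key alias named in configuration file $1 exists in
# keystore $2, then restrict the configuration file to its owner.
# keytool prompts for the keystore password because -storepass is not given.
check_scheduler_conf() {
    key_alias=$(conf_value ssl-key-alias "$1")
    [ -n "$key_alias" ] || { echo "ssl-key-alias is not set in $1" >&2; return 1; }
    keytool -list -keystore "$2" -alias "$key_alias" >/dev/null ||
        { echo "alias $key_alias not found in $2" >&2; return 1; }
    chmod 600 "$1"
}
```

For example, check_scheduler_conf weblog.conf scheduler.keystore.jks checks a configuration file named weblog.conf against the keystore created in Step 1.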

Step 4: Start the Scheduler

Once you have configured the scheduler to use SSL, start it and verify that it is loading data. For example, to start the scheduler with a configuration file named weblog.conf, use the command:

$ vkconfig launch --conf weblog.conf &