Share this article:

Vertica Integration with Tungsten Replicator: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and Tungsten: Latest Versions Tested

Software Version
Partner Product

Tungsten 5.2.1

Desktop Platform

CentOS release 5.11 (Final)

Vertica Client

Vertica JDBC 9.1.1-0

Vertica Server Vertica Database 9.1.1-0
Server Platform CentOS release 5.11 (Final)

This document demonstrates how users can replicate data from MySQL to Vertica using Tungsten. It provides the replication mechanism from MySQL 5.6.24-log to Vertica 9.1.1-0 using Tungsten 5.2.1 .

Tungsten Replicator Overview

The Tungsten Replicator is an open source replication engine that supports many different extractor and applier modules. Data can be extracted from MySQL and Oracle, and applied to transactional stores. Data can also be applied to NoSQL stores such as MongoDB, and data warehouse stores such as Vertica.

The core of the replication functionality lies in three major components:

  • Transaction History Log (THL): The THL is the data that Tungsten takes from the master's binary logs and transports to its servers, with the addition of some metadata.
  • Extractor: The extractor component reads data from the source data server and writes that information to the Transaction History Log (THL). The extractor is also responsible for writing the data into the THL in the native or derived format either as a SQL statement or as row-based information.
  • Applier: Appliers within the Tungsten Replicator convert the THL information and write it to a destination data server. The applier has the capability to work with a number of different target databases, such as Vertica or MySQL.

Replication to Vertica

In heterogeneous systems, it is not possible to execute the DDL and DML statements from source systems into target systems because the SQL dialects are different. Tungsten uses row-based replication when performing replication from MySQL to Vertica.

Replication to Vertica follows this flow:

Data is extracted from the source database into the THL.


While extracting data from the THL, the Tungsten Replicator writes the data into CSV files based on the names of the source tables. These files contain all of the row-based data, including the unique global transaction ID generated by the Tungsten Replicator during replication. The operation type (Insert or Delete) is also listed as part of the CSV file.


Vertica loads the CSV files into staging tables.


Tungsten executes SQL statements to perform updates on live versions of the tables. The statements use the CSV, batch-loaded information to delete old rows and insert new data into tables. The statements also perform updates as necessary to work effectively with Vertica.

Download and Install Tungsten

Follow these steps to download and install the Tungsten Replicator.

  1. Download the software for Tungsten 5.2.1 from Github.
  2. Unpack the file into the Tungsten Replicator deployment directory using the following command:

    Tungsten: shell > $>tar –xvf tungsten-replicator-x.x.x-xx.tar.gz

Install the Vertica Client Driver

Before you can connect to Vertica using Tungsten, you must install the Vertica JDBC driver. Follow these steps:

  1. Navigate to the Client Drivers page on the Vertica website.
  2. Download the JDBC driver package.


    For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Follow the installation instructions in the Vertica documentation

  4. Copy the Vertica JDBC driver into the Tungsten Replicator deployment directory:


Set up MySQL to Vertica Replication

Before you begin, you must configure the staging host that is responsible for setting up the replication services involved in the replication process. For details on configuring the staging host, see Staging Host Configuration in the Tungsten documentation.

Configure Hosts

MySQL uses row-based replication to replicate data to Vertica. You must create tables that need to be replicated in Vertica; tables are not automatically created. You must use the ddlscan utility to create tables for replication. To prepare the MySQL and Vertica hosts, follow the steps in Preparing Hosts for Vertica Deployments.

Prepare Schemas in Vertica

The Tungsten Replicator does not prepare target database schemas and tables based on the source database. The Tungsten Replicator includes a tool called the ddlscan, which reads the schema definition from MySQL and translates that information into the schema definition required for the target database. You should prepare the source database tables you want to replicate to Vertica and then prepare table definitions for your Vertica database. Follow these steps to create schema and tables for the staging tables and target tables:

  1. Create a table in MySQL using the following command:

    mysql -utungsten DBNAME
    mysql> CREATE TABLE Char1_Table(Id int primary key,
                              DataTypeSet VARCHAR (20) 'Char1', 
                              ValueDesc   VARCHAR (50),
                              Char1_Column CHAR
                              ); default charset=utf8;
  1. Execute the ddlscan tool twice.

    Shell $> cd <Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/samples/extensions/velocity

    The first time generates the live table definitions:

    Tungsten: shell $> /<Tungsten_Home>//tungsten-replicator-x.x.x-xx/tungsten-replicator/bin/ddlscan -user tungsten –url jdbc:mysql:thin://host1:3306/DBNAME -pass password -template ddl-mysql-vertica.vm -db DBNAME >>ddl.sql

    The second time creates the table definitions for the staging data using the staging template:

    Tungsten: shell $> /<Tungsten_Home>//tungsten-replicator-x.x.x-xx/tungsten-replicator/bin/ddlscan -user tungsten –url jdbc:mysql:thin://host1:3306/DBNAME -pass password -template ddl-mysql-vertica-staging.vm -db DBNAME >>ddl.sql
  1. Edit ddl.SQL file and update this file according to the target database.
  2. Execute the Vertica database using the following command:

    dbadmin: shell $> vsql –U dbadmin –w <PASSWORD> <ddl.sql

Install Vertica Replication

  1. Change to the staging directory using the following command:

    Tungsten: shell $> cd <Tungsten_Home>/ tungsten-replicator-x.x.x-xx
  1. Configure the main parameters for the replicator service:

    Tungsten: shell $> ./tools/tpm configure alpha \
        --master=host1 \
        --members=host1, host2 \
        --install-directory=/opt/continuent \
        --disable-relay-logs=true \
        --skip-validation-check=HostsFileCheck \
        --enable-heterogenous-service=true \
  1. Configure and install the MySQL Master:

    Tungsten: shell $> ./tools/tpm update alpha\
            --master=host1 \
            --hosts=host1 \
            --datasource-host=host1 \
            --datasource-user=tungsten \
            --datasource-password=password \
            --datasource-mysql-conf=/usr/my.cnf \
            --home-directory=/opt/continuent \
            --java-file-encoding=UTF8 \
            --java-user-timezone=GMT \
            --svc-extractor-filters=colnames,pkey \
    property=replicator.filter.pkey.addColumnsToDeletes=true \
    property=replicator.filter.pkey.addPkeyToInserts=true \
            --mysql-use-bytes-for-string=false \

    Note The preceding command has some essential settings that help with heterogeneous replication:

    • The Java VM file encoding and time zone are UTF-8 and GMT, respectively. Standardizing these values is required to avoid corrupting data in batch loads.
    • Tungsten translates string values to UTF-8 rather than passing these values to slaves as bytes.
    • Tungsten inserts filters to add column names and identify the primary key on tables. These additions are required for batch loading to work properly.
  1. Configure and install the Vertica slave server:

    Tungsten: shell $> ./tools/tpm update alpha \
        --hosts=host2 \
    --replication-user=dbadmin \
    --replication-password=password \
    --batch-enabled=true \
    --batch-load-language=js \
    --batch-load-template=vertica6 \
    --datasource-type=vertica \
    --vertica-dbname=DBNAME \
    --replication-host=host2 \
    --replication-port=5433 \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --svc-applier-block-commit-size=25000 \
    --svc-applier-block-commit-interval=30s \

Monitor Vertica Deployment

Monitoring a Vertica replication scenario requires checking the status of both the master, which extracts data from MySQL, and the slave, which retrieves the remote THL information and applies it to Vertica.

The following graphic shows the master server. The output of the trepctl shows the current sequence number and applier status.


The following graphic shows the slave server. The output of the trepctl shows the current sequence number and applier status.


Test MySQL with Vertica Replication

To replicate data, you need a table on MySQL to hold some data.

Follow these steps to test MySQL with Vertica using the Tungsten Replicator. The following example shows how to move a row from one table to another:

  1. Log in to MySQL and insert a row:

    mysql -utungsten DBNAME
    mysql> INSERT INTO Char1_Table VALUES(1,default, 'Empty', '');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(2,default, 'Typical', 'a');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(3,default, 'Max', 'Z');
    Query OK, 1 row affected (0.00 sec)
  1. If you configured things properly, you should see the following on the Vertica side:

    dbadmin=> SELECT * from tungsten.char1_table;
    Id | DataTypeSet | ValueDesc | Char1_Column
    1 | default     | Empty     |
    2 | default     | Typical   | a
    3 | default     | Max       | Z
    (3 rows)

Example: Use the SimpleBatchApplier

The Tungsten Replicator applies data to Vertica using a new applier class called the SimpleBatchApplier.  It implements the CSV files through the following process: 

  1. As new transactions arrive, the Replicator writes them to CSV files named after the corresponding Vertica tables. For example, if you have updates for a table named simple_tab in a schema test, the format would look like the following:

    Schema  |   Table     |     Column      |  Type       | Size |
    |tungsten| Char1_Table | id              | int        |    8 |
    |tungsten| Char1_Table | DataTypeSet     | varchar(20)|   20 |
    |tungsten| Char1_Table | ValueDesc       | varchar(40)|   40 |
    |tungsten| Char1_Table | Char1_Column    | char       |    1 |
  1. The updates go into a file named simple_tab. The following is an example of the data in the CSV file:

    "64087","I","5","Some data","Some data","B","1"
    "64087","I","6","more data","more data","C","2"
  1. The CSV file includes a unique global transaction ID, an operation code (I for insert and D for delete), and the primary key. For inserts, there are additional columns that contain data; for deletes, columns contain nulls. The last column is a row number, which allows for ordering after the data is loaded into Vertica.
  2. The Tungsten Replicator applies transactions to replicas in serial order without deviations. If you INSERT and then UPDATE a row, it always works because the Replicator applies them to the slave server in the same order.
  3. The Tungsten Replicator continues writing transactions until it reaches the block commit maximum. The Replicator then closes each CSV file and loads the content into a staging table that is named according to the base name. The staging table format mimics the CSV file columns. For example, the staging table could look like the following example:

    Schema |          Table     |     Column      |     Type     |Size |
    tungsten| stage_xxx_Char1_Table| tungsten_seqno  | int        |  8 |
    tungsten| stage_xxx_Char1_Table | tungsten_opcode| char(1)    |  1 |
    tungsten| stage_xxx_Char1_Table | id             | int        |  8 |
    tungsten| stage_xxx_Char1_Table | DataTypeSet    | varchar(20)| 20 |
    tungsten| stage_xxx_Char1_Table | ValueDesc      | Varchar(40)| 40 |
    tungsten| stage_xxx_Char1_Table | Char1_Column   | Char       |  1 |
    tungsten| stage_xxx_Char1_Table | tungsten_row_id| int        |  8 |
  1. Finally, the Replicator applies the deletes and inserts to the table simple_tab by executing SQL commands as shown in the following example:

    DELETE FROM tungsten.Char1_Table WHERE id IN
        (SELECT id FROM tungsten.stage_xxx_Char1_Table
            WHERE tungsten_opcode = 'D');
    INSERT INTO tungsten.Char1_Table(id, DataTypeSet,ValueDesc,Char1_Column)
        SELECT id, DataTypeSet,ValueDesc,Char1_Column
        FROM tungsten.stage_xxx_Char1_Table AS stage_a
        WHERE tungsten_opcode='I' AND tungsten_row_id IN
            (SELECT MAX(tungsten_row_id)
              FROM tungsten.stage_xxx_Char1_Table GROUP BY id);

Known Limitations

Tungsten currently has some important limitations for batch loading, namely:

  • Primary keys must be a single column only. Tungsten does not handle multi-column keys. You must define the primary keys.
  • Binary data may cause problems when converted to CSV as it converts to Unicode.

Data Type Mapping: MySQL to Vertica

The following table shows data type mapping between MySQL data types and Vertica data types.

MySQL Data Type Vertica Data Type











































For More Information

Share this article: