Share this article:

Vertica Integration with Tungsten Replicator: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and Tungsten Replicator: Latest Versions Tested

Software Version
Partner Product

Tungsten Replicator 5.2.1

Partner Product Platform

CentOS 5.11 (Final)

Vertica Client

Vertica JDBC 9.2.1-0

Vertica Server Vertica Database 9.2.1-0
Vertica Server Platform CentOS 7.4.1708 (Core)

This guide takes you through the steps to replicate data from MySQL to Vertica using Tungsten. It provides the replication mechanism from MySQL 5.6.24-log to Vertica 9.2.1-0 using Tungsten 5.2.1.

Tungsten Replicator Overview

The Tungsten Replicator is an open source replication engine that supports many different extractor and applier modules. Data can be extracted from MySQL and Oracle, and applied to transactional stores. Data can also be applied to NoSQL stores such as MongoDB, and data warehouse stores such as Vertica.

The core of the replication functionality lies in three major components:

  • Transaction History Log (THL): The THL is the data that Tungsten takes from the master's binary logs and transports to its servers, with the addition of some metadata.
  • Extractor: The extractor component reads data from the source data server and writes that information to the Transaction History Log (THL). The extractor is also responsible for writing the data into the THL in the native or derived format either as a SQL statement or as row-based information.
  • Applier: Appliers within the Tungsten Replicator convert the THL information and write it to a destination data server. The applier has the capability to work with a number of different target databases, such as Vertica or MySQL.

MySQL to Vertica Replication Workflow

In heterogeneous systems, it is not possible to execute DDL and DML statements from source systems into target systems because the SQL dialects are different. Tungsten uses row-based replication when performing replication from MySQL to Vertica.

Replication to Vertica workflow:

  1. Data is extracted from the source database into (Transaction History Log) THL.
  2. While extracting data from THL, Tungsten Replicator writes the data into CSV files based on the names of the source tables. These files contain all of the row-based data, including the unique global transaction ID generated by the Tungsten Replicator during replication. The operation type (Insert or Delete) is also listed as a part of the CSV file.
  3. Vertica loads the CSV files into staging tables.
  4. Tungsten executes SQL statements to perform updates on live versions of the tables. The statements use the CSV, batch-loaded information to delete old rows and insert new data into tables. The statements also perform updates as necessary to work effectively with Vertica.

Installing Tungsten

To download and install Tungsten Replicator:

  1. Download Tungsten 5.2.1.
  2. Unpack the file into the Tungsten Replicator deployment directory using the following command:

    Tungsten: shell > $>tar –xvf tungsten-replicator-x.x.x-xx.tar.gz

Installing the Vertica Client Driver

Before connecting Tungsten to Vertica, you must download and install the Vertica JDBC client driver by following these steps:

  1. Navigate to the Vertica Client Drivers page on the Vertica website.
  2. Download the JDBC driver package.

    Note  

    For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Follow the instructions in Installing the Client Drivers on Linux and UNIX-Like Platforms in the Vertica documentation.

  4. Copy the Vertica JDBC driver into the Tungsten Replicator deployment directory:

    <Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/lib

Deploying MySQL to Vertica Replication

Before you begin, you must configure the staging host that is responsible for setting up the replication services involved in the replication process. For details on configuring the staging host, see Staging Host Configuration in the Tungsten documentation.

Configuring Hosts

Tungsten Replicator with MySQL as the source uses row-based replication to replicate data to Vertica. You must create tables that need to be replicated in Vertica; tables are not automatically created. You must use the ddlscan utility to create tables for replication. To prepare MySQL and Vertica hosts, follow the steps in Preparing Vertica Deployments.

Preparing Schemas in Vertica

The Tungsten Replicator does not prepare target database schemas and tables based on the source database. The Tungsten Replicator includes a tool called the ddlscan, which reads the schema definition from MySQL and translates that information into the schema definition required for the target database. You should prepare the source database tables you want to replicate to Vertica and then prepare table definitions for your Vertica database. Follow these steps to create the schema and tables for the staging tables and target tables:

  1. Create a table in MySQL using the following command:

    mysql -utungsten DBNAME
     ...
    mysql> CREATE TABLE Char1_Table(Id int primary key,
                              DataTypeSet VARCHAR (20) 'Char1', 
                              ValueDesc   VARCHAR (50),
                              Char1_Column CHAR
                              ); default charset=utf8;
  1. Execute the ddlscan tool twice.

    Shell $> cd <Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/samples/extensions/velocity

    The first time generates the live table definitions:

    Tungsten: shell $> /<Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/bin/ddlscan -user tungsten –url jdbc:mysql:thin://host1:3306/DBNAME -pass password -template ddl-mysql-vertica.vm -db DBNAME >>ddl.sql

    The second time creates the table definitions for the staging data using the staging template:

    Tungsten: shell $> /<Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/bin/ddlscan -user tungsten –url jdbc:mysql:thin://host1:3306/DBNAME -pass password -template ddl-mysql-vertica-staging.vm -db DBNAME >>ddl.sql
  1. Edit ddl.SQL file and update this file according to the target database.
  2. Execute the Vertica database using the following command:

    dbadmin: shell $> vsql –U dbadmin –w <PASSWORD> <ddl.sql

Configuring Vertica as Target

  1. Change to the staging directory using the following command:

    Tungsten: shell $> cd <Tungsten_Home>/tungsten-replicator-x.x.x-xx
  1. Configure the main parameters for the replicator service:

    Tungsten: shell $> ./tools/tpm configure alpha \
        --master=host1 \
        --members=host1, host2 \
        --install-directory=/opt/continuent \
        --disable-relay-logs=true \
        --skip-validation-check=HostsFileCheck \
        --enable-heterogenous-service=true \
        --start
  1. Configure MySQL Master:

    Tungsten: shell $> ./tools/tpm update alpha\
            --master=host1 \
            --hosts=host1 \
            --datasource-host=host1 \
            --datasource-user=tungsten \
            --datasource-password=password \
            --datasource-mysql-conf=/usr/my.cnf \
            --home-directory=/opt/continuent \
            --java-file-encoding=UTF8 \
            --java-user-timezone=GMT \
            --svc-extractor-filters=colnames,pkey \
            --
    property=replicator.filter.pkey.addColumnsToDeletes=true \
            --
    property=replicator.filter.pkey.addPkeyToInserts=true \
            --mysql-use-bytes-for-string=false \
            --start-and-report

    Note The preceding command has some essential settings that help with heterogeneous replication:

    • The Java VM file encoding and time zone are UTF-8 and GMT, respectively. Standardizing these values is required to avoid corrupting data in batch loads.
    • Tungsten translates string values to UTF-8 rather than passing these values to slaves as bytes.
    • Tungsten inserts filters to add column names and identify the primary key on tables. These additions are required for batch loading to work properly.
  1. Configure Vertica slave server:

    Tungsten: shell $> ./tools/tpm update alpha \
        --hosts=host2 \
    --replication-user=dbadmin \
    --replication-password=password \
    --batch-enabled=true \
    --batch-load-language=js \
    --batch-load-template=vertica6 \
    --datasource-type=vertica \
    --vertica-dbname=DBNAME \
    --replication-host=host2 \
    --replication-port=5433 \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --svc-applier-block-commit-size=25000 \
    --svc-applier-block-commit-interval=30s \
    --start-and-report

Monitoring MySQL to Vertica Replication

Monitoring a Vertica replication scenario requires checking the status of both the master, which extracts data from MySQL, and the slave, which retrieves the remote Transaction History Log (THL) information and applies it to Vertica.

The following graphic shows the master server. The output of the trepctl shows the current sequence number and applier status.

[root@glvmlinux-67-50 opt]# /opt/continuent/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000112:0000000000000393;-1
appliedLastSeqno       : 0
appliedLatency         : 0.542
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000112:0000000000000393
currentTimeMillis      : 1571308245984
dataServerHost         : xxx.xx.xx.xxx
extensions             :
host                   : xxx.xx.xx.xxx
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://xxx.xx.xx.xxx:2112/
maximumStoredSeqNo     : 0
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : /var/lib/mysql/
relativeLatency        : 115.984
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : xxx.xx.xx.xxx
state                  : ONLINE
timeInStateSeconds     : 115.804
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 118.815
useSSLConnection       : false
version                : Tungsten Replicator 5.2.1
Finished status command...
		

The following graphic shows the slave server. The output of the trepctl shows the current sequence number and applier status.

[root@glvmlinux-67-49 tungsten-replicator-5.2.1]# /opt/continuent/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1571308237621
dataServerHost         : xxx.xx.xx.xxx
extensions             :
host                   : xxx.xx.xx.xxx
latestEpochNumber      : -1
masterConnectUri       : thl://xxx.xx.xx.xxx:2112/
masterListenUri        : null
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://xxx.xx.xx.xxx:2112/
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : xxx.xx.xx.xxx
state                  : ONLINE
timeInStateSeconds     : 15.875
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 22.223
useSSLConnection       : false
version                : Tungsten Replicator 5.2.1
Finished status command...
		

Testing MySQL to Vertica Replication

To replicate data, you need a table on MySQL to hold some data.

Follow these steps to test MySQL with Vertica using the Tungsten Replicator. The following example shows how to move a row from one table to another:

  1. Log in to MySQL and insert a row:

    mysql -utungsten DBNAME
    ...
    mysql> INSERT INTO Char1_Table VALUES(1,default, 'Empty', '');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(2,default, 'Typical', 'a');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(3,default, 'Max', 'Z');
    Query OK, 1 row affected (0.00 sec)
  1. If you configured things properly, you should see the following on the Vertica side:

    dbadmin=> SELECT * from tungsten.char1_table;
    Id | DataTypeSet | ValueDesc | Char1_Column
    ----+-------------------------------------------
    1 | default     | Empty     |
    2 | default     | Typical   | a
    3 | default     | Max       | Z
    (3 rows)

Example: Use the SimpleBatchApplier

The Tungsten Replicator applies data to Vertica using a new applier class called the SimpleBatchApplier. It implements the CSV files through the following process: 

  1. As new transactions arrive, the Replicator writes them to CSV files named after the corresponding Vertica tables. For example, if you have updates for a table named simple_tab in a schema test, the format would look like the following:

    Schema  |   Table     |     Column      |  Type       | Size |
    --------+-------------+-----------------+--------------+-----  
    |tungsten| Char1_Table | id              | int        |    8 |
    |tungsten| Char1_Table | DataTypeSet     | varchar(20)|   20 |
    |tungsten| Char1_Table | ValueDesc       | varchar(40)|   40 |
    |tungsten| Char1_Table | Char1_Column    | char       |    1 |
  1. The updates go into a file named simple_tab. The following is an example of the data in the CSV file:

    "64087","I","5","Some data","Some data","B","1"
    "64087","I","6","more data","more data","C","2"
    "64088","D",3",","default","Max",”z”,"3"
  1. The CSV file includes a unique global transaction ID, an operation code (I for insert and D for delete), and the primary key. For inserts, there are additional columns that contain data; for deletes, columns contain nulls. The last column is a row number, which allows for ordering after the data is loaded into Vertica.
  2. The Tungsten Replicator applies transactions to replicas in serial order without deviations. If you INSERT and then UPDATE a row, it always works because the Replicator applies them to the slave server in the same order.
  3. The Tungsten Replicator continues writing transactions until it reaches the block commit maximum. The Replicator then closes each CSV file and loads the content into a staging table that is named according to the base name. The staging table format mimics the CSV file columns. For example, the staging table could look like the following example:

    Schema |          Table     |     Column      |     Type     |Size |
    --------+------------------+-----------------+--------------+------+
    tungsten| stage_xxx_Char1_Table| tungsten_seqno  | int        |  8 |
    tungsten| stage_xxx_Char1_Table | tungsten_opcode| char(1)    |  1 |
    tungsten| stage_xxx_Char1_Table | id             | int        |  8 |
    tungsten| stage_xxx_Char1_Table | DataTypeSet    | varchar(20)| 20 |
    tungsten| stage_xxx_Char1_Table | ValueDesc      | Varchar(40)| 40 |
    tungsten| stage_xxx_Char1_Table | Char1_Column   | Char       |  1 |
    tungsten| stage_xxx_Char1_Table | tungsten_row_id| int        |  8 |
  1. Finally, the Replicator applies the deletes and inserts to the table simple_tab by executing SQL commands as shown in the following example:

    DELETE FROM tungsten.Char1_Table WHERE id IN
        (SELECT id FROM tungsten.stage_xxx_Char1_Table
            WHERE tungsten_opcode = 'D');
    INSERT INTO tungsten.Char1_Table(id, DataTypeSet,ValueDesc,Char1_Column)
        SELECT id, DataTypeSet,ValueDesc,Char1_Column
        FROM tungsten.stage_xxx_Char1_Table AS stage_a
        WHERE tungsten_opcode='I' AND tungsten_row_id IN
            (SELECT MAX(tungsten_row_id)
              FROM tungsten.stage_xxx_Char1_Table GROUP BY id);

Known Limitations

Tungsten currently has some important limitations for batch loading, such as

  • Binary data may cause problems when converted to CSV as it converts to Unicode.
  • Nan, Negative Infinity, and Positive Infinity of Double data type are not supported.

  • IntervalSeconds and IntervalMonth data types are not supported.

  • TimeTz and TimeStampTz data types are not supported

  • Milliseconds are truncated in Time data type.
  • Milliseconds are rounded off to 3 places after the decimal point for the TimeStamp data type.

Data Type Mapping: MySQL to Vertica

The following table shows MySQL to Vertica data type conversions. This table lists only data types that are converted to a different data type. All other MySQL data types are mapped to the same Vertica data type respectively.

MySQL Data Type Vertica Data Type

MEDIUMINT

INT

BIGINT

INT

TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT

VARCHAR(65000)

BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB

VARBINARY(65000)

DOUBLE

DOUBLE PRECISION

ENUM

VARCHAR

SET

VARCHAR(4000)

BIT(1)

BOOLEAN

BIT

CHAR(64)

For More Information

Share this article: