Vertica Integration with Tungsten Replicator: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and Tungsten Replicator: Latest Versions Tested

Software	Version
Partner Product	Tungsten Replicator 7.0.1
Partner Product Platform	Red Hat Enterprise Linux release 8.7 (Ootpa)
Vertica Client	Vertica JDBC 12.0.2-0
Vertica Server	Vertica Analytic Database 12.0.2-0

This guide walks you through the steps to replicate data from MySQL binary logs to Vertica.

Tungsten Replicator Overview

The Tungsten Replicator is a replication engine that supports different extractor and applier modules. Data can be extracted from MySQL and Oracle, and applied to databases such as Vertica.

The core of the replication functionality lies in three major components:

Transaction History Log (THL): THL includes SQL statement or row-based data along with meta data for data to be replicated on the applier.

Extractor: The extractor component reads data from the source data server and writes that information to the Transaction History Log (THL). The extractor is also responsible for writing the data into the THL in the native or derived format either as a SQL statement or as row-based information.

Applier: Appliers within the Tungsten Replicator convert the THL information and write it to a destination data server.

Installing Tungsten Replicator

To download and install Tungsten Replicator

Download Tungsten Replicator from the Tungsten website.
Follow the steps in the Tungsten documentation for installation.

Installing the Vertica Client Driver

Before connecting Tungsten to Vertica, you must download and install the Vertica JDBC client driver by following these steps:

Navigate to the Vertica Client Drivers page on the Vertica website.
Download the JDBC driver package.

Note For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.
Copy the Vertica JDBC driver in the Tungsten Replicator Applier:
```
<Tungsten_Home>/tungsten-replicator/lib
```

Deploying MySQL to Vertica Replication

Replicating to Vertica Workflow

Data is extracted from the source database into Transaction History Log (THL).
While extracting data from THL, Tungsten Replicator writes the data into CSV files based on the names of the source tables. These files contain all row-based data, including the unique global transaction ID generated by the Tungsten Replicator during replication. The operation type (Insert or Delete) is also listed as part of the CSV file.
Vertica loads the CSV files into staging tables.
Tungsten Replicator then executes SQL statements to perform updates on live tables from the staging tables.

In heterogeneous systems, it is not possible to execute DDL and DML statements from source systems into target systems because the SQL dialects are different. For MySQL to Vertica data replication, you must create tables that need to be replicated in Vertica as the target tables are not automatically created. You must use the ddlscan utility to create tables for replication.

To prepare MySQL and Vertica hosts, follow the steps in Preparing Vertica Deployments.

Preparing Schemas in Vertica

Follow these steps to create the schema and tables for staging and live environment:

Create a table in MySQL using the following command:

mysql> CREATE TABLE Char1_Table(Id int primary key,
                          DataTypeSet VARCHAR (20) 'Char1', 
                          ValueDesc   VARCHAR (50),
                          Char1_Column CHAR
                          ); default charset=utf8;

Execute the ddlscan tool twice on Extractor.

Shell $> cd <Tungsten_Home>/tungsten-replicator-x.x.x-xx/tungsten-replicator/bin

The first time generates the live table definitions:

Tungsten: shell $> /<Tungsten_Home>/tungsten-replicator-x.x.x- xx/tungsten-replicator/bin/ddlscan -user <mysql_db_user_name> -url 'jdbc:mysql:thin://<mysql_host_name>:3306/<mysql_db_name>' -pass <mysql_db_password> -template ddl-mysql-vertica.vm -db <vertica_schema_name> >>ddl.sql

The second time creates the table definitions for the staging data using the staging template:

Tungsten: shell $> /<Tungsten_Home>/tungsten-replicator-x.x.x- xx/tungsten-replicator/bin/ddlscan -user <mysql_db_user_name> -url 'jdbc:mysql:thin://<mysql_host_name>:3306/<mysql_db_name>' -pass <mysql_db_password> -template ddl-mysql-vertica-staging.vm -db <vertica_schema_name> >>ddl.sql

Edit ddl.sql file and update this file according to the target database.
Copy and execute the ddl file on the Vertica database using the following command:
```
dbadmin: shell $> vsql -p 5433 -u <db_user_name> -w <db_password> -f ddl.sql
```

Configuring the Extractor and Applier

Configuring MySQL Extractor

Create the tungsten.ini file in /etc/tungsten directory on the Extractor using the following commands:

[defaults]
install-directory=/opt/continuent
user=<tungsten_user_name>
profile-script=~/.bash_profile
mysql-allow-intensive-checks=true
disable-security-controls=true

[<tungsten_service_name>]
master=<extractor_hostname>
members=<dataservice_hostname>
enable-heterogeneous-service=true
replication-port=3306
replication-user=<mysql_db_username>
replication-password=<mysql_db_password>
datasource-mysql-conf=<mysql_configuration_filename_absolute_path>

Configuring Vertica Applier

Create tungsten.ini file in /etc/tungsten directory on the Applier using the following commands:

[defaults]
user=<tungsten_user_name>
install-directory=/opt/continuent
profile-script=~/.bash_profile
skip-validation-check=HostsFileCheck
skip-validation-check=InstallerMasterSlaveCheck
disable-security-controls=true

[<tungsten_service_name>]
topology=master-slave
master=<extractor_hostname>
members=<applier_hostname>
datasource-type=vertica
replication-user=<vertica_db_username>
replication-password=<vertica_db_password>
vertica-dbname=<vertica_db_name>
batch-enabled=true
batch-load-template=vertica6
batch-load-language=js
replication-host=<vertica_cluster_hostname>
replication-port=5433
svc-applier-filters=dropstatementdata
svc-applier-block-commit-interval=30s
svc-applier-block-commit-size=25000
disable-relay-logs=true

Monitoring MySQL to Vertica Replication

Monitoring a Vertica replication scenario requires checking the status of both the Extractor, which extracts data from MySQL, and the Applier, which retrieves the remote Transaction History Log (THL) information and applies it to Vertica.

The following graphic shows the Extractor status.

[tungsten@<Extractor_Hostname> bin]$ ./trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000004:0000000003930459;-1
appliedLastSeqno       : 211
appliedLatency         : 0.206
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000004:0000000003930459
currentTimeMillis      : 1669820546092
dataServerHost         : <Extractor_Hostname>
extensions             :
host                   : <Extractor_Hostname>
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://<Extractor_Hostname>:2112/
maximumStoredSeqNo     : 211
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://<Extractor_Hostname>:3306/tungsten_alpha?noPrepStmtCache=true&allowMultiQueries=true&stripQueryComments=false&enabledProtocols=TLSv1.3,TLSv1.2,TLSv1.1&enabledCipherSuites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA&connectTimeout=15
relativeLatency        : 180779.092
resourceJdbcDriver     : org.drizzle.jdbc.DrizzleDriver
resourceJdbcUrl        : jdbc:mysql:thin://<Extractor_Hostname>:3306/${DBNAME}?jdbcCompliantTruncation=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&allowMultiQueries=true&yearIsDateType=false
resourcePrecedence     : 99
resourceVendor         : mysql
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : <Extractor_Hostname>
state                  : ONLINE
timeInStateSeconds     : 1308832.816
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 1308833.433
useSSLConnection       : false
version                : Tungsten Replicator 7.0.1 build 1
Finished status command...

The following graphic shows the Applier status.

[tungsten@<Applier_Hostname> bin]$ ./trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000004:0000000002830087;481
appliedLastSeqno       : 210
appliedLatency         : 843.32
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1669821181090
dataServerHost         : <Vertica_Hostname>
extensions             :
host                   : <Vertica_Hostname>
latestEpochNumber      : 0
masterConnectUri       : thl://<Extractor_Hostname>:2112/
masterListenUri        : null
maximumStoredSeqNo     : 211
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://<Extractor_Hostname>:2112/
relativeLatency        : 182257.09
resourceJdbcDriver     : com.vertica.Driver
resourceJdbcUrl        : jdbc:vertica://<Vertica_Hostname>:5433/vertica_db
resourcePrecedence     : 99
resourceVendor         : vertica
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : <Vertica_Hostname>
state                  : ONLINE
timeInStateSeconds     : 1307420.992
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 1307423.537
useSSLConnection       : false
version                : Tungsten Replicator 7.0.1 build 1
Finished status command...

Testing MySQL to Vertica Replication

To replicate data, you need a table on MySQL to hold some data.

Follow these steps to test MySQL with Vertica using the Tungsten Replicator. The following example shows how to move a row from one table to another:

mysql -utungsten DBNAME
...
mysql> INSERT INTO Char1_Table VALUES(1,default, 'Empty', '');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO Char1_Table VALUES(2,default, 'Typical', 'a');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO Char1_Table VALUES(3,default, 'Max', 'Z');
Query OK, 1 row affected (0.00 sec)

If you configured things properly, you should see the following on the Vertica side:

vertica_db=> select * from tungsten_alpha.Char1_Table;
 ID | DataTypeSet | ValueDesc | Char1_Column
----+-------------+-----------+--------------
  2 | Char1       | Null      |
  3 | Char1       | Typical   | a
  1 | Char1       | Empty     |
  4 | Char1       | Max       | Z

Known Limitations

For DECIMAL data type, a precision of up to 65 digits is displayed.
For TIME and TIMESTAMP data types, milliseconds are truncated during replication.
TIMETZ and TIMESTAMPTZ data types are not displayed.
INTERVALSECOND, INTERVALMONTH, and UUID data types are not displayed.
BINARY related data types are not displayed.