Vertica Integration with BryteFlow Change Data Capture: Connection Guide

Applies to Vertica 7.2.x and earlier 

About Vertica Connection Guides

Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third party vendor’s software. Other versions of the third-party product may work with Vertica. However, other versions may not have been tested. This document provides guidance using the latest versions of Vertica and BryteFlow Change Data Capture as of February, 2016.

BryteFlow Change Data Capture (CDC) Overview

BryteFlow CDC is a web-based application that facilitates log-based data capture from OLTP transactional databases to your Vertica analytic database. The replication process supports Slowly Changing Dimensions type 2 history and scheduling on a regular basis. BryteFlow CDC is compatible with Windows, Linux, some distributions of Unix operating systems, and a Chrome web browser. It is not compatible with other web browsers. BryteFlow CDC uses the JDBC client driver to connect to your Vertica database on premise or on demand.

This document describes the results of test Vertica 7.1.x and Vertica 7.2.x with BryteFlow CDC 5.6 on Linux and Windows operating systems. This document uses Oracle as a sample data source. As of this writing, Microsoft SQL Server has not been tested as a sample data source.

Download BryteFlow CDC

To download a free trial of BryteFlow CDC, follow these steps:

  1. Navigate to the Bryte System contact page.
  2. Enter your contact information.
  3. Click Submit.

After you receive your free trial, follow the pre-requirement and installation instructions that Bryte provides.

Vertica Client Drivers

BryteFlow CDC connects to Vertica using the 7.1.x JDBC client driver. This driver is part of the BryteFlow CDC product and is packaged as part of the application. If you are connecting to Vertica 7.1.x and later, you do not need to install the Vertica JDBC driver.

If you are using Vertica 7.0.x or earlier, you must request that BryteFlow CDC installs the corresponding Vertica JDBC driver. If you want to use Vertica 7.2.x capabilities, you must request that BryteFlow CDC installs the Vertica 7.2.x JDBC driver.

For questions about driver installation, contact Bryte support.

Transfer Data to Vertica

To transfer data from a source database into Vertica, you must configure a source database connection and a destination database connection. This document was prepared using Oracle as the source database and Vertica as the destination database.

The following steps show how to begin to transfer data from Oracle to Vertica:

  1. Navigate to the directory where you installed BryteFlow CDC. Start the application using nohup, as shown in the following examples:
  2. Linux Terminal:

    $cd /opt/bryteflow/sirus-oa
    $nohup ./bryteflowcdc&

    Windows command line:

    cd bryteflow\sirus-oa
    Start BryteFlowCDC service
  3. Open BryteFlow CDC on your Chrome web browser. The following is an example URL:

    http://<IP-address>:8000/bryteflow/wv
  4. To configure your source database and destination database, click Connections

    The following sections detail how to set up each database type.

Configure Source Database Settings

Follow these steps to configure the Oracle source database from the Connections panel:

  1. In the Connections panel on the left, select Source Database.
  2. Choose your Database Type (in this example, Oracle 11G).
  3. Enter the Archive Dir: the directory where the Oracle Archive log files reside.
  4. Enter the Archive File Ext by copying a sample archive log file name to the field and replacing the archive log number with a % as shown in the following:

    ArchiveFile.png
                                                                           

  5. Enter the Database Host: IP address or localhost of source server.
  6. Enter the Database Port: Default is 1521.
  7. Enter the Database Name. This is the name of the Oracle instance.
  8. Enter the User Id: The Oracle User Id that is used for BryteFlow CDC.
  9. Enter and confirm your password.
  10. Click Test Conn to test the connection.
  11. Click Apply to save the details.

Configure Destination Database Settings

Follow these steps to configure the destination database:

  1. In the Connections panel on the left, select Destination Database.
  2. Choose Vertica as your Database Type.
  3. Enter the Database Host: IP address or localhost of database server.
  4. Enter the Database Port: Default is 5433.
  5. Enter the Database Name.
  6. Enter the User Id: The Vertica User Id that is used for BryteFlow CDC.
  7. Enter and confirm your password.
  8. Click Test Conn to test the connection.
  9. Click Apply to save the details.

Choose Data Sources

You can choose data sources based on whether you need a full load or an incremental load. A full load replaces old records in the database with new records, and keeps only the last state of data. A full load does not track historical data. An incremental load tracks changes made to records in the database, which means you do not lose the history of the data.

Follow these steps to choose a schema or table to move into Vertica:

  1. In the Data Sources panel on the left, expand the tree.
  2. Choose the schema or tables you are interested in moving into Vertica.
  3. Select the transfer type, depending on whether you need a full load or an incremental load with history tracking.

Schedule the Extract

  1. In the Schedule Extract panel on the left, click Extract.
  2. Select the type of schedule you want for the frequency of data transfers.
  3. Click Apply.
  4. Click Full Extract.

Note The first extract must be a full extract.

Known Limitations

The following is a list of known limitations when moving data from Oracle into Vertica using BryteFlow CDC:

  • BryteFlow CDC replicates Oracle NCHAR data types into Vertica with trailing spaces if the value of the datatype is less than the designated data type length.
  • BryteFlow CDC truncates all digits to the right of the decimal for Oracle FLOAT data type values.
  • BryteFlow CDC loads Oracle TIMESTAMP (#) WITH TIME ZONE values without the time zone offset.
  • BryteFlow CDC maps Oracle BINARY_DOUBLE and BINARY_FLOAT data types into Vertica, but the values are not replicated.
  • BryteFlow CDC does not support Oracle BFILE, BLOB, CLOB, NCLOB, LONG RAW, LONG, RAW, ROWID, and UROWID data types. If you need an additional driver to support these data types, contact Bryte support.

For More Information