Vertica Integration with StreamSets Data Collector: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and StreamSets: Latest Versions Tested

Software                    Version
Partner Product             StreamSets Data Collector 3.7.2
Partner Product Platform    CentOS Linux
Vertica Client              Vertica 9.2.0-0 JDBC Client
Vertica Server              Vertica 9.2.0-0 Server (on premises)

StreamSets Overview

StreamSets is an open-source, desktop platform for hybrid cloud that enables data movement architectures at scale.

StreamSets Data Collector is a lightweight, powerful design and execution engine that streams data in real time. Use Data Collector to route and process data in your data streams.

For details, see the StreamSets website.

Install StreamSets

  1. On the StreamSets website, click Download Data Collector.
  2. Follow the instructions in the StreamSets documentation to install the RPMs.

Install the Vertica Client Driver

Before you can connect StreamSets Data Collector to Vertica, you must install the Vertica JDBC driver. Follow these steps:

  1. Navigate to the Client Drivers page on the Vertica website.
  2. Download the JDBC driver package.

    Note  

    For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Follow the instructions in Installing the JDBC Driver on Linux in the Vertica documentation.

  4. Place the vertica-jdbc-9.2.0-0.jar file in the following directory:

    /opt/streamsets-datacollector/streamsets-libs/streamsets-datacollector-jdbc-lib/lib
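Step 4 can be scripted. The following is a minimal sketch, assuming the driver jar has already been downloaded. install_vertica_jdbc is a hypothetical helper name (not part of StreamSets or Vertica), and its default destination is the directory given above.

```shell
# Sketch: copy the Vertica JDBC driver into the Data Collector JDBC library directory.
# install_vertica_jdbc is a hypothetical helper, not a StreamSets or Vertica command.
install_vertica_jdbc() {
  local jar="$1"
  local lib_dir="${2:-/opt/streamsets-datacollector/streamsets-libs/streamsets-datacollector-jdbc-lib/lib}"

  # Fail early with a clear message instead of letting cp report the problem.
  if [ ! -f "$jar" ]; then
    echo "driver jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$lib_dir" ]; then
    echo "library directory not found: $lib_dir" >&2
    return 1
  fi

  cp "$jar" "$lib_dir/" && echo "copied $(basename "$jar") to $lib_dir"
}

# Example call (the location of the downloaded jar is an assumption):
# install_vertica_jdbc ./vertica-jdbc-9.2.0-0.jar
```

If Data Collector is already running when the jar is copied, it will probably need a restart before it loads the driver; check the StreamSets documentation for your version.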

Connect StreamSets Data Collector to Vertica

  1. Start Data Collector as a service. For example, on CentOS 7:

    systemctl start sdc
  2. In a web browser, open StreamSets using the following URL:

    http://<IP_address>:18630/
  3. Click + Create New Pipeline to create a pipeline.

  4. Add a title and description for the new pipeline and click Save.

  5. Under Select Origin, select JDBC Query Consumer.

  6. Provide the JDBC connection details for the Vertica source and specify a query.

  7. From Select Destination to connect, select JDBC Producer.

  8. Provide the JDBC connection details for the Vertica target and select a value for Default Operation, such as INSERT, UPDATE, or DELETE.

  9. Press the Play button on the right to execute this pipeline.

    Sample Summary results are displayed after the pipeline runs (screenshot omitted).
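The JDBC connection details in steps 6 and 8 include a connection string. Vertica JDBC URLs take the form jdbc:vertica://<host>:<port>/<database>, with 5433 as the default port. The sketch below assembles such a string. vertica_jdbc_url is a hypothetical helper, and the host and database names are placeholder values, not values from this guide.

```shell
# Sketch: build a Vertica JDBC connection string for the JDBC Query Consumer
# and JDBC Producer stages. vertica_jdbc_url is a hypothetical helper.
vertica_jdbc_url() {
  local host="$1"
  local database="$2"
  local port="${3:-5433}"   # 5433 is the default Vertica port
  printf 'jdbc:vertica://%s:%s/%s\n' "$host" "$port" "$database"
}

# Placeholder host and database names.
vertica_jdbc_url 192.0.2.10 exampledb        # prints jdbc:vertica://192.0.2.10:5433/exampledb
vertica_jdbc_url 192.0.2.10 exampledb 5434   # prints jdbc:vertica://192.0.2.10:5434/exampledb
```

The resulting string is what you would paste into the connection string field of each stage; the user name and password are entered separately in the stage's credentials settings.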

Known Limitations

  • IntervalSecond, IntervalMonth, and UUID data types are not loaded into StreamSets Data Collector. The following error is returned for columns of types such as "Interval Year to Month", "UUID", and "Interval Hour to Second":

    Pipeline Status: RUNNING_ERROR: com.streamsets.pipeline.api.StageException: JDBC_37 - Unsupported type 1111 for columns
  • Fractional seconds are rounded to 3 decimal places (milliseconds) for the Time, TimeStamp, and TimeStampTZ data types.
  • Millisecond values are truncated for the TimeTZ data type.
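The difference between rounding and truncating can be seen on a six-digit (microsecond) fraction, which is the finest precision Vertica stores. The sketch below only illustrates the arithmetic and does not reproduce Data Collector's internal code. It assumes half-up rounding and assumes that truncation keeps three digits, neither of which this guide states. Both function names are hypothetical.

```shell
# Sketch: contrast rounding and truncating a microsecond fraction to milliseconds.
# Input is the six-digit fractional part of a second, for example 987600 for .987600.
# Both helpers are hypothetical and exist only for illustration.
round_to_millis() {
  # Half-up rounding (an assumption). A result of 1000 means the value would
  # carry into the seconds field, which this sketch does not handle.
  awk -v us="$1" 'BEGIN { printf "%03d\n", int((us + 500) / 1000) }'
}

truncate_to_millis() {
  # Drop everything past the third fractional digit.
  awk -v us="$1" 'BEGIN { printf "%03d\n", int(us / 1000) }'
}

round_to_millis 987600      # prints 988
truncate_to_millis 987600   # prints 987
round_to_millis 012345      # prints 012
```

For a value such as 12:34:56.987600, the rounded result would end in .988 and the truncated result in .987, so the same source value can differ by a millisecond depending on the column type.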

For More Information