Vertica Integration with StreamSets Data Collector: Connection Guide
About Vertica Connection Guides
Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.
Vertica and StreamSets: Latest Versions Tested
|Partner Product||StreamSets Data Collector 3.7.2|
|Partner Product Platform||
|Vertica Client||Vertica 9.2.0-0 JDBC Client|
|Vertica Server||Vertica 9.2.0-0 Server (on premises)|
StreamSets is an open-source desktop platform for hybrid cloud that enables data movement architectures at scale.
StreamSets Data Collector is a lightweight, powerful design and execution engine that streams data in real time. Use Data Collector to route and process the data in your data streams.
For details, see the StreamSets website.
- On the StreamSets website, click Download Data Collector.
- Follow the instructions in the StreamSets documentation to install the RPMs.
Install the Vertica Client Driver
Before you can connect StreamSets Data Collector to Vertica, you must install the Vertica JDBC driver. Follow these steps:
- Navigate to the Client Drivers page on the Vertica website.
- Download the JDBC driver package.
  For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.
- Follow the instructions in Installing the JDBC Driver on Linux in the Vertica documentation.
- Place the vertica-jdbc-9.2.0-0.jar file in the following directory:
Connect StreamSets Data Collector to Vertica
Start Data Collector as a service. For example, on CentOS 7:
systemctl start sdc
In a web browser, open StreamSets Data Collector using the following URL:
Click + Create New Pipeline to create a pipeline.
Add a title and description for the new pipeline and click Save.
Under Select Origin, select JDBC Query Consumer.
Provide the JDBC connection details for the Vertica source and specify a query.
Under Select Destination to Connect, select JDBC Producer.
Provide the JDBC connection details for the Vertica target and select a value for Default Operation, such as INSERT, UPDATE, or DELETE.
Press the Play button on the right to execute this pipeline.
Sample summary results are shown as follows:
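As a rough illustration of the connection details entered for the JDBC Query Consumer and JDBC Producer stages in the steps above, the sketch below builds a Vertica JDBC connection string and an incremental-mode query. The host name, database name, table, and offset column are hypothetical placeholders, and the helper functions are not part of StreamSets or Vertica. It assumes Vertica's default port 5433 and an origin running in incremental mode, where Data Collector substitutes the last saved offset for `${OFFSET}`.

```python
def vertica_jdbc_url(host: str, database: str, port: int = 5433) -> str:
    """Build a Vertica JDBC connection string (5433 is Vertica's default port)."""
    return f"jdbc:vertica://{host}:{port}/{database}"


def incremental_query(table: str, offset_column: str) -> str:
    """Build a query for incremental mode.

    The literal text ${OFFSET} is left in the query on purpose;
    Data Collector replaces it at run time.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {offset_column} > ${{OFFSET}} "
        f"ORDER BY {offset_column}"
    )


# Hypothetical values, for illustration only.
print(vertica_jdbc_url("vertica.example.com", "VMart"))
print(incremental_query("public.orders", "order_id"))
```

The resulting strings would go into the JDBC Connection String and SQL Query fields of the origin; the same connection string form applies to the JDBC Producer destination.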
IntervalSecond, IntervalMonth, and UUID data types are not loaded into StreamSets Data Collector. The following error is returned:
Pipeline Status: RUNNING_ERROR: com.streamsets.pipeline.api.StageException: JDBC_37 - Unsupported type 1111 for columns such as "Interval Year to Month", "UUID", and "Interval Hour to Second"
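Type code 1111 in this message corresponds to `java.sql.Types.OTHER`, the generic JDBC type for values that have no standard mapping. One possible workaround, which this guide does not test or confirm, is to cast the affected columns to VARCHAR in the origin query so that they arrive as strings. The sketch below generates such a query; the table and column names are hypothetical, and the helper is only an illustration.

```python
UNSUPPORTED_JDBC_TYPE = 1111  # value of java.sql.Types.OTHER


def build_query(table, columns, unsupported):
    """Return a SELECT that casts the named unsupported columns to VARCHAR.

    Assumption (not verified in this guide): reading the values as
    strings avoids the JDBC_37 error in the JDBC Query Consumer.
    """
    parts = []
    for name in columns:
        if name in unsupported:
            # Keep the original column name so downstream stages see the same field.
            parts.append(f"CAST({name} AS VARCHAR) AS {name}")
        else:
            parts.append(name)
    return f"SELECT {', '.join(parts)} FROM {table}"


# Hypothetical table with a UUID column and an interval column.
print(build_query(
    "public.events",
    ["id", "event_uuid", "duration"],
    {"event_uuid", "duration"},
))
```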
- Fractional seconds are rounded to three decimal places (milliseconds) for Time, TimeStamp, and TimeStampTZ data types.
- Millisecond values are truncated for the TimeTZ data type.
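To make the rounding limitation concrete, the sketch below shows what rounding to three fractional digits does to a timestamp stored with microsecond precision. It assumes half-up rounding; the guide does not state which rounding mode Data Collector applies, so treat the exact tie-breaking behavior as an assumption.

```python
from datetime import datetime, timedelta


def round_to_millis(ts: datetime) -> datetime:
    """Round a timestamp to the nearest millisecond (half-up, by assumption)."""
    remainder = ts.microsecond % 1000
    # Drop the sub-millisecond part first.
    ts = ts - timedelta(microseconds=remainder)
    if remainder >= 500:
        # Round up; timedelta arithmetic carries into seconds when needed.
        ts = ts + timedelta(milliseconds=1)
    return ts


source_value = datetime(2019, 1, 1, 12, 34, 56, 123456)
print(source_value.isoformat(), "->", round_to_millis(source_value).isoformat())
```

If sub-millisecond precision matters downstream, compare values on both sides with this loss in mind rather than expecting exact equality.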
For More Information