|Share this article:|
Vertica Integration with Talend: Connection Guide
Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.
Vertica and Talend: Latest Versions Tested
Talend Open Studio for Data Integration 7.2.1 ( Free Version)
|Partner Product Platform||Windows Server 2016|
Vertica JDBC Driver 9.0.x
|Vertica Server||Vertica Server 9.3.1-0|
Note We have also tested the latest version of Talend Open Studio for Data Integration 7.3.1 which is bundled with Vertica JDBC Driver 9.3.1 and the integration works fine.
Talend is an ETL tool that allows you to extract, transform, and load your data into many other databases and file formats. This tool provides features such as cloud, big data, enterprise application integration, data quality, and master data management. It also provides a unified repository to store and reuse Metadata. It is available in both open source and premium versions.
To install the latest version of Talend Open Studio for Data Integration:
Note In this integration, we tested the free version of Talend Open Studio for Data Integration.
- Navigate to https://www.talend.com/products/data-integration/data-integration-open-studio/
Click Download for the required operating system.
- Follow the installation instructions in the Talend documentation.
When you install Talend Open Studio for Data Integration, the installer extracts the following application files:
- To start Talend Open Studio for Data Integration, run the application appropriate to your Windows operating system.
Note Talend Open Studio for Data Integration uses JDBC to connect to your Vertica database. The Talend Open Studio for Data Integration installs the supported Vertica JDBC driver. You need not install the driver separately.
The following topics describe how to create a project that connects to Vertica, and create a job to extract, transform, and load data.
A project in Talend is a top-level structure that contains objects, job designs, and metadata. Create a separate Talend project for each business purpose.
Follow these steps to create a project:
- Launch Talend Open Studio for Data Integration.
In the Talend Open Studio Data Integration dialog box, select Create a new project and enter a name for your project.
In the same dialog box, click Select an existing project and select the project you just created.
The Talend Open Studio for Data Integration window opens.
- In the Repository tab, expand Metadata.
Right-click Db Connections and select Create connection.
The first Database Connection window appears.
- Enter the database details from which you want to extract data. In this example, the source database name is DB_SRC.
The second Database Connection window appears.
- In the DB Type drop-down list, select Vertica and in the DB Version drop-down list, select VERTICA 9.0.
Enter credentials for the Vertica database you want to extract the data from, including the connection information.
Click Test Connectionto confirm that the connection is valid.
After a successful connection, click Finish.
The data source is now created.
To define a target for the extracted and transformed data, repeat steps 1 to 8 for the target database.
- After you define the source and target connections, they appear in Repository > Metadata > Db Connections with the names you designated (in this case, DB_SRC and DB_TGT). Talend Open Studio for Data Integration appends "0.1" to the names.
To select the data from your source database that you want to transform, right-click DB_SRC and select Retrieve Schema.
The schema and tables are displayed.
- Select the tables to extract from your source Vertica database. In this example, we selected Char4k_Table.
A job contains details that Talend Open Studio for Data Integration needs to execute the ETL workflow.
To create a job
In the Repository tab, right-click Job Designs and select Create job.
The New Job window appears.
- Enter the name of the job.
- Click Finish.
In Repository > Metadata > Db Connections > DB_SRC 0.1 > Table schemas, select the required table (in this case, Char4k_Table).
To add the table to the Job Designer, drag the table onto the designer.
The Job Designer provides a high-level view of the workflow components and connections.
- Select the table to extract data from, in this case, Char4k_Table.
The Components window lists the types of Vertica specific components you can use in the Job Designer.
- Select tDBInput(Vertica) and click OK to add the tDBInput(Vertica) component to the Job Designer.
For more details on each component, see Talend Components.
To view the component properties, double-click the component in the Job Designer.
The properties include source database connection information and the SQL query that Talend Open Studio for Data Integration will use to extract the data.
Similarly for the target, in the Repository tab, select the target table (Char_4kTable) from Target DB (DB_TGT) and drag it into the Job Designer.
In the Components window, select tDBOutput(Vertica) and click OK.
The tDBOutput(Vertica) component appears in the Job Designer.
- To map the two components, right-click tDBInput(Vertica) and select Row > Main.
To view the tDBOutput(Vertica) schema, double-click tDBOutput(Vertica) in the Job Designer.
- In the tDBOutput(Vertica) properties window, to the right of Edit schema, click the three dots.
In the window that opens, select View schema.
The schemas for the input and output components display.
In this example, the input and output schemas are identical:
To run the ETL job, click Run or press F6.
After the job completes, the following message is displayed:
- Talend Open Studio for Data Integration only supports Vertica JDBC Driver 9.0.x.
Insert and Copy
- For TIME data types, milliseconds are rounded off.
- Timezone values are not supported.