Vertica Integration with Talend Open Studio for Data Integration: Connection Guide
About Vertica Connection Guides
Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.
Vertica and Talend Open Studio for Data Integration: Latest Versions Tested
Software | Version |
---|---|
Partner Product |
Talend Open Studio for Data Integration 8.0.1 |
Partner Product Platform | Windows Server 2019 Standard |
Vertica Client |
Vertica JDBC Driver 9.3.1 (bundled with the product) |
Vertica Server | Vertica Analytic Database 23.4.0 |
Talend Open Studio for Data Integration Overview
Talend Open Studio for Data Integration is an ETL tool that allows you to extract, transform, and load your data into many other databases and file formats. This tool provides features such as cloud, big data, enterprise application integration, data quality, and master data management. It also provides a unified repository to store and reuse Metadata.
Installing Talend Open Studio for Data Integration
- Navigate to Talend Open Studio for Data Integration website and click Download now.
- Fill the form and click Download Now.
- Follow the installation instructions in the Talend documentation.
Note Talend Open Studio for Data Integration uses JDBC to connect to your Vertica database. Talend installs the bundled Vertica 9.3.1 JDBC driver. You do not need to install the driver separately.
Connecting Talend Open Studio for Data Integration to Vertica
The following topics describe how to create a project that connects to Vertica, and create a job to extract, transform, and load data.
Creating a Project
A project in Talend is a top-level structure that contains objects, job designs, and metadata. Create a separate Talend project for each business purpose.
Follow these steps to create a project:
- Launch Talend Open Studio for Data Integration.
-
In the Talend Open Studio Data Integration dialog box, select Create a new project, enter a name for your project, and then click Create.
-
In the same dialog box, click Select an existing project and select the project you just created.
-
Click Finish.
The Talend Open Studio for Data Integration window opens.
Creating the Database Connections
- In the Repository tab, select Metadata > Db Connections > Create connection.
-
The Database Connection window appears.
- Type the connection name, purpose, and description and click Next.
- In the DB Type drop-down list, select Vertica and in the DB Version drop-down list, select VERTICA 9.X.
-
Enter the connection details:
- String of Connection: JDBC URL to the Database.
- Login: Database user name.
- Password: Database password.
- Server: IP address of Vertica host.
- Port: Database port. Default is 5433.
- Database: Database name.
- Schema: Database schema name.
- Additional parameters(Optional): Additional connection properties.
-
Click Test Connection to confirm that the connection is valid.
-
After a successful connection, click Finish.
The data source is now created.
-
To define a target for the extracted and transformed data, repeat steps 1 to 7 for the target connection in a different database.
- After you define the source and target connections, they appear in Repository > Metadata > Db Connections with the names you designated.
-
To select the data from your source database that you want to transform, right-click the Connection Name and select Retrieve Schema.
-
The schema and tables are displayed.
- Select the tables to extract from your source Vertica database and click Finish.
- To select the target table to load data, repeat steps 10 to 12 with the target database connection.
Creating a Job
A job contains details that Talend Open Studio for Data Integration needs to execute the ETL workflow.
To create a job
-
In the Repository tab, right-click Job Designs and select Create job.
The New Job window appears.
- Enter the name of the job and click Finish.
-
In Repository > Metadata > Db Connections > Source Connection > Table schemas, select the required table.
-
To add the table to the Job Designer, drag the table onto the designer.
-
The Components window lists the types of Vertica specific components you can use in the Job Designer.
- Select tDBInput(Vertica) and click OK to add the tDBInput(Vertica) component to the Job Designer.
For more details on each component, see Talend Components.
-
Similarly for the target, in Repository > Metadata > Db Connections > Target Connection > Table schemas, select the required tables and drag it onto the Job Designer.
-
In the Components window, select tDBOutput(Vertica) and click OK.
The tDBOutput(Vertica) component appears in the Job Designer.
- To map the two components, right-click the source component, select Row > Main and click the Target component to connect the source and target components.
Running the Job
To run the ETL job, click Run or press F6.
After the job completes, the following message is displayed:
Known Limitations
-
For TIME and TIMESTAMP data types, milliseconds are rounded off after the third place.
-
For TIMETZ data type, milliseconds are truncated.
-
For TIMESTAMPTZ data type, timezone offset is not displayed and milliseconds are truncated after the third place.
-
For BINARY, VARBINARY, and LONGVARBINARY data types, the error “Driver not capable” is displayed.