Share this article:

Vertica Integration with Pentaho Data Integration: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and Pentaho Data Integration: Latest Versions Tested

Software Version
Partner Product

Pentaho Data Integration Community Edition v.8.1

Partner Product Component

Spoon

Desktop Platform

Microsoft Windows Server 2012

Vertica Client Vertica JDBC 9.1.1
Vertica Server Vertica Server 9.1.1

Pentaho Data Integration Overview

Pentaho is a platform that offers tools for data movement and transformation, as well as discovery and ad hoc reporting with the Pentaho Data Integration (PDI) and Pentaho Business Analytics products. This guide focuses on the Data Integration component of the platform, which provides Extraction, Transformation, and Loading (ETL) capabilities.

For information about Pentaho Business Analytics, see Vertica Integration with Pentaho Business Analytics: Connection Guide.

Pentaho offers a community edition as well as an enterprise edition of their products. You can use the community edition free of charge and upgrade to the enterprise edition if required. You can download the community edition of Pentaho Data Integration (PDI) from the Pentaho Community site.

For details about Pentaho Data Integration, see the Hitachi Vantara Pentaho Data Integration website.

Install and Configure a JRE

Before you can connect PDI to Vertica, you must install and configure a Java Runtime Environment (JRE) that is compatible with your version of Pentaho. For information, see Pentaho software requirements for JRE.

First check your current version of Java from the Windows command prompt:

java -version

If you do not have a compatible JRE installed in your system, then follow these steps:

  1. Download Oracle Java 8 from the Oracle site.
  2. Install Oracle Java 8.
  3. Set the PENTAHO_JAVA_HOME environment variable.
  4. Verify that the PENTAHO_JAVA_HOME environment variable is set correctly using the ECHO command at the Windows command prompt:

    echo %PENTAHO_JAVA_HOME%
    C:\Program Files\Java\jre1.8.0_211

Download and Install Pentaho Data Integration

You can download PDI Community Edition directly from SourceForge, or you can download it from the Pentaho Community site as follows:

  1. Navigate to the Pentaho Community site.
  2. In the navigation window on the right, select Pentaho Community and click Downloads.
  3. Scroll down to the Downloads section of Pentaho Community Edition.
  4. In the Data Integration section, click the All OS button.
  5. Fill out the form and click Submit.
  6. Save the downloaded .zip file on your computer.
  7. Unzip the .zip file inside a location of your choice. For example: C:\PentahoCE\pdi-ce\

Download and Install the Vertica Client Driver

PDI connects to Vertica using the Vertica JDBC driver. This document is based on our testing with the Vertica 9.1.1 JDBC driver. Follow these instructions to download and install the JDBC driver:

  1. Navigate to the Client Drivers page on the Vertica website.
  2. Download the JDBC driver.

    Note  

    For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Locate the directory where your PDI Community Edition is installed.

    In this example: C:\PentahoCE\pdi-ce

  4. Copy the Vertica JAR file to the folder \data-integration\lib in your PDI installation.

    In this example: C:\PentahoCE\pdi-ce\data-integration\lib

Note  

PDI supports both JDBC and ODBC connectivity. However, Vertica recommends using JDBC.

Connect Pentaho Data Integration to Vertica

Follow the steps below to create a connection to Vertica that can be used either as a source or as a target database:

  1. Navigate to the path of your PDI installation.

    In this example: C:\PentahoCE\pdi-ce\data-integration

  2. Double-click Spoon.bat to launch the PDI client.
  3. On the left side of the screen, in the View tab, double-click Transformations to create a new Transformation or double-click Jobs to create a new Job.

    If you are new to PDI and want to understand the difference between transformations and jobs, see the PDI documentation.

  4. Click the View tab, then right-click Database connections > New to create a connection to your Vertica database.

  5. In the Database Connection dialog box, use the General option in the left pane, which is selected by default, to enter your database connection information:

    • Connection Name: Type a name for the connection.
    • Connection Type: Select Vertica 5+.
    • Access: Select Native JDBC.
    • Host Name: Type the IP Address of the Vertica server.
    • Database Name: Type the database name.
    • Port Number: Type the port number of the database.
    • User Name: Type the database user name.
    • Password: Type the database password.

    For information about the other items in the left pane of the Database Connection dialog box, see Customize Your Database Connection.

  6. Click Test to check if the connection is correct.

  7. Click OK to close the connection test dialog box.

  8. Click OK to save the connection.

Customize Your Database Connection

In addition to the General option in the left pane of the Database Connection dialog box, you can use the other items to further customize your connection to Vertica:

  • Advanced: Use this option to customize how PDI generates SQL.
  • Options: Use this option to set up additional Vertica JDBC connection properties, such as Connection Load Balancing or a Session Label. For a full list of Vertica’s JDBC connection properties, see JDBC Connection Properties in the Vertica documentation.

  • Pooling: Use this option to enable and configure a pool of connections. Refer to the following topics in the Vertica documentation for details:

  • Clustering: Use this option to create and connect to data partitions in your database.

Using Vertica as a Source or as a Target Database

Create a transformation with steps to extract, transform, and write data. The following is an example of reading data from and writing data into your Vertica database.

Reading Data from Vertica

You can read data from your Vertica database using the Table input component. Follow these steps:

  1. Go to File > New > Transformation to create a transformation.
  2. Select the Design tab.
  3. From the Input folder, select the Table input component and drag it to the canvas.

  4. Double-click the Table input component on the canvas.

    The Table input dialog box opens.

  5. In the Table input dialog box, provide the following information:

    • Step name: Type a name for this transformation step.
    • Connection: Select a connection to Vertica from the drop-down list or create a new one. For instructions on creating a connection to Vertica, see Connect Pentaho Data Integration to Vertica.
    • SQL: Provide a query by either clicking Get SQL select statement or by typing a query directly in the text box.

  6. Optionally click Preview to view the data returned by the query.

  7. Click OK to close the dialog box.

Writing Data into Vertica

You can write data into your Vertica database using one of two components:

Table Output Component

  1. In the transformation you created earlier, select the Design tab.
  2. From the Output folder, select the Table output component and drag it to the canvas.

  3. Connect the source component to the target component by creating a Hop.

    To create the Hop, select the source component and press Shift. Without releasing the mouse button, draw a line to connect the source component to the target component, then release the mouse button.

  4. Double-click the Table output component to configure the target table.

    The Table output dialog box opens.

  5. In the Table output dialog box, select or provide the following information:

    • Step name: Type a name for this transformation step.
    • Cnnection: Select a connection to Vertica from the drop-down list or create a new one. For instructions on creating a connection to Vertica, see Connect Pentaho Data Integration to Vertica.
    • Target schema: Browse and select the schema that contains the target table.
    • Target table: Browse and select the target table.
    • Commit size: The number of rows that will be loaded, after which the transaction will be committed in the database.
    • Truncate table: Check this box if you want PDI to truncate the target table before loading.
    • Specify database fields: Check this box if the source and target tables have a different structure. If you check this box, you must manually specify the mapping of source to target columns in the Database fields tab.

  6. Click OK to close the dialog box.

  7. Press the Play button to execute the transformation and load the data from the source to the target.

Vertica Bulk Loader Component

  1. From the Bulk loading folder in the Design tab of the open transformation, select Vertica Bulk Loader.

  2. Connect the source component to the target component by creating a Hop.

    To create the Hop, select the source component and press Shift. Without releasing the mouse, draw a line to connect the source component to the target component, then release the mouse.

  3. Double-click the Vertica Bulk Load component to configure the load to the target table.

    Select or provide the following information:

    • Step name: Type a name for the transformation step.
    • Connection: Select a connection to Vertica from the drop-down list or create a new one. For instructions on creating a connection to Vertica, see Connect Pentaho Data Integration to Vertica.
    • Target schema: Browse and select the schema that contains the target table.
    • Target table: Browse and select the target table.
    • Specify database fields: Check this box if the source and target tables have a different structure. If you check this box, you must manually specify the mapping of source to target columns in the Database fields tab.

  4. Click OK to close the dialog box.

  5. Press the Play button to execute the transformation and load the data from the source to the target.

Known Limitations

Table Input

  • For Numeric data:

    • If the scale is less than 16, then precision is supported up to 125.
    • If the scale is greater than or equal to 16, then precision is supported up to 1024.

      For example, if the definition of the column in Vertica is NUMERIC(126,15), then 99999999999999999999.9999999999 is converted to 100000000000000000000.0

  • PDI displays a maximum of 10 digits of scale when reading float data types from Vertica.
  • Values of Time and TimeTz data types are automatically converted to a timestamp with a date appended to the supplied time. For example: The time value 14:30:45.666666 in Vertica is displayed as 1970/01/01 14:30:45.667 in PDI.
  • PDI does not display milliseconds when reading values of TimeTz data type from Vertica.
  • Binary, Varbinary, and Longvarbinary are not supported. The wrong data is displayed in PDI.

Table Output

  • For Numeric data:

    • If the scale is less than 16, then precision is supported up to 125.
    • If the scale is greater than or equal to 16, then precision is supported up to 1024.

      For example if the definition of the table in Vertica is NUMERIC(126,15), then 99999999999999999999.9999999999 is converted to 100000000000000000000.0

  • When writing into Vertica, values for Time and TimeTz data types are not automatically converted. The following error is displayed if PDI is unable to convert the value when writing into Vertica:

    [Vertica][JDBC](10120)Error converting data, invalid type for parameter: <#>. 

    As a workaround, you can manually convert the values to a VARCHAR before writing them into Vertica.

Vertica Bulk Loader

  • PDI displays the following error when loading values into columns of types LONGVARCHAR, INTERVAL, LONGVARBINARY and UUID in Vertica:

     java.lang.IllegalArgumentException: Column type <ColumnName> <Vertica’s datatype> not supported.
  • PDI's lowest supported TimestampTZ value is 1600-01-01. Lower values are changed. For example, 1500-02-01 00:00:00-05 is changed to 1500-02-10 00:00:00-05.
  • Numeric data is supported up to 22 digits to the left of the decimal point. Higher values return this error:

     Value <Value> is out of range. 

    Numeric data is rounded off to 15 digits to the right of the decimal point.

For More Information

 

Share this article: