Vertica Integration with RapidMiner Studio: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and RapidMiner Studio: Latest Versions Tested

Software                   Version
Partner Product            RapidMiner Studio V9.10.011
Partner Product Platform   Windows Server 2019
Vertica Client             Vertica JDBC 12.0.1
Vertica Server             Vertica Analytic Database 12.0.1
Vertica Server Platform    Linux RHEL 8.7

RapidMiner Studio Overview

RapidMiner Studio is a flow-based programming tool that streamlines data science tasks, from data analytics to developing predictive models. It offers pre-built templates for use cases such as churn prediction, market basket analysis, and fraud detection.

Installing RapidMiner Studio

To install RapidMiner Studio:

  1. Go to https://rapidminer.com/get-started/
  2. Click Download.
  3. Follow the installation instructions at Installing RapidMiner Studio.

Installing the Vertica Client Driver

Before connecting Vertica to RapidMiner Studio, you must install the Vertica JDBC driver. Follow these steps:

  1. Navigate to the Client Drivers page on the Vertica website.
  2. Download the JDBC driver package.

    Note: For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Follow the installation instructions.

Connecting RapidMiner Studio to Vertica

  1. Open RapidMiner Studio.
  2. Click Connections > Create Connection.

  3. In the Create a new connection window, provide the following details:
    1. Connection Type: Select Database.
    2. Repository: Select Local Repository.
    3. Connection Name: Type a connection name.
  4. Click Create.

  5. In the Edit connection window, click the Setup tab and provide the following details:
    1. Database system: Select Custom (configure in Driver tab)
    2. Provide the database details.

  6. Click the Driver tab and provide the following details:
    1. JDBC driver Jar file: Browse to the JDBC driver JAR file and click Open.

    2. After you select the JAR file, the screen should look as follows:

  7. Click Test connection to verify the connection. After a successful connection, click Save to save it in the local repository.
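The database details in step 5 typically come down to a JDBC URL and a driver class. The Vertica JDBC driver uses URLs of the form jdbc:vertica://host:port/database, its driver class is com.vertica.jdbc.Driver, and the default server port is 5433. The following minimal Python sketch only assembles these values so they can be copied into the connection dialog. The host and database names are placeholders, and the helper function is hypothetical, not part of RapidMiner or Vertica.

```python
# Values defined by the Vertica JDBC driver.
VERTICA_DRIVER_CLASS = "com.vertica.jdbc.Driver"
VERTICA_DEFAULT_PORT = 5433


def vertica_jdbc_url(host: str, database: str, port: int = VERTICA_DEFAULT_PORT) -> str:
    """Build a Vertica JDBC URL (hypothetical helper for illustration)."""
    if not host or not database:
        raise ValueError("host and database are required")
    return f"jdbc:vertica://{host}:{port}/{database}"


# Placeholder host and database names; replace with your own.
url = vertica_jdbc_url("vertica.example.com", "VMart")
print("Driver class:", VERTICA_DRIVER_CLASS)
print("JDBC URL:    ", url)
```

If the connection test fails, checking that the URL follows this pattern and that the port is reachable from the RapidMiner Studio host is a reasonable first step.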

Sample Workflow for Vertica

  1. In the Operators window, expand Database.

    • Read Database reads data from the specified SQL database.

    • Write Database writes data to the specified SQL database.

  2. Drag and drop Read Database and Write Database onto the Process pane.

  3. Click Read Database.
  4. In the Repository Browser, select the Vertica source connection.
  5. In the Parameters pane on the right, provide the following details:
    1. define connection: Select repository.
    2. connection entry: Select the connection you created from the local repository list.
    3. schema name: Select the schema name.
    4. table name: Select the table name.

  6. Click Write Database and repeat steps 4 and 5 to select the target connection, schema, and table from the repository.

  7. After you have selected the tables for the Read Database and Write Database operators, connect the output of Read Database to the input of Write Database.

  8. Click Run to execute the process.
  9. A check mark on both operators indicates that the process completed successfully.

  10. Verify the target table in vsql, for example by querying its row count and comparing it with the source table.
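One way to check the target table is a row count run through vsql. The sketch below assembles such a command using vsql's standard -h, -p, -U, -d, and -c options. The host, user, database, and table names are placeholders, and the helper function is hypothetical. The script only prints the command; it does not run it.

```python
import shlex


def vsql_count_command(host: str, user: str, database: str, table: str, port: int = 5433) -> list:
    """Return the argument list for a vsql row-count check (hypothetical helper)."""
    query = f"SELECT COUNT(*) FROM {table};"
    return ["vsql", "-h", host, "-p", str(port), "-U", user, "-d", database, "-c", query]


# Placeholder connection values and table name; replace with your own.
cmd = vsql_count_command("vertica.example.com", "dbadmin", "VMart", "public.target_table")
print(shlex.join(cmd))  # shell-quoted command, ready to paste into a terminal
```

Running the same count against the source table gives a quick comparison. It confirms the row count only, not the column values.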

Known Limitations

Preview data limitations

  • For the TIMETZ, TIMESTAMP, and TIMESTAMPTZ data types, milliseconds are truncated and an incorrect time zone value is displayed.

  • For the TIME data type, milliseconds are rounded off after 3 digits.

  • For the DATE data type, the displayed date is incorrect.

  • For the INTEGER data type, values are rounded off after 15 digits.

  • For the NUMERIC and FLOAT data types, values are rounded off after a scale of 3 digits, and infinity is displayed if the precision is beyond 310 digits.
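The 15-digit and 310-digit thresholds above are close to the limits of an IEEE 754 double-precision number, which holds roughly 15 to 17 significant decimal digits and overflows near 1.8e308. This guide does not state the cause of the limitations, so the sketch below rests on the assumption that values pass through a double at some point. It shows what that conversion alone does to large values.

```python
import math

# Assumption: numeric values are converted to IEEE 754 doubles on the client.
INT64_MAX = 9223372036854775807  # largest 64-bit signed integer


def round_trip(value: int) -> int:
    """Convert an integer to a double and back, as a lossy client might."""
    return int(float(value))


def parse_as_double(text: str) -> float:
    """Parse a decimal string as a double; out-of-range values become infinity."""
    return float(text)


print(round_trip(123456789012345) == 123456789012345)  # 15 digits survive the round trip
print(round_trip(INT64_MAX) == INT64_MAX)               # 19 digits do not
print(math.isinf(parse_as_double("1e310")))             # beyond the double range
```

If exact values matter, checking them in vsql rather than in the preview avoids this kind of loss.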

Write data limitations

  • Data of the LONGVARBINARY type is not loaded.

  • For the TIMETZ, TIMESTAMP, and TIMESTAMPTZ data types, milliseconds are truncated.

  • For the TIME data type, milliseconds are rounded off after 3 digits.

  • For the INTEGER and DECIMAL data types, values are loaded either in exponential notation or rounded off. Infinity is displayed if the precision is beyond 310 digits.
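Before writing data, it can help to flag numeric values that are likely to be rounded. This guide gives no digit threshold for writes, so the sketch below reuses the 15-digit figure from the preview limitations; that choice is an assumption. The function names are hypothetical.

```python
from decimal import Decimal

# Assumption: the 15-digit preview threshold also applies to writes.
MAX_SAFE_DIGITS = 15


def significant_digits(value: Decimal) -> int:
    """Count significant decimal digits, ignoring trailing zeros."""
    return len(value.normalize().as_tuple().digits)


def may_lose_precision(value) -> bool:
    """Return True if the value has more digits than the assumed safe limit."""
    return significant_digits(Decimal(str(value))) > MAX_SAFE_DIGITS


# Hypothetical sample values to screen before loading.
samples = [123456789012345, 1234567890123456, "12345.678901234567"]
for sample in samples:
    print(sample, "at risk" if may_lose_precision(sample) else "ok")
```

Values flagged as at risk are candidates for loading as strings or through another client, depending on what the target schema allows.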

For More Information