Vertica Integration with Pentaho Business Analytics: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and Pentaho Business Analytics: Latest Versions Tested

Software Version
Partner Products

Pentaho Business Analytics Community Edition 9.2

Community Dashboard Editor

Pentaho Business Analytics Enterprise Edition 9.2

Dashboard

Analysis Report

Report Designer

Desktop Platform

Microsoft Windows Server 2016 (64-bit)

Vertica Client

Vertica JDBC Driver 11.1

Vertica Server

Vertica Server 11.1

Pentaho Business Analytics Server Overview

Pentaho is a platform that offers tools for data movement and transformation, as well as discovery and ad hoc reporting with the Pentaho Data Integration and Pentaho Business Analytics products. This guide focuses on the business analytics component of the platform. For more information about ETL, see the Pentaho PDI Tips and Techniques document.

Pentaho offers an open source as well as an Enterprise Edition of their products. You can use the Community Edition free of charge and upgrade to the Enterprise Edition if required. For the Community Edition, we tested the Community Dashboard Editor. For the Enterprise Edition, we tested the Dashboard, Analysis Report, and Report Designer components.

Installing Pentaho Business Analytics (BA) Platform

Enterprise Edition

You can download the latest version of Pentaho Business Analytics Platform as follows:

  1. Navigate to https://www.hitachivantara.com/en-us/home.html.
  2. From the menu, click PRODUCTS > Download Pentaho.

  3. Scroll down and click START YOUR TRIAL!.
  4. Fill out the form and click DOWNLOAD FREE TRIAL.

  5. Save the downloaded .exe file on your computer.

  6. Double-click the installer .exe and follow the prompts for installation.

Community Edition

  1. Download the Pentaho Community Edition zip file from Source Forge: https://sourceforge.net/projects/pentaho/files/.
  2. To install, unzip the contents in a location of your choice. Then navigate to this directory and execute the following programs:

    • set-pentaho-env.bat - Sets up the necessary paths and environment variables that Pentaho needs to work properly.
    • start-pentaho.bat - Initiates the Pentaho BA server. This script starts Apache Tomcat on port 8080.

Installing the Vertica Client Driver

Pentaho BA connects to Vertica using the Vertica JDBC driver. The Vertica JDBC jar file should be placed in different locations depending on the Community or Enterprise Edition components you are using. Follow these steps:

  1. Navigate to the Client Drivers page on the Vertica website.
  2. Download the JDBC driver package.

    Note For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Copy the JAR file you downloaded.
  4. Locate the directory where Pentaho is installed.
  5. Paste the Vertica JAR file in the specified location:
    ComponentJAR File Location
    Community Edition
    Community Dashboard Editor (CDE)C:\<Pentaho>\tomcat\lib.
    Enterprise Edition
    Analysis Report and DashboardC:\<Pentaho>\server\pentaho-server\tomcat\lib
    Report DesignerC:\<Pentaho>\design-tools\report-designer\lib
  6. Restart the Pentaho BA server.

Connecting Pentaho Business Analytics to Vertica

For Community and Enterprise Editions

Follow these steps to create a connection to Vertica:

  1. With the Pentaho BA server running, open a web browser and navigate to http://localhost:8080/pentaho/Login. This is the default BA server URL where the Tomcat server is running.
  2. The Pentaho User Console opens. Click Login as an Evaluator.
  3. Under Administrator, click Go.
  4. Log in using the username "admin" and the password "password".
  5. To set up a data source that points to your Vertica database, click the Manage Data Sources button.
  6. Click the gear and select New Connection.
  7. Name the connection and enter the following information to connect to your database:
    • Database type: Vertica 5+
    • Access: Native (JDBC)
    • Host Name: Vertica server name or IP address.
    • Database Name: Your database name.
    • Port Number: The default port is 5433.
    • User Name: Your Vertica user name.
    • Password : Your database password.

  8. To verify the connection to Vertica is successful, click Test.
  9. In the left panel, click Options.
  10. Enter advance connection properties to enable JDBC settings, such as session label and connection load balancing.

  11. Click OK to close the window. A new connection is listed.

Creating a Vertica Data Source

For Community and Enterprise Editions

After creating a connection and before creating your reports/dashboards, you need to specify a data source with the information you need for your analysis.

Follow these steps to create a Vertica data source:

  1. Log in to the Pentaho User Console using the browser http://localhost:8080/pentaho/Home.
  2. On the home page, click the Create New > Data Source. The data source wizard opens:
  3. Enter the following information:
    • Data Source Name: a name for your data source.
    • Source Type: Select either SQL Query to specify a custom query or Database Table(s) to select tables.

    If using a SQL Query, select your Vertica connection on the left panel and enter a custom query on the right panel as shown:

    If using Database Table(s), select the name of your Vertica connection from the list of connections and a Create data source for option as shown below. Click Next to select the tables and specify the joins between them.

  4. Click Finish to complete the process.

    The Data Source will be listed under Manage Data Sources. You can now use this Vertica data source from various components within the Pentaho Business Analytics Platform.

Creating Reports/Dashboards

Enterprise Edition

We created reports using Pentaho Dashboard, Analysis Report, and Report Designer.

Pentaho Dashboard

  1. Log into the Pentaho User Console in a browser http://localhost:8080.
  2. On the home page, click the Create New > Dashboard.
    The Dashboard environment page appears.
  3. Select Templates and Themes based on your requirements.
  4. Select the Insert content icon and select data table for tabular format results. Select the Data source that you created before.
  5. The Query Editor appears.
  6. Select Columns based on your input, click Preview, and then Click OK.
    You can now see the result on your screen.

Pentaho Analysis Report

  1. Log into the Pentaho User Console in a browser http://localhost:8080.
  2. On the home page, click Create New > Analysis Report.
    This lists all the available data sources.
  3. You need to select one of the data sources that you created for the report.
  4. In the Analysis home page, drag and drop the columns based on your input.
  5. Select the format from the top right corner based on your requirement.

Pentaho Report Designer

  1. From the Start menu, open Pentaho Report Designer.
  2. From the menu, select File > New to create a blank report.
  3. From the menu, select Data > Add Data Source > JDBC. The JDBC Data Source window appears.
  4. Under Connections click the ‘+’ icon to add a new Vertica connection. The Database Connection window appears.
  5. In the Database Connection window, in the General tab, enter the connection details:

    • Connection name: A name for your connection.
    • Connection type: Select Vertica 5+
    • Access: Select Native (JDBC).
    • Host Name: Vertica server name or IP address.
    • Database Name: Your database name.
    • Port Number: The default port is 5433.
    • User Name: Your Vertica user name.
    • Password: Your database password.

  6. In the Options tab, enter advance connection properties to enable JDBC settings, such as session label and connection load balancing.

  7. Click Test to test your connection and then click OK.
    The connection is listed under Connections in the JDBC Data Source window.

  8. In the JDBC Data Source window, in Available Queries click the ‘+’ icon to add a new query.

  9. Click OK to close the window.

  10. Create the report and publish it to Pentaho Report Server.

Community Edition

We created dashboards using the Community Dashboard Editor (CDE).

Community Dashboard Editor

Note Pentaho CDE can be configured to access data in Vertica in one of two ways, using a live data source or a cache data source. By default, Pentaho CDE caches the query results.

If the cache data source property is set to True, then the result of the query will be transfer out of Vertica and into the cache the first time the query gets executed. The data then will be stored for a period of time specified in the Cache Duration data source property. When the time expires the query is executed in Vertica and the cache is refreshed.

If the cache property is set to False (recommended), then Pentaho CDE will execute the query in Vertica each time you load or refresh your dashboard and so you will see the latest changes in your database.

  1. Log into the Pentaho User Console in a browser http://localhost:8080.
  2. Click Create New > CDE Dashboard.
  3. Click the Datasources Panel icon on the top right corner.
  4. Click SQL Queries in the left panel.
  5. Select either sql over sqljndi or sql over sqljdbc. For best results, choose sql over sqljndi.

    • If you choose sql over sqljndi, in the Properties dialog, enter the following:

      Name: Name your connection. In this example, sqljndi1.

      Jndi: Select the connection JNDI you created when connecting to Vertica as described in the section Connecting Pentaho Business Analytics to Vertica in this guide.

    • If you choose sql over sqljdbc, in the Properties dialog, enter the following:

      Name: Name your connection. In this example, sqljdbc1.

      Driver: com.vertica.jdbc.Driver

      Password:YourVerticadatabasepassword.

      User name: Your Vertica user name.

      URL: The JDBC connection URL, following the format: jdbc:vertica://my_vertica_ server:<my_vertica_port>/<my_database_name>

    • Following are common properties in both sql over sqljndi and sql over sqljdbc:

      Query: Your Vertica Query. This example uses the Average Inventory Overtime query listed in the preceding section.

      Cache Duration: The number of seconds the data is stored in the cache before the query gets executed again.

      Cache: The default value is True. When cache is True, Pentaho CDE uses the data stored in the cache to load the visualization. Set cache to False to execute the query in Vertica and ensure the data displayed in your charts are up to date with your data in Vertica.

  6. Click the Layouts Panel icon on the top right corner and select the layout based on your requirements.
  7. Click the Components Panel icon on the top right corner to create visualizations based on your needs.
  8. Save the dashboard and preview it.

Known Limitations

The following are data type known limitations for the Community and Enterprise Edition components:

Enterprise Edition

Pentaho Dashboard

  • For INTEGER data type, values are supported up to 16 digits.
  • For NUMERIC data type, values are supported up to 15 digits beyond which the value is rounded off.
  • For DOUBLE data type, values are truncated after the decimal point.
  • TIME, TIMESTAMP, TIMETZ, and TIMESTAMPTZ data types are not supported.
  • BINARY, VARBINARY, and LONG VARBINARY data types are displayed in an unknown format.

Pentaho Analysis Report

  • For NUMERIC data types, values are supported up to 308 digits beyond which an error is displayed.
  • For TIME and TIMETZ data types, milliseconds are not supported.
  • BINARY data type is not supported. Incorrect values are displayed. You can convert the values to hexadecimal to display correctly.

Pentaho Report Designer

  • Smallest DOUBLE value is not supported, 0 is displayed. While Previewing the data full value is visible.
  • For TIME and TIMESTAMP data types, milliseconds are not displayed.
  • For TIMETZ and TIMESTAMPTZ data types, milliseconds and timezone values are not displayed.
  • BINARY, VARBINARY, and LONG VARBINARY data types are not supported. It displays object instead of Value.

Community Edition

Community Dashboard Editor

  • For INTEGER and NUMERIC data types, values are displayed up to 16 digits.
  • BINARY, VARBINARY, and LONG VARBINARY data types are displayed in an unknown format.
  • For TIMESTAMPTZ data type, time zone offset is not supported.
  • For TIME and TIMETZ data types, milliseconds are not supported.

For More Information