Share this article:

Vertica Integration with IBM InfoSphere DataStage: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and DataStage: Latest Versions Tested

Software Version
Partner Product

IBM InfoSphere DataStage 11.5

Partner Product Platform

Window Server 2012 R2 (server and client) Standard

Vertica Client

Vertica JDBC and ODBC 9.1.1

Vertica Server Vertica Analytic Database 9.1.1

DataStage Overview

IBM InfoSphere DataStage is an ETL tool that integrates and transforms source system data into data marts and data warehouses.

The IBM InfoSphere DataStage client has three components:

  • Administrator client allows you to manage tasks for DataStage projects.
  • Designer client is the tool you use to design your DataStage jobs.
  • Director client manages the jobs in the DataStage engine.

Install DataStage

  1. Navigate to the IBM Support downloads page.
  2. Scroll down the page to the appropriate table and click the name of your operating system.

    Before starting the download, review the information provided.

  3. Download DataStage and the installation instructions.

To view the release notes for the tested version of IBM InfoSphere Information Server 11.5, go to the IBM Knowledge Center.

Install the Vertica Client Drivers

DataStage uses the Vertica JDBC and ODBC drivers to connect to Vertica. Follow these steps to install the Vertica client drivers:

  1. Navigate to the Client Drivers page on the Vertica website.

  2. Download the JDBC driver.

  3. Download the ODBC driver that is compatible with the architecture of your DataStage operating system and Vertica server version.

  4. Follow the installation instructions in the Vertica documentation:

  5. (JDBC only) Copy the Vertica JDBC .jar file to the DataStage directory for external libraries.
  6. (ODBC only) Create a DSN as described in Creating an ODBC Data Source Name (DSN).

Note  

For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

Connect to Vertica from DataStage

Before you configure your connection from DataStage to Vertica, make sure you are connected to IBM WebSphere Services.

ODBC Configuration on Windows

To test the ODBC driver:

  1. Open Control Panel > Administrative Tools.
  2. Open the ODBC Data Source Administrator for your system.
  3. Make sure your data source is configured to use the Vertica driver as shown.

  4. Double-click the connection.
  5. On the Vertica ODBC DSN Configuration page, click Test Connection.

ODBC Configuration on Linux

You need the following files to configure the ODBC driver on Linux:

  • dsenv: The DataStage environment file. The default location for this file is DSHOME. On Linux, DSHOME is /opt/IBM/InformationServer/Server/DSEngine.
  • odbc.ini: The ODBC driver manager file contains details related to your ODBC connection. The default locations for this file are DSHOME and IBM/InformationServer/Server/branded_odbc. At the DSHOME location, the odbc.ini file is a hidden file. The odbc.ini file needs to be configured at both locations.
  • uvodbc.config: Vertica file that contains information about the DSN and related DBMS. The default location for this file is DSHOME. Configure uvodbc.config for individual projects at the location DSHOME/Projects, as follows:

    <ds_src>
    DBMSTYPE = ODBC
     
    <ds_tgt>
    DBMSTYPE = ODBC

JDBC Configuration (Windows and Linux)

To configure the JDBC driver on Linux, you need the DataStage isjdbc.config file to establish the JDBC connection. It contains information about CLASSPATH and CLASS_NAMES:

CLASSPATH=/opt/vertica/vertica-jdbc-9.1.1-0.jar;
CLASS_NAMES=com.vertica.jdbc.Driver

Configure the Environment Variables for Your Project

Before you create a project in DataStage, set up your environment variables:

  1. On the Windows Apps screen, double-click IBM InfoSphere DataStage and QualityStage Administrator to open the Administrator Client:

  2. Enter your credentials in the Attach to DataStage window, and click Login.
  3. Select the Projects tab from the DataStage Administration screen.

  4. Select your project name and click Properties.

  5. Click Environment from the Project Properties screen.

  6. Set the DataStage environment variables as follows:


    Environment VariableCategory Setting
    APT_PM_SHOW_PIDSParallel > ReportingTrue
    APT_PM_PLAYER_TIMINGParallel > ReportingTrue
    APT_DEFAULT_TRANSPORT_BLOCK_SIZEParallel10000000
    APT_USE_IPV4ParallelTrue
    APT_PM_SHOWRSHParallel > ReportingTrue
    APT_DUMP_SCOREParallel > ReportingTrue
    APT_STARTUP_STATUSParallel > Reporting True
    APT_PMCONDUCTOR_HOSTNAMEParallelMachine name
    APT_IO_MAXIMUM_OUTSTANDINGParallel2097152
    APT_PM_CONDUCTOR_TIMEOUTUser-Defined320
    APT_PM_PLAYER_TIMEOUTUser-Defined120

Note For a full list of DataStage environment variables, see Environment Variables in the IBM Knowledge Center.

Configure the Designer Client

  1. Double-click IBM InfoSphere DataStage and QualityStage Designer on the Windows Apps screen to open the Designer Client:

  2. Enter your credentials in the Attach to Project screen and click Login.
  3. Select Import > Table Definitions > ODBC Table Definitions from the main Designer window.

Import the Data Definitions into DataStage

To define the source and target table definitions in DataStage, you can import the metadata for those objects.

  1. Select Import > Table Definitions > ODBC Table Definitions.

    The Import Meta Data window opens.

  2. On the Import Meta Data window, under DSN, check to make sure that the drop-down menu lists both your source and target databases.

  3. For both the source and target databases, perform the following:
    1. Select the database from the DSN drop-down menu.
    2. Click OK.

      DataStage verifies the ODBC configuration and lists the tables that you have defined on that database.

      Note In this example, the source and the target Meta Data are the same, so you only have to import the Meta Data from the source.

      The following image lists the target database tables.

      If the DSN entries appear in the drop-down list, the configuration of the odbc.ini and uvodbc.config files are correct. Otherwise, correct them in your Linux environment as described in ODBC Configuration on Linux .

    3. Click Close.

Note See the Datastage documentation to create, compile, and run jobs. When creating a job using ODBC or JDBC, you must create a PARALELL job with Vertica. In addition, you must select an ODBC or JDBC connector from the Database dropdown menu when connecting to Vertica.

Troubleshooting

This section describes issues you might encounter when you are trying to connect to Vertica from DataStage.

Empty DSN list

Explanation: When connecting to ODBC, the DSN drop-down list on the Import Meta Data page does not show the connection value.

Solution: Check the uvodbc.config file in the DSHOME/Projects directory. Depending on your operating system, add the following entries to the uvodbc.config file.

For Windows:

DSHOME=C:\IBM\InformationServer\Server\DSEngine for Windows

For Linux:

DSHOME=/opt/IBM/InformationServer/Server/DSEngine for Linux

Configuration file JDBC cannot be accessed

Explanation: This error occurs when you try to access the source or target database using JDBC.

Solution: Provide read access to the dsadm user, or give read access to all users. The isjdbc.config file should be located in the DSEngine direcctory

String Data Too Big Error

The text of this error is as follows:

target,0: ODBC function "SQLPutData()" reported:  SQLSTATE = 22001: Native Error Code = 10,170: Msg = [Vertica][ODBC] (10170) String data right truncation on data from data source: String data is too big for the driver's data buffer. (CC_OdbcLobSupport::writeLobData, file CC_OdbcLobSupport.cpp, line 261)

Explanation: If you are loading string data or similar data types with high precision, you might see the preceding message.

Solution: Set the value of the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE to 10000000. Note that this solution does not work for huge string sizes like 50000 characters.

ODBC Connector Array Size Error

The text of this error is as follows:

ODBC_Connector_0,0: The array size must be set to 1 when reading LOBs (CC_OdbcDBRecordDataSetProducer::dbsAllocateBindBuffers, file CC_OdbcDBRecordDataSetProducer.cpp, line 587)

Explanation: If you are loading string data or similar data types with high precision, you might see the preceding message.

Solution: In DataStage, change the array size value to 1.

Pxbridge Error

The text of this error is as follows:

main_program: Syntax error: Error in "pxbridge" operator: Error in output redirection: Error in output parameters: Error in modify adapter: Error in binding: Error parsing field type: Parsing parameters "1024,0" for schema type "decimal": Precision must be > 0 and <= 255, got: "1024"

Explanation: You see this error if you are trying to load decimal data or similar data types with a precision value greater than 255 but the data inside the column is less than 255 characters.

Solution: Change the precision value in the DDL to an integer between 1 and 255.

SQLGetPrivateProfileString Error

The text of this error is as follows:

[S1000][unixODBC][DSI] The error message NoSQLGetPrivateProfileString could not be found    in the en-US locale. Check that /en-US/ODBCMessages.xml exists.
[ISQL]ERROR: Could not SQLConnect.

Explanation: You might see this error in a Linux environment when you configure the ODBC driver. It is related to odbc.ini file configuration.

Solution: Usually ErrorMessagePath is set as: /opt/vertica/lib64. Add the en-US dir under the lib64 entry.

Data Type Support

The following list identifies known data type issues when connecting to Vertica from DataStage:

  • DataStage interprets the following Vertica data types as follows:

    Vertica Data Type SQL Type
    INTERVAL HOUR TO SECOND(0) 112
    INTERVAL HOUR TO SECOND 112
    INTERVAL HOUR TO MINUTE 111
    INTERVAL 110
    INTERVAL YEAR TO MONTH 107

    These data types are unique to Vertica. DataStage cannot parse these values.

  • Change in value for milliseconds when passing TIME data types while using JDBC connection.

    For the same data type in an ODBC connection, loss of milliseconds value occurs. For example, if the value is 14:30:45.666666, the Vertica JDBC connector reads and writes it as 14:30:45.384. The Vertica ODBC connector reads and writes it as 14:30:45.

  • TIMETZ and TIMESTAMPTZ values also change when passed from Vertica to DataStage.

For More Information

 

Share this article: