Vertica Integration with IBM InfoSphere DataStage: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic instructions for connecting a third-party partner product to Vertica. Connection guides are based on our testing with specific versions of Vertica and the partner product.

Vertica and DataStage: Latest Versions Tested

Software Version
Partner Product

IBM InfoSphere DataStage 11.7.1

Partner Product Platform

Windows Server 2016 Standard (DataStage server and client)

CentOS Linux 7.6.1810 (DataStage server)

Vertica Client

Vertica JDBC and ODBC 12.0.1

Vertica Server Vertica Analytic Database 12.0.1

DataStage Overview

IBM InfoSphere DataStage is an ETL tool that integrates and transforms source system data into data marts and data warehouses.

The IBM InfoSphere DataStage client has three components:

  • Administrator client allows you to manage tasks for DataStage projects.
  • Designer client is the tool you use to design your DataStage jobs.
  • Director client manages the jobs in the DataStage engine.
  • Metadata Asset Manager allows you to import table definitions using JDBC.

Installing DataStage

  1. Navigate to the IBM Support downloads page.
  2. Scroll down the page to the appropriate table and click the name of your operating system.

    Before starting the download, review the information provided.

  3. Download DataStage and the installation instructions.

To view the release notes for the tested version of IBM InfoSphere Information Server 11.7, go to the IBM Knowledge Center.

Installing the Vertica Client Drivers

DataStage uses the Vertica JDBC and ODBC drivers to connect to Vertica. Follow these steps to install the Vertica Client Drivers:

  1. Navigate to the Client Drivers page on the Vertica website.

  2. Download the JDBC driver package.

  3. Download the ODBC driver that is compatible with the architecture of your DataStage operating system and Vertica server version.

    Note  

    For details about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  4. Follow the installation instructions in the Vertica documentation:

  5. (JDBC only) Copy the Vertica JDBC .jar file to the DataStage directory for external libraries.
  6. (ODBC only) Create a DSN as described in Creating an ODBC Data Source Name (DSN).

Configuring Environment Variables for Your Project

Before you create a project in DataStage, set up the Vertica related environment variables:

  1. On the Windows Apps screen, double-click IBM InfoSphere DataStage and QualityStage Administrator to open the Administrator Client:

  2. Enter your credentials in the Attach to DataStage window, and click Login.
  3. Select the Projects tab from the DataStage Administration screen.

  4. Select your project name and click Properties.

  5. Click Environment from the Project Properties screen.

  6. Set the DataStage environment variables as follows:

    Environment VariableCategory Setting
    APT_PM_SHOW_PIDSParallel > ReportingTrue
    APT_PM_PLAYER_TIMINGParallel > ReportingTrue
    APT_DEFAULT_TRANSPORT_BLOCK_SIZEParallel10000000
    APT_USE_IPV4ParallelTrue
    APT_PM_SHOWRSHParallel > ReportingTrue
    APT_DUMP_SCOREParallel > ReportingTrue
    APT_STARTUP_STATUSParallel > Reporting True
    APT_PMCONDUCTOR_HOSTNAMEParallelMachine name
    APT_IO_MAXIMUM_OUTSTANDINGParallel2097152
    APT_PM_CONDUCTOR_TIMEOUTUser-Defined320
    APT_PM_PLAYER_TIMEOUTUser-Defined120

Note For a full list of DataStage environment variables, see Environment Variables in the IBM Knowledge Center.

Configuring the Designer Client

  1. Double-click IBM InfoSphere DataStage and QualityStage Designer on the Windows Apps screen to open the Designer Client:

  2. Enter your credentials in the Attach to Project screen and click Login

Connecting DataStage to Vertica

Before you configure your connection from DataStage to Vertica, ensure that you are connected to IBM WebSphere Services.

Configuring ODBC on Windows

To test the ODBC driver:

  1. Open Control Panel > Administrative Tools.
  2. Open the ODBC Data Source Administrator (32-bit) for your system system as DataStage Client uses 32-bit ODBC DSNs.
  3. Ensure that your data source is configured to use the Vertica driver as shown.

  4. Double-click the connection.
  5. On the Vertica ODBC DSN Configuration page, click Test Connection.

Configuring ODBC on Linux

You need the following files to configure the ODBC driver on Linux:

  • Vertica.ini: Create a configuration file named vertica.ini recommended in DSHOME directory /opt/IBM/InformationServer/Server/DSEngine. This file defines the Vertica specific settings required by the ODBC drivers. Here’s an example vertica.ini file:

    [VerticaDriverName]
    ODBCInstLib=/opt/IBM/InformationServer/Server/branded_odbc/lib/libodbcinst.so
    ErrorMessagesPath=/opt/vertica/en-US
    LogPath=/tmp
    LogLevel=6
    DriverManagerEncoding=UTF-16
  • dsenv: The DataStage environment file. The default location for this file is DSHOME. Any environment variables you need for interactive use of ODBC drivers such as VERTICAINI, ODBCINI, PATH or LD_LIBRARY_PATH to make a connection to an ODBC data source must be added to the dsenv file.
  • odbc.ini: The ODBC driver manager file contains details related to your ODBC connection. The default locations for this file are DSHOME and IBM/InformationServer/Server/branded_odbc. At the DSHOME location, the odbc.ini file is a hidden file. The odbc.ini file needs to be configured at both locations.

    [Driver]

    DriverManagerEncoding=UTF-16

    ODBCInstLib=/opt/IBM/InformationServer/Server/branded_odbc/lib/libodbcinst.so

    Trace=1

    TraceOptions=1

    TraceDll=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMtrc00.so

    TraceFile=/tmp/odbctrace.out

    ErrorMessagesPath=/opt/vertica/en-US

    LogLevel=3

    LogPath=/opt/vertica/logs

     

    [Source_DSN]

    Driver = /opt/vertica/lib64/libverticaodbc.so

    Host = Host_Name

    Server = Vertica_Source_Database_Host_IP

    ServerName = Vertica_Source_Database_Host_Name

    Database = Vertica_Database_Name

    UserName = Vertica_Database_UserName

    Password = Vertica_Database_Password

    UID = Vertica_Database_UID

    Port = 5433

    SuppressWarnings = 0

    TxnReadCommitted = 1

    Locale = en_US

     

    [Target_DSN]

    Driver = /opt/vertica/lib64/libverticaodbc.so

    Host = Host_Name

    Server = Vertica_Target_Database_Host_IP

    ServerName = Vertica_Target_Database_Host_Name

    Database = Vertica_Database_Name

    UserName = Vertica_Database_UserName

    Password = Vertica_Database_Password

    UID = Vertica_Database_UID

    Port = 5433

    Locale = en_US

  • uvodbc.config: Vertica file that contains information about the DSN and related DBMS. The default location for this file is DSHOME/Projects. Configure uvodbc.config for individual projects at the location DSHOME/Projects, as follows:

    <ds_src>
    DBMSTYPE = ODBC
     
    <ds_tgt>
    DBMSTYPE = ODBC

Importing Vertica Data Definitions into DataStage using ODBC

    To define the source and target table definitions in DataStage, you can import the metadata for those objects.

  1. Select Import > Table Definitions > ODBC Table Definitions in DataStage client.

    The Import Meta Data (ODBC) window appears.

  2. Select DSN name from the DSN drop-down and provide the database credentials to connect. Click OK.

  3. Note Under DSN, ensure the drop-down menu lists both your source and target database DSNs.

    The Import Meta Data window opens.

  4. For both the source and target databases, perform the following:
    1. Select DSN from the drop-down menu.
    2. Enter credentials and click OK.
    3. Select the required tables and click Import.

      DataStage verifies the ODBC configuration and lists the tables that you have defined on that database.

      Note In this example, the source and the target Meta Data are the same, so you only have to import the Meta Data from the source.

      The following image lists the target database tables.

      If the DSN entries appear in the drop-down list, the configuration of the odbc.ini and uvodbc.config files are correct. Otherwise, correct them in your Linux environment as described in Configuring ODBC on Linux .

Configuring JDBC (Windows and Linux)

To configure JDBC driver on Windows and Linux, you need the DataStage isjdbc.config file located in DSHOME directory. This file contains information about CLASSPATH and CLASS_NAMES.

On Windows, DSHOME is C:\IBM\InformationServer\Server\DSEngine

On Linux DSHOME is /opt/IBM/InformationServer/Server/DSEngine.

CLASSPATH=/opt/vertica/vertica-jdbc-12.0.1-0.jar;
CLASS_NAMES=com.vertica.jdbc.Driver

Importing Vertica Data Definitions into DataStage using JDBC

  1. In a web browser, type https or http://server-IP:port/ibm/iis/imam/console to open the Infosphere MetaData Asset Manager.

  2. Enter the credentials and click Login.
  3. Click Import, and then click New Import Area.

    The Create New Import Area dialog box appears.

    Provide the following details:

    • Import area name: Provide the asset import area name that suits the business requirements.

    • Import area description (Optional): Provide a description of the Asset Import Area.

    • MetaData interchange server: Select the server machine from the drop-down list.

    • Select a Bridge or Connector: Select JDBC Connector in the IBM folder.

  4. Click Next.
  5. Click Select Data Connection icon adjacent to the Data connection box and click New Data Connection.

    Name: Type in the name of the JDBC data connection.

    URL: Type the Vertica JDBC URL in the following format:

    jdbc:vertica://hostname or IP:5433/db_name

    • Type the database User name and Password

  6. Click Test Connection.
  7. Click OK after a successful connection.
  8. Click the required check boxes to include tables and views. Provide a schema name or table name filter, if required.

  9. Click the Select 'Assets to Import' icon adjacent to Assets to import box.

  10. Select the required assets, click OK and then click Next.

  11. Select Host system name and database by clicking the Select existing asset icons and click Next.

  12. Provide an Import Description and select Express Import or Managed Import and then click Import.

  13. The newly imported assets are displayed in the Import tab of Metadata Asset Manager.

  14. In the DataStage Designer Client, click Repository > Metadata Sharing > Create Table Definition from shared Table

  15. Select the imported asset that you want to create a table definition from.

  16. Click Create.

    The required table definition is displayed in the Table Definitions folder in the navigation pane.

  17. To import target table definitions, repeat the steps above.

    Note In this example, the source and target meta data are the same, so you only need to import meta data from the source.

For more information about creating, compiling, and running jobs, see DataStage documentation. When creating a job using ODBC or JDBC, you must create a PARALLEL job with Vertica. In addition, you must select an ODBC or JDBC connector from the Database drop-down menu when connecting to Vertica.

Troubleshooting

You may encounter some issues when connecting to Vertica using IBM DataStage. This topic describes known issues and workarounds.

Empty DSN List

Explanation: When connecting to ODBC, the DSN drop-down list on the Import Meta Data page does not show the connection value.

Solution: Check the uvodbc.config file in the DSHOME/Projects directory. Depending on your operating system, add the following entries to the uvodbc.config file.

For Windows:

DSHOME=C:\IBM\InformationServer\Server\DSEngine for Windows

For Linux:

DSHOME=/opt/IBM/InformationServer/Server/DSEngine for Linux

Configuration File JDBC Cannot be accessed

Explanation: This error occurs when you try to access the source or target database using JDBC.

Solution: Provide read access to the dsadm user, or give read access to all users. The isjdbc.config file should be located in the DSEngine directory.

String Data Too Big

The error is as follows:

target,0: ODBC function "SQLPutData()" reported:  SQLSTATE = 22001: Native Error Code = 10,170: Msg = [Vertica][ODBC] (10170) String data right truncation on data from data source: String data is too big for the driver's data buffer. (CC_OdbcLobSupport::writeLobData, file CC_OdbcLobSupport.cpp, line 261)

Explanation: You might see this error message when loading string data or similar data types with high precision.

Solution: Set the value of the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE to 10000000.

ODBC Connector Array Size

The error is as follows:

ODBC_Connector_0,0: The array size must be set to 1 when reading LOBs (CC_OdbcDBRecordDataSetProducer::dbsAllocateBindBuffers, file CC_OdbcDBRecordDataSetProducer.cpp, line 587)

Explanation: You might see this error message when loading string data or similar data types with high precision.

Solution: In DataStage, change the array size value to 1 in ODBC stages.

Pxbridge

The error is as follows:

main_program: Syntax error: Error in "pxbridge" operator: Error in output redirection: Error in output parameters: Error in modify adapter: Error in binding: Error parsing field type: Parsing parameters "1024,0" for schema type "decimal": Precision must be > 0 and <= 255, got: "1024"

Explanation: You see this error when you load decimal data or similar data types with a precision value greater than 255.

Solution: Change the precision value in the DDL to an integer between 1 and 255.

SQLGetPrivateProfileString

The error is as follows:

[S1000][unixODBC][DSI] The error message NoSQLGetPrivateProfileString could not be found    in the en-US locale. Check that /en-US/ODBCMessages.xml exists.
[ISQL]ERROR: Could not SQLConnect.

Explanation: You might see this error in a Linux environment when you configure the ODBC driver. It is related to odbc.ini file configuration.

Solution: Usually ErrorMessagePath is set as: /opt/vertica/lib64. Add the en-US dir under the lib64 entry.

Known Limitations

Before using DataStage, be aware of the following known limitations:

ODBC Limitations

  • DataStage interprets the following data types as follows during job compilation:

    Vertica Data Type SQL Type
    INTERVAL HOUR TO SECOND 112
    INTERVAL HOUR TO MINUTE 111
    INTERVAL DAY TO SECOND 110
    INTERVAL YEAR TO MONTH 107
    UUID -11
  • TIME data type does not display milliseconds.

  • TIMESTAMP data type does not display milliseconds on Windows and Linux environments for Write operations only.

  • TIMETZ and TIMESTAMPTZ data types do not display milliseconds and timezone offset.

  • LONG VARBINARY data type supports up to 2 million column length value beyond which the “the record is too big to fit in a block” error is displayed.

  • LONG VARCHAR data type beyond 1 million column length may display “the record is too big to fit in a block” error.

JDBC Limitations

  • For INTERVAL HOUR TO SECOND(0), INTERVAL HOUR TO SECOND, INTERVAL HOUR TO MINUTE, INTERVAL DAY TO SECOND, and INTERVAL YEAR TO MONTH data types, the error "Driver Not Capable" appears when executing jobs.
  • TIME, TIMETZ, and TIMESTAMPTZ data types do not display milliseconds.
  • LONG VARBINARY data type supports up to 2 million column length value beyond which the “the record is too big to fit in a block” error is displayed.

  • LONG VARCHAR data type beyond 1 million column length may display “the record is too big to fit in a block” error.

For More Information