HDFS Connector Troubleshooting Tips

The following sections explain some of the common issues you may encounter when using the HDFS Connector.

User Unable to Connect to Kerberos-Authenticated Hadoop Cluster

A user may suddenly be unable to connect to Hadoop through the connector in a Kerberos-enabled environment. This issue can occur when someone exports a new keytab file for the user, which invalidates existing keytab files. You can determine whether an invalid keytab file is the problem by comparing the key version number associated with the user's principal key in Kerberos with the key version number stored in the keytab file on the Vertica cluster.

To find the key version number for a user in Kerberos:

  1. From the Linux command line, start the kadmin utility (kadmin.local if you are logged into the Kerberos Key Distribution Center). Run the getprinc command for the user:

    $ sudo kadmin
    [sudo] password for dbadmin:
    Authenticating as principal root/admin@MYCOMPANY.COM with password.
    Password for root/admin@MYCOMPANY.COM:
    kadmin:  getprinc exampleuser@MYCOMPANY.COM
    Principal: exampleuser@MYCOMPANY.COM
    Expiration date: [never]
    Last password change: Fri Jul 26 09:40:44 EDT 2013
    Password expiration date: [none]
    Maximum ticket life: 1 day 00:00:00
    Maximum renewable life: 0 days 00:00:00
    Last modified: Fri Jul 26 09:40:44 EDT 2013 (root/admin@MYCOMPANY.COM)
    Last successful authentication: [never]
    Last failed authentication: [never]
    Failed password attempts: 0
    Number of keys: 2
    Key: vno 3, des3-cbc-sha1, no salt
    Key: vno 3, des-cbc-crc, no salt
    MKey: vno 0
    Attributes:
    Policy: [none]
    

    In the preceding example, there are two keys stored for the user, both of which are at version number (vno) 3.

  2. To get the version numbers of the keys stored in the keytab file, use the klist command:

    $ sudo klist -ek exampleuser.keytab
    Keytab name: FILE:exampleuser.keytab
    KVNO Principal
    ---- ----------------------------------------------------------------------
       2 exampleuser@MYCOMPANY.COM (des3-cbc-sha1)
       2 exampleuser@MYCOMPANY.COM (des-cbc-crc)
       3 exampleuser@MYCOMPANY.COM (des3-cbc-sha1)
       3 exampleuser@MYCOMPANY.COM (des-cbc-crc)

    The first column in the output lists the key version number. In the preceding example, the keytab contains keys for versions 2 and 3. Because version 3 matches the key version that kadmin reports for the principal, the keytab can be used to authenticate the user with Kerberos. If the key version reported by kadmin is not present in the keytab, replace the keytab with one that contains the current key, as sketched below.
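
The following is a minimal sketch, assuming an MIT Kerberos KDC where you can run kadmin.local; the -norandkey option exports the principal's current keys without generating new ones (and therefore without invalidating other keytabs). The destination host and path in the copy step are examples only.

$ sudo kadmin.local
kadmin.local:  ktadd -norandkey -k exampleuser.keytab exampleuser@MYCOMPANY.COM
kadmin.local:  quit
$ # Copy the keytab to the Vertica cluster (destination is illustrative)
$ scp exampleuser.keytab dbadmin@vertica01.example.com:/home/dbadmin/
$ # Confirm that the new keytab contains the expected key version
$ klist -ek exampleuser.keytab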

Resolving Error 5118

When using the connector, you might receive an error message similar to the following:

ERROR 5118: UDL specified no execution nodes; at least one execution node must be specified

To correct this error, verify that all of the nodes in your Vertica cluster have the correct version of the HDFS Connector package installed. This error can occur if one or more of the nodes do not have the supporting libraries installed. These libraries may be missing because one of the nodes was skipped when initially installing the connector package. Another possibility is that one or more nodes have been added since the connector was installed.
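
One way to narrow down which nodes are affected is to check every node for the connector's library files. The following shell loop is only a sketch: the host names and the package path are assumptions, and it relies on passwordless SSH between the nodes, so adjust all three for your environment.

$ for host in vertica01 vertica02 vertica03; do
>    echo "== $host =="
>    ssh "$host" "ls /opt/vertica/packages/hdfs_connectors 2>/dev/null || echo 'connector files not found'"
> done

Reinstall the connector package on any node that reports missing files, then rerun the COPY statement.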

Transfer Rate Errors

The HDFS Connector monitors how quickly Hadoop sends data to Vertica. If the data transfer speed on any connection between a node in your Hadoop cluster and a node in your Vertica cluster falls below a lower limit (by default, 1 MB per second), the connector breaks off the transfer and connects to another node in the Hadoop cluster that holds the data it was retrieving. If it cannot find another node to supply the data (or has already tried all of the nodes in the Hadoop cluster), the connector terminates the COPY statement and returns an error.

=> COPY messages SOURCE Hdfs(url='http://hadoop.example.com:50070/webhdfs/v1/tmp/data.txt', 
   username='exampleuser');
ERROR 3399:  Failure in UDx RPC call InvokeProcessUDL(): Error calling processUDL() 
in User Defined Object [Hdfs] at [src/Hdfs.cpp:275], error code: 0, 
message: [Transferring rate during last 60 seconds is 172655 byte/s, below threshold 1048576 byte/s, give up. 
The last error message: Operation too slow. Less than 1048576 bytes/sec transferred the last 1 seconds.
The URL: http://hadoop.example.com:50070/webhdfs/v1/tmp/data.txt?op=OPEN&offset=154901544&length=113533912.
The redirected URL: http://hadoop.example.com:50075/webhdfs/v1/tmp/data.txt?op=OPEN&
namenoderpcaddress=hadoop.example.com:8020&length=113533912&offset=154901544.]

If you encounter this error, troubleshoot the connection between your Vertica and Hadoop clusters. If there are no problems with the network, determine if either your Hadoop cluster or Vertica cluster is overloaded. If the nodes in either cluster are too busy, they may not be able to maintain the minimum data transfer rate.
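
To measure the raw transfer speed outside of Vertica, you can read the file directly over WebHDFS from one of your Vertica nodes. The following curl command is a sketch that reuses the example URL from above and assumes WebHDFS does not require Kerberos authentication; the -L option follows the NameNode's redirect to a DataNode, and -w prints the average download speed in bytes per second.

$ curl -sS -L -o /dev/null \
     -w 'average speed: %{speed_download} bytes/sec\n' \
     'http://hadoop.example.com:50070/webhdfs/v1/tmp/data.txt?op=OPEN&user.name=exampleuser'

If the speed reported here is comfortably above 1 MB per second, the slowdown is more likely caused by load on one of the clusters than by the network itself.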

If you cannot resolve the issue causing the slow transfer rate, you can lower the minimum acceptable speed. To do so, set the low_speed_limit parameter for the Hdfs source. The following example shows how to set low_speed_limit to 524288 to accept transfer rates as low as 512 KB per second (half the default lower limit).

=> COPY messages SOURCE Hdfs(url='http://hadoop.example.com:50070/webhdfs/v1/tmp/data.txt',
   username='exampleuser', low_speed_limit=524288);
Rows Loaded
-------------
     9891287
(1 row)

When you lower the low_speed_limit parameter, COPY statements that load data from HDFS may take much longer to complete, because the connector tolerates slower connections instead of giving up on them.

You can also increase the low_speed_limit setting if the network between your Hadoop and Vertica clusters is fast. Raising the lower limit forces COPY statements to fail with an error when they transfer data more slowly than the network should allow, alerting you to a problem sooner.
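
For example, the following statement raises the lower limit to 4 MB per second. The value is illustrative only; choose a threshold that your network can normally sustain.

=> COPY messages SOURCE Hdfs(url='http://hadoop.example.com:50070/webhdfs/v1/tmp/data.txt',
   username='exampleuser', low_speed_limit=4194304);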

Error Loading Many Files

When using the HDFS Connector to load many data files in a single statement, you might receive an error message similar to the following:

ERROR 3399: Failure in UDx RPC call InvokePlanUDL(): 
Error calling planUDL() in User Defined Object [Hdfs] at [src/Glob.cpp:531], 
error code: 0, message: 
Error occurs in Glob::stat: 
Last error message before give up: Failed to connect to 10.20.41.212: 
Cannot assign requested address.

This error can happen when concurrent load requests overwhelm the NameNode. It is generally safe to load hundreds of files at a time, but loading thousands of files in a single statement can trigger the problem. To avoid it, load the files in smaller batches.
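
For example, instead of loading an entire directory tree with a single glob, you can issue several COPY statements that each load a subset of the files. The paths below are illustrative only.

=> COPY messages SOURCE Hdfs(url='http://hadoop.example.com:50070/webhdfs/v1/tmp/logs/2013-07-01/*',
   username='exampleuser');
=> COPY messages SOURCE Hdfs(url='http://hadoop.example.com:50070/webhdfs/v1/tmp/logs/2013-07-02/*',
   username='exampleuser');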