Troubleshooting Reads from the hdfs Scheme

You might encounter the following issues when using the hdfs URL scheme to access data in HDFS.

WebHDFS Error When Using hdfs URLs

When creating an external table or loading data and using the hdfs scheme, you might see errors from WebHDFS failures. Such errors indicate that Vertica was not able to use the hdfs scheme and fell back to webhdfs, but that the WebHDFS configuration is incorrect. Verify that the HDFS configuration files in HadoopConfDir have the correct WebHDFS configuration for your Hadoop cluster. See Configuring the hdfs Scheme for information about use of these files. See your Hadoop documentation for information about WebHDFS configuration.

Queries Take a Long Time to Run When Using HA

The High Availability Name Node feature in HDFS allows a name node to fail over to a standby name node. The dfs.client.failover.max.attempts configuration parameter (in hdfs-site.xml) specifies how many attempts to make when failing over. Vertica uses a default value of 4 if this parameter is not set. After reaching the maximum number of failover attempts, Vertica concludes that the HDFS cluster is unavailable and aborts the operation. A second parameter, ipc.client.connect.retry.interval, specifies the time to wait between attempts, with typical values being 10 to 20 seconds.

Cloudera and Hortonworks both provide tools to automatically generate configuration files. These tools can set the maximum number of failover attempts to a much higher number (50 or 100). If the HDFS cluster is unavailable (all name nodes are unreachable), Vertica can appear to hang for an extended period (minutes to hours) while trying to connect.

Failover attempts are logged in the QUERY_EVENTS system table. The following example shows how to query this table to find these events:

=> SELECT event_category, event_type, event_description, operator_name, 
   event_details, count(event_type) AS count 
   FROM query_events 
   WHERE event_type ilike 'LibHDFS++ FAILOVER RETRY' 
   GROUP BY event_category, event_type, event_description, operator_name, event_details;
-[ RECORD 1 ]-----+---------------------------------------
event_category    | EXECUTION
event_type        | LibHDFS++ FAILOVER RETRY
event_description | LibHDFS++ Namenode failover and retry.
operator_name     | LibHDFS++ FileSystem
event_details     | Libhdfs++ request failed on ns
count             | 4

You can wait for Vertica to either complete the connection or abort it, or you can set the dfs.client.failover.max.attempts parameter to a lower value.
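
To lower the limit, an hdfs-site.xml entry along the following lines should work. This is a minimal sketch rather than a recommended setting: the value 4 is only illustrative (it matches the default that Vertica uses when the parameter is not set), and it assumes you edit the copy of hdfs-site.xml that Vertica reads from HadoopConfDir.

```xml
<!-- Illustrative value only; choose a limit that suits your cluster. -->
<property>
  <name>dfs.client.failover.max.attempts</name>
  <value>4</value>
</property>
```

With a lower limit, Vertica stops retrying sooner when no name node is reachable, at the cost of less tolerance for a failover that takes a long time to complete.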