Apache Hadoop Parameters

The following table describes general parameters for configuring integration with Apache Hadoop. See Integrating with Apache Hadoop for more information.

Query the CONFIGURATION_PARAMETERS system table to determine which levels (node, session, user, database) are valid for a given parameter.
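
As a minimal sketch, the following query checks the valid levels for one parameter. It assumes that the table exposes parameter_name and allowed_levels columns, as in recent Vertica versions; adjust the column list if your version differs.

```sql
-- Assumption: CONFIGURATION_PARAMETERS has parameter_name and allowed_levels columns.
=> SELECT parameter_name, allowed_levels
   FROM CONFIGURATION_PARAMETERS
   WHERE parameter_name = 'HadoopConfDir';
```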

Parameter Description
EnableHDFSBlockInfoCache

Boolean, specifies whether to distribute block location metadata collected during planning on the initiator to all database nodes for execution. Distributing this metadata reduces NameNode accesses, and thus NameNode load, but can degrade database performance somewhat in deployments where the NameNode is not contended, because the metadata must be serialized and distributed. Enable distribution if protecting the NameNode is more important than query performance; this usually applies to large HDFS clusters where NameNode contention is already an issue.

Default: 0 (disabled)

HadoopConfDir

Directory path containing the XML configuration files copied from Hadoop. The same path must be valid on every Vertica node. You can use the VERIFY_HADOOP_CONF_DIR meta-function to test that the value is set correctly. Setting this parameter is required to read data from HDFS.

For all Vertica users, the files are accessed by the Linux user under which the Vertica server process runs.

When you set this parameter, previously cached configuration information is flushed.

You can set this parameter at the session level. Doing so overrides the database value; it does not append to it. For example:

=> ALTER SESSION SET HadoopConfDir='/test/conf:/hadoop/hcat/conf';

To append, get the current value and include it in the new path after your additions. Setting this parameter at the session level does not change how the files are accessed.

Default: obtained from environment if possible

HadoopImpersonationConfig

Session parameter specifying the delegation token or Hadoop user for HDFS access. See HadoopImpersonationConfig Format for information about the value of this parameter and Proxy Users and Delegation Tokens for more general context.

HDFSUseWebHDFS

Boolean, specifies whether to use the webhdfs scheme instead of hdfs, regardless of the URL. Using webhdfs is slower than using hdfs but supports some additional features. If you do not specifically need a feature not supported in the hdfs scheme, you should not change the value of this parameter.

Default: 0 (disabled)
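
The settings described above might be applied as in the following sketch. The database-level scope and the /etc/hadoop/conf path are assumptions made for illustration; confirm the valid levels for each parameter in CONFIGURATION_PARAMETERS before relying on them.

```sql
-- Assumption: /etc/hadoop/conf is a hypothetical path that exists on every Vertica node.
=> ALTER DATABASE DEFAULT SET PARAMETER HadoopConfDir = '/etc/hadoop/conf';

-- Validate the Hadoop configuration on all nodes.
=> SELECT VERIFY_HADOOP_CONF_DIR();

-- Append for this session only: look up the current value, then list it
-- after the new directory, because the session value replaces it.
=> SHOW CURRENT HadoopConfDir;
=> ALTER SESSION SET HadoopConfDir = '/test/conf:/etc/hadoop/conf';

-- Favor NameNode protection over query performance on a contended cluster.
=> ALTER DATABASE DEFAULT SET PARAMETER EnableHDFSBlockInfoCache = 1;
```

Leave HDFSUseWebHDFS at its default unless you need a feature that only the webhdfs scheme provides.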

HCatalog Connector Parameters

The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector in Integrating with Apache Hadoop for more information.

You can override HCatalog configuration parameters when you create an HCatalog schema, with CREATE HCATALOG SCHEMA.
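
A sketch of such an override follows. The schema name, host names, port, and values are hypothetical, and the HCATALOG_CONNECTION_TIMEOUT, HCATALOG_SLOW_TRANSFER_LIMIT, and HCATALOG_SLOW_TRANSFER_TIME options are assumed to correspond to the HCatConnectionTimeout, HCatSlowTransferLimit, and HCatSlowTransferTime parameters; check the CREATE HCATALOG SCHEMA reference for your version.

```sql
-- Assumptions: hypothetical schema name, hosts, and port; the HCATALOG_* options
-- are assumed to override the matching database-wide parameters for this schema only.
=> CREATE HCATALOG SCHEMA hcat
   WITH HOSTNAME='hcathost' PORT=9083
   HCATALOG_SCHEMA='default'
   HIVESERVER2_HOSTNAME='hs.example.com'
   HCATALOG_CONNECTION_TIMEOUT=30
   HCATALOG_SLOW_TRANSFER_LIMIT=131072
   HCATALOG_SLOW_TRANSFER_TIME=120;
```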

Parameter Description
EnableHCatImpersonation

Boolean, specifies whether the HCatalog Connector uses (impersonates) the current Vertica user when accessing Hive. If impersonation is enabled, the HCatalog Connector uses the Kerberos credentials of the logged-in Vertica user to access Hive data. Disable impersonation if you are using an authorization service to manage access without also granting users access to the underlying files. For more information, see Configuring Security in Integrating with Apache Hadoop.

Default: 1 (enabled)

HCatalogConnectorUseHiveServer2

Boolean, specifies whether Vertica internally uses HiveServer2 instead of WebHCat to get metadata from Hive.

Default: 1 (enabled)

HCatalogConnectorUseLibHDFSPP

Boolean, specifies whether the HCatalog Connector should use the hdfs scheme instead of webhdfs to read native formats.

This parameter is deprecated. Vertica uses the hdfs scheme by default. If you need to use webhdfs, use the HDFSUseWebHDFS parameter.

Default: 1 (enabled)

HCatConnectionTimeout

The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer2 (or WebHCat) server before returning a timeout error.

Default: 0 (wait indefinitely)

HCatSlowTransferLimit

Lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the HiveServer2 (or WebHCat) server. If the data transfer rate from the server to Vertica is below this threshold after the number of seconds specified in the HCatSlowTransferTime parameter has elapsed, the HCatalog Connector cancels the query and closes the connection.

Default: 65536

HCatSlowTransferTime

Number of seconds the HCatalog Connector waits before testing whether the data transfer from the server is too slow. See the HCatSlowTransferLimit parameter.

Default: 60
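
The connection and transfer limits above could be tuned together, as in this sketch. The database-level scope and the specific values are assumptions chosen for illustration, not recommendations.

```sql
-- Assumption: database-level scope; the values are illustrative only.
-- Return a timeout error if no connection is made within 30 seconds
-- (the default of 0 waits indefinitely).
=> ALTER DATABASE DEFAULT SET PARAMETER HCatConnectionTimeout = 30;

-- Test the transfer rate after 120 seconds, and cancel the query if the
-- rate is below 131072 bytes per second (128 KiB/s).
=> ALTER DATABASE DEFAULT SET PARAMETER HCatSlowTransferTime = 120;
=> ALTER DATABASE DEFAULT SET PARAMETER HCatSlowTransferLimit = 131072;
```

For comparison, the defaults test after 60 seconds against a threshold of 65536 bytes per second, which is 64 KiB/s.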