Apache Hadoop Parameters

The following table describes the general parameters for configuring integration with Apache Hadoop. See Integrating with Apache Hadoop for more information.

Parameter Description
EnableHDFSBlockInfoCache

Whether to distribute block location metadata collected during planning on the initiator to all database nodes for execution. Distributing this metadata reduces NameNode accesses, and thus load, but can degrade database performance somewhat in deployments where the NameNode isn't contended, because the metadata must be serialized and distributed. Enable distribution if protecting the NameNode is more important than query performance; this usually applies to large HDFS clusters where NameNode contention is already an issue.

Default Value: 0 (disabled)

Requires Restart: No
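
Example (a minimal sketch, assuming a database named mydb as in the other examples on this page, and that you have decided that protecting the NameNode matters more than query performance):

```sql
-- Enable distribution of block location metadata to all nodes.
-- "mydb" is the placeholder database name used elsewhere on this page.
ALTER DATABASE mydb SET EnableHDFSBlockInfoCache = 1;

-- Confirm the new setting.
SHOW DATABASE mydb EnableHDFSBlockInfoCache;
```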

HadoopConfDir

A directory path containing the XML configuration files copied from Hadoop. The same path must be valid on every Vertica node. You can use the VERIFY_HADOOP_CONF_DIR meta-function to test that the value is set correctly. Setting this parameter is required to read data from HDFS.
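
As a quick check after setting the parameter, you can call the meta-function mentioned above. This is a minimal sketch; it assumes the function is called with no arguments, and the output is not shown here because it depends on your configuration files:

```sql
-- Validate the Hadoop configuration that HadoopConfDir points to.
SELECT VERIFY_HADOOP_CONF_DIR();
```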

For all Vertica users, the files are accessed by the Linux user under which the Vertica server process runs.

When you set this parameter, any previously cached configuration information is flushed.

You can set this parameter at the session level. Doing so overrides the database value; it does not append to it. To append, get the current value and include it on the new path after your additions. Setting this parameter at the session level does not change how the files are accessed.

Default Value: obtained from environment if possible

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET HadoopConfDir = '/hadoop/hcat/conf';
=> SHOW DATABASE mydb HadoopConfDir;
     name      |      setting
---------------+-------------------
 HadoopConfDir | /hadoop/hcat/conf
(1 row)

=> ALTER SESSION SET HadoopConfDir='/test/conf:/hadoop/hcat/conf';

HadoopImpersonationConfig

A session parameter specifying the delegation token or Hadoop user for HDFS access. See HadoopImpersonationConfig Format for information about the value of this parameter, and Proxy Users and Delegation Tokens for more general context.

HDFSUseWebHDFS

Whether to use the webhdfs scheme instead of hdfs, regardless of the URL. Using webhdfs is slower than using hdfs but supports some additional features. If you do not specifically need a feature not supported in the hdfs scheme, you should not change the value of this parameter.

Default Value: 0 (disabled)

Requires Restart: No
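
Example (a sketch only, assuming the parameter can be set at the database level like the other parameters on this page, and that you need a feature the hdfs scheme does not support):

```sql
-- Force the webhdfs scheme regardless of the URL scheme.
-- "mydb" is the placeholder database name used elsewhere on this page.
ALTER DATABASE mydb SET HDFSUseWebHDFS = 1;

-- Return to the default (hdfs) when the feature is no longer needed.
ALTER DATABASE mydb SET HDFSUseWebHDFS = 0;
```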

The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector in Integrating with Apache Hadoop for more information.

Parameter Description
EnableHCatImpersonation

Whether the HCatalog Connector uses (impersonates) the current Vertica user when accessing Hive. If impersonation is enabled, the HCatalog Connector uses the Kerberos credentials of the logged-in Vertica user to access Hive data. Disable impersonation if you are using an authorization service to manage access without also granting users access to the underlying files. For more information, see Configuring Security in Integrating with Apache Hadoop.

Default Value: 1 (enabled)

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET EnableHCatImpersonation = 0;

HCatalogConnectorUseHiveServer2

When enabled, Vertica internally uses HiveServer2 instead of WebHCat to get metadata from Hive.

Default Value: 1 (enabled)

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET HCatalogConnectorUseHiveServer2 = 0;

HCatalogConnectorUseLibHDFSPP

Whether the HCatalog Connector should use the hdfs scheme instead of webhdfs to read native formats.

This parameter is deprecated. Vertica uses the hdfs scheme by default. If you need to use webhdfs, use the HDFSUseWebHDFS parameter.

Default Value: 1 (enabled)

HCatConnectionTimeout

The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer2 (or WebHCat) server before returning a timeout error.

Default Value: 0 (wait indefinitely)

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET HCatConnectionTimeout = 30;

HCatSlowTransferLimit

The lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the HiveServer2 (or WebHCat) server. If the data transfer rate from the server to Vertica is still below this threshold after the number of seconds specified in the HCatSlowTransferTime parameter has passed, the HCatalog Connector cancels the query and closes the connection.

Default Value: 65536

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET HCatSlowTransferLimit = 32000;

HCatSlowTransferTime

The number of seconds the HCatalog Connector waits before testing whether the data transfer from the server is too slow. See the HCatSlowTransferLimit parameter.

Default Value: 60

Requires Restart: No

Example:

=> ALTER DATABASE mydb SET HCatSlowTransferTime = 90;

You can override the HCatalog configuration parameters when creating an HCatalog schema. See CREATE HCATALOG SCHEMA in the SQL Reference Manual for an explanation.
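
A per-schema override might look like the following sketch. The schema name (hcat), host name (hcathost), and Hive schema (default) are hypothetical, and the parameter names are taken from the CREATE HCATALOG SCHEMA statement as we understand it; check the SQL Reference Manual for the options your version supports.

```sql
-- Hypothetical schema and host names; adjust for your environment.
-- The three HCATALOG_* settings below override, for this schema only,
-- HCatConnectionTimeout, HCatSlowTransferLimit, and HCatSlowTransferTime.
CREATE HCATALOG SCHEMA hcat
    WITH HOSTNAME='hcathost'
         HCATALOG_SCHEMA='default'
         HCATALOG_CONNECTION_TIMEOUT=30
         HCATALOG_SLOW_TRANSFER_LIMIT=32000
         HCATALOG_SLOW_TRANSFER_TIME=90;
```

Values set this way apply only to queries against that schema; the database-level parameters remain the defaults for other HCatalog schemas.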