Using the HCatalog Connector

The Vertica HCatalog Connector lets you access data stored in the Apache Hive data warehouse software the same way you access data in a native Vertica table.

If your files are in the Optimized Row Columnar (ORC) or Parquet format and do not use complex types, the HCatalog Connector creates an external table and uses the ORC or Parquet reader instead of the Java SerDe. See Reading Hadoop Columnar File Formats for more information about these readers.

The HCatalog Connector performs predicate pushdown to improve query performance. Instead of reading all data across the network to evaluate a query, the HCatalog Connector moves the evaluation of predicates closer to the data. Predicate pushdown applies to Hive partition pruning, ORC stripe pruning, and Parquet row-group pruning. The HCatalog Connector supports predicate pushdown for the following predicates: >, >=, =, <>, <=, <.
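The pruning idea can be illustrated with a small, self-contained Python sketch. This is only a conceptual model, assuming each row group (or stripe) records the minimum and maximum value of a column; it is not the connector's actual implementation, and the names RowGroup, may_match, and prune are hypothetical. A group is skipped when its statistics show that no row in it can satisfy the predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for a Parquet row group or ORC stripe."""
    name: str
    min_value: int  # smallest value of the filtered column in this group
    max_value: int  # largest value of the filtered column in this group


def may_match(group: RowGroup, op: str, value: int) -> bool:
    """Return False only when the min/max statistics prove no row can match."""
    lo, hi = group.min_value, group.max_value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "=":
        return lo <= value <= hi
    if op == "<>":
        # Only a group whose every value equals `value` can be skipped.
        return not (lo == hi == value)
    if op == "<=":
        return lo <= value
    if op == "<":
        return lo < value
    raise ValueError(f"unsupported predicate operator: {op}")


def prune(groups, op, value):
    """Keep only the groups that still have to be read for this predicate."""
    return [g for g in groups if may_match(g, op, value)]


# Illustrative statistics, not real data.
GROUPS = [
    RowGroup("rg0", 1, 100),
    RowGroup("rg1", 101, 200),
    RowGroup("rg2", 150, 150),
]

if __name__ == "__main__":
    for op, value in [(">", 120), ("=", 150), ("<>", 150), ("<", 101)]:
        kept = [g.name for g in prune(GROUPS, op, value)]
        print(f"col {op} {value}: read {kept}")
```

In this model, a predicate such as col > 120 lets the reader skip rg0 entirely, so its data never crosses the network. Hive partition pruning follows the same principle, except that the decision is made from partition column values rather than from per-file statistics.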

Hive, HCatalog, and HiveServer2 Overview

To use the HCatalog Connector, you need to understand several Hadoop components: Hive, HCatalog, and HiveServer2.

The Vertica HCatalog Connector lets you transparently access data that is available through HiveServer2. You use the connector to define a schema in Vertica that corresponds to a Hive database or schema. When you query data within this schema, the HCatalog Connector extracts the data from Hadoop and formats it as tabular data.

Note: You can use the WebHCat service instead of HiveServer2, but performance is usually better with HiveServer2. To use WebHCat, set the HCatalogConnectorUseHiveServer2 configuration parameter to 0. See Hadoop Parameters.

HCatalog Connection Features

The HCatalog Connector lets you query data stored in Hive using the Vertica native SQL syntax. Some of its main features are:

HCatalog Connector Considerations

There are a few things to keep in mind when using the HCatalog Connector:

Note: The HCatalog Connector is read-only. It cannot insert data into Hive.
