Apache Hadoop Integration
This section describes updates to Hadoop integration in Vertica Analytic Database 8.1.x.
Exporting Parquet Data
You can export data from Vertica, either to share it with other Hadoop-based applications or to move lower-priority data from ROS to less-expensive storage. The EXPORT TO PARQUET statement exports a result set as Parquet data. After exporting ROS data, you can drop affected ROS partitions to reclaim storage space. If you need to access the data in Vertica again, you can create external tables from the exported data.
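As a minimal sketch of that workflow, the following statements export a query result to Parquet files and later expose those files to Vertica again as an external table. The table names, column definitions, date filter, and HDFS path are hypothetical placeholders, and the hdfs URL assumes that Vertica is already configured to reach your Hadoop cluster; the column list of the external table must match the columns that were exported.

```sql
-- Export older rows as Parquet files.
-- public.sales, its columns, and the target directory are assumptions for illustration.
EXPORT TO PARQUET(directory = 'hdfs:///data/sales_archive')
   AS SELECT sale_id, sale_date, amount
      FROM public.sales
      WHERE sale_date < '2016-01-01';

-- Later, make the exported data queryable from Vertica again
-- by defining an external table over the exported files.
CREATE EXTERNAL TABLE public.sales_archive (
    sale_id   INT,
    sale_date DATE,
    amount    FLOAT
) AS COPY FROM 'hdfs:///data/sales_archive/*.parquet' PARQUET;
```

The export writes only the result set of the query, so the WHERE clause is a convenient place to select the lower-priority data you intend to move out of ROS.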
For more information, see Exporting Data.
HCatalog Connector Uses HiveServer2
Previously the HCatalog Connector read Hive metadata using the WebHCat web service, which has poor performance. The HCatalog Connector now supports HiveServer2, which has a faster JDBC interface, and uses it by default. (You can still use WebHCat if you prefer it.)
When using CREATE HCATALOG SCHEMA to create a schema in Vertica that mirrors a Hive schema, you can now specify the HIVESERVER2_HOSTNAME parameter. The statement also accepts a new optional parameter, WEBHDFS_ADDRESS. The WEBHDFS_PORT parameter has been removed.
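For illustration, a schema definition that names a HiveServer2 host might look like the following sketch. The host names, port, Hive schema name, and user are placeholders rather than values from this release note, and the optional WEBHDFS_ADDRESS parameter is left out.

```sql
-- Mirror the Hive schema "default" as the Vertica schema "hcat".
-- All host names, the port, and the user below are hypothetical.
CREATE HCATALOG SCHEMA hcat
   WITH HOSTNAME='hcathost.example.com' PORT=9083
        HCATALOG_SCHEMA='default'
        HIVESERVER2_HOSTNAME='hs2.example.com'
        HCATALOG_USER='admin';
```

Here HOSTNAME and PORT identify the Hive metastore, while HIVESERVER2_HOSTNAME identifies the HiveServer2 host that the connector now uses by default.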
For more information, see Using the HCatalog Connector.
Cloudera Manager Integration
The Cloudera distribution of Hadoop includes Cloudera Manager, a web-based tool for managing a Hadoop cluster. Cloudera Manager can manage any service for which a service description is available. This release includes a service description for Vertica. After installing this service description, you can manage your Vertica cluster from the Cloudera Manager console: you can start, stop, and add nodes, and you can manage memory pools.
For more information, see Integrating With Cloudera Manager.
hdfs Scheme Fully Supports High Availability Name Node
Previous versions of Vertica added support for using the hdfs URL scheme in HDFS clusters with High Availability Name Node (HA NN). Reads and writes using the hdfs scheme can sometimes fall back to using webhdfs, which did not previously support HA NN. Vertica now supports HA NN when falling back to webhdfs from hdfs.
For more information, see the description of the LibHDFS++ FAILOVER RETRY event in Check Query Events in Analyzing Data.