Requirements for HDFS Storage Locations

Caution:

If you use any HDFS storage locations, the HDFS data must be available when you start Vertica. Your HDFS cluster must be operational, and the ROS files must be present. If data files have been moved or have become corrupted, or if your HDFS cluster is not responsive, Vertica cannot start.

To store Vertica's data on HDFS, verify that:

HDFS Space Requirements

If your Vertica database is K-safe, HDFS-based storage locations contain two copies of the data you store in them. One copy is the primary projection, and the other is the buddy projection. If you have enabled HDFS's data-redundancy feature, Hadoop stores both projections multiple times. This duplication might seem excessive. However, it is similar to how RAID level 1 or higher stores redundant copies of data. The redundant copies also improve HDFS performance by enabling multiple nodes to process requests for a file.
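
For example, assuming a K-safety of 1 (one buddy projection) and HDFS's default replication factor of 3, each block of data written to an HDFS storage location is stored 2 × 3 = 6 times across the Hadoop cluster.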

Verify that your HDFS installation has sufficient space available for redundant storage of both the primary and buddy projections of your K-safe data. You can adjust the number of duplicates stored by HDFS by setting the HadoopFSReplication configuration parameter. See Troubleshooting HDFS Storage Locations for details.
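 
For example, the following statement is one way to change that setting from vsql. It is a sketch only: the replication factor of 2 is an illustrative value, and the syntax for setting configuration parameters can vary by Vertica version.

    => SELECT SET_CONFIG_PARAMETER('HadoopFSReplication', 2);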

Additional Requirements for Backing Up Data Stored on HDFS

To back up data stored in HDFS storage locations, your Hadoop cluster must have snapshotting enabled for the directories used for backups. The easiest way to do this is to give the database administrator's account superuser privileges in Hadoop, so that snapshotting can be enabled automatically. Alternatively, use Hadoop to enable snapshotting for each directory before using it for backups.

In addition, your Vertica database must:

Caution: After you have created an HDFS storage location, full database backups will fail with the error message:

ERROR 5127:  Unable to create snapshot No such file /usr/bin/hadoop: 
check the HadoopHome configuration parameter

This error occurs because the backup script cannot back up the HDFS storage locations. You must configure Vertica and Hadoop to enable the backup script to back up these locations. After you complete this configuration, you can once again perform full database backups.

See Backing Up HDFS Storage Locations for details on configuring your Vertica and Hadoop clusters to enable HDFS storage location backup.
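
As one illustrative part of that configuration, the HadoopHome parameter named in the error above tells Vertica where to look for the hadoop executable (the error shows the default value, /usr, being searched for bin/hadoop). A minimal sketch, assuming Hadoop is installed under /usr/lib/hadoop and that your Vertica version supports the SET_CONFIG_PARAMETER meta-function; adjust the path for your installation:

    => SELECT SET_CONFIG_PARAMETER('HadoopHome', '/usr/lib/hadoop');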

Best Practices for SQL on Apache Hadoop

If you are using the Vertica for SQL on Apache Hadoop product, OpenText recommends the following best practices for storage locations:

Generally, HDFS requires approximately 2 GB of memory for each node in the cluster. To support this requirement in your Vertica configuration:

  1. Create a 2-GB resource pool.
  2. Do not assign any Vertica execution resources to this pool. This approach reserves the memory for use by HDFS.
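
A minimal vsql sketch of these two steps follows; the pool name hdfs_reserve is an assumption, and only the 2 GB memory reservation matters:

    -- Reserve 2 GB of memory that Vertica will not use for query execution.
    -- The reservation is taken from the GENERAL pool, leaving that memory
    -- available to the HDFS processes on each node.
    => CREATE RESOURCE POOL hdfs_reserve MEMORYSIZE '2G' MAXMEMORYSIZE '2G';
    -- Do not associate any users or queries with this pool.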

Alternatively, use Ambari or Cloudera Manager to find the maximum heap size required by HDFS and set the size of the resource pool to that value.

For more about how to configure resource pools, see Managing Workloads.