Backing Up HDFS Storage Locations

Secure your backup locations, and restrict access to backups to users who are already permitted to access all data in the database. A compromised backup is a compromised database.

Vertica recommends that you regularly back up the data in your Vertica database. This recommendation includes data stored in your HDFS storage locations. The Vertica backup script (vbr) can back up HDFS storage locations. However, you must perform several configuration steps before it can back up these locations.

After you create an HDFS storage location, full database backups fail with the following error message:

ERROR 5127:  Unable to create snapshot No such file /usr/bin/hadoop: 
check the HadoopHome configuration parameter

This error occurs because the backup script cannot back up the HDFS storage locations. You must configure both Vertica and Hadoop before the script can back up these locations. After you complete this configuration, full database backups work again.
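As a quick diagnostic for the error above, the following Python sketch checks whether a Hadoop launcher exists under a given HadoopHome value. It assumes, based only on the path in the error message, that the backup script looks for bin/hadoop beneath HadoopHome. The function names are hypothetical and are not part of Vertica or vbr.

```python
import os


def hadoop_binary_path(hadoop_home):
    """Return the path where the hadoop launcher is assumed to live.

    Assumption: the launcher sits at <HadoopHome>/bin/hadoop, which is
    consistent with the /usr/bin/hadoop path shown in ERROR 5127.
    """
    return os.path.join(hadoop_home, "bin", "hadoop")


def hadoop_home_looks_valid(hadoop_home):
    """Return True if an executable hadoop launcher exists under hadoop_home."""
    candidate = hadoop_binary_path(hadoop_home)
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)
```

For example, hadoop_home_looks_valid("/usr") returns False on a node where /usr/bin/hadoop is missing, which is the situation the error message describes. Run the check on every node in the cluster, because each node needs access to the Hadoop launcher.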

There are several considerations for backing up HDFS storage locations in your database:

  • HDFS storage locations do not support object-level backups. You must perform a full database backup to back up the data in your HDFS storage locations.
  • Data in an HDFS storage location is backed up to HDFS. This backup guards against accidental deletion or corruption of data. It does not prevent data loss in the case of a catastrophic failure of the entire Hadoop cluster. To prevent data loss, you must have a backup and disaster recovery plan for your Hadoop cluster.

    Data stored on the Linux native file system is still backed up to the location you specify in the backup configuration file. The vbr backup script handles this data and the data in HDFS storage locations separately.

  • You must configure your Vertica cluster to restore database backups containing an HDFS storage location. See Configuring Vertica to Restore HDFS Storage Locations for the configuration steps you must take.
  • The HDFS directory for the storage location must have snapshotting enabled. You can either directly configure this yourself or enable the database administrator’s Hadoop account to do it for you automatically. See Configuring Hadoop and Vertica to Enable Backup of HDFS Storage for more information.
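If you choose to enable snapshotting yourself, the standard Hadoop command for this is hdfs dfsadmin -allowSnapshot, which must be run by an HDFS superuser. The following Python sketch builds and optionally runs that command. The directory path in the usage note is a hypothetical placeholder, and the helper names are illustrative only. See the linked topic for the steps Vertica documents.

```python
import subprocess


def allow_snapshot_command(hdfs_dir):
    """Build the argument list that marks an HDFS directory as snapshottable."""
    return ["hdfs", "dfsadmin", "-allowSnapshot", hdfs_dir]


def enable_snapshots(hdfs_dir):
    """Run the command; raises CalledProcessError if Hadoop reports a failure.

    Assumption: the hdfs executable is on PATH and the calling user has
    HDFS superuser privileges.
    """
    subprocess.run(allow_snapshot_command(hdfs_dir), check=True)
```

Calling enable_snapshots("/user/dbadmin/hdfs_storage") would run the command against that placeholder directory. Substitute the HDFS directory that your storage location actually uses.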

The topics in this section explain the configuration steps you must take to enable the backup of HDFS storage locations.
