Removing HDFS Storage Locations

The steps to remove an HDFS storage location are similar to standard storage locations:

  1. Remove any existing data from the HDFS storage location by using SET_OBJECT_STORAGE_POLICY to change each object's storage location. Alternatively, you can use CLEAR_OBJECT_STORAGE_POLICY. Because the Tuple Mover runs infrequently, set the enforce-storage-move parameter to true to make the change immediately.
  2. Retire the location on each host that has the storage location defined using RETIRE_LOCATION. Set enforce-storage-move to true.
  3. Drop the location on each node using DROP_LOCATION.
  4. Optionally remove the snapshots and files from the HDFS directory for the storage location.
  5. Perform a full database backup.

For more information about changing storage policies, changing usage, retiring locations, and dropping locations, see Managing Storage Locations.

If you have backed up the data in the HDFS storage location you are removing, you must perform a full database backup after you remove the location. If you do not and restore the database to a backup made before you removed the location, the location's data is restored.

Removing Storage Location Files from HDFS

Dropping an HDFS storage location does not automatically clean the HDFS directory that stored the location's files. Any snapshots of the data files created when backing up the location are also not deleted. These files consume disk space on HDFS and also prevent the directory from being reused as an HDFS storage location. Vertica cannot create a storage location in a directory that contains existing files or subdirectories.

You must log into the Hadoop cluster to delete the files from HDFS. An alternative is to use some other HDFS file management tool.

Removing Backup Snapshots

HDFS returns an error if you attempt to remove a directory that has snapshots:

$ hdfs dfs -rm -r -f -skipTrash /user/dbadmin/v_vmart_node0001
rm: The directory /user/dbadmin/v_vmart_node0001 cannot be deleted since 
/user/dbadmin/v_vmart_node0001 is snapshottable and already has snapshots

The Vertica backup script creates snapshots of HDFS storage locations as part of the backup process. If you made backups of your HDFS storage location, you must delete the snapshots before removing the directories.

HDFS stores snapshots in a subdirectory named .snapshot. You can list the snapshots in the directory using the standard HDFS ls command:

$ hdfs dfs -ls /user/dbadmin/v_vmart_node0001/.snapshot
Found 1 items
drwxrwx---   - dbadmin supergroup          0 2014-09-02 10:13 /user/dbadmin/v_vmart_node0001/.snapshot/s20140902-101358.629

To remove snapshots, use the command:

$ hdfs dfs -removeSnapshot directory snapshotname

The following example demonstrates the command to delete the snapshot shown in the previous example:

$ hdfs dfs -deleteSnapshot /user/dbadmin/v_vmart_node0001 s20140902-101358.629

You must delete each snapshot from the directory for each host in the cluster. After you have deleted the snapshots, you can delete the directories in the storage location.

Each snapshot's name is based on a timestamp down to the millisecond. Nodes independently create their own snapshots. They do not synchronize snapshot creation, so their snapshot names differ. You must list each node's snapshot directory to learn the names of the snapshots it contains.

See Apache's HDFS Snapshot documentation for more information about managing and removing snapshots.

Removing the Storage Location Directories

You can remove the directories that held the storage location's data by either of the following methods:

  • Use an HDFS file manager to delete directories. See your Hadoop distribution's documentation to determine if it provides a file manager.
  • Log into the Hadoop Name Node using the database administrator’s account and use HDFS's rmr command to delete the directories. See Apache's File System Shell Guide for more information.

The following example uses the HDFS rmr command from the Linux command line to delete the directories left behind in the HDFS storage location directory /user/dbamin. It uses the -skipTrash flag to force the immediate deletion of the files:

$ hdfsp dfs -ls /user/dbadmin
Found 3 items
drwxrwx---   - dbadmin supergroup          0 2014-08-29 15:11 /user/dbadmin/v_vmart_node0001
drwxrwx---   - dbadmin supergroup          0 2014-08-29 15:11 /user/dbadmin/v_vmart_node0002
drwxrwx---   - dbadmin supergroup          0 2014-08-29 15:11 /user/dbadmin/v_vmart_node0003

$ hdfs dfs -rmr -skipTrash /user/dbadmin/*
Deleted /user/dbadmin/v_vmart_node0001
Deleted /user/dbadmin/v_vmart_node0002
Deleted /user/dbadmin/v_vmart_node0003