Creating an HDFS Storage Location
Use the CREATE LOCATION statement to create an HDFS storage location. The process differs from creating a local storage location in the following ways:
- For the path, use the WebHDFS URI for the HDFS directory where you want Vertica to store the location's data. This URI is the same as a standard HDFS URL, except that it uses the webhdfs:// protocol and its path does not start with /webhdfs/v1/.
- Include the ALL NODES SHARED keywords, because all HDFS storage locations are shared storage. This is required even if you have only one HDFS node in your cluster.
Caution:
If you use any HDFS storage locations, the HDFS data must be available when you start Vertica. Your HDFS cluster must be operational, and the ROS files must be present. If the data files have been moved or corrupted, or if your HDFS cluster is unresponsive, Vertica cannot start.
Creating the Storage Location
To create an HDFS storage location, first create the location on all nodes and then set its storage policy to HDFS. To create the location in HDFS on all nodes:
=> CREATE LOCATION 'webhdfs://hadoop:50070/user/dbadmin' ALL NODES SHARED USAGE 'data' LABEL 'coldstorage';
Next, set the storage policy for your database objects to use this location:
=> SELECT SET_OBJECT_STORAGE_POLICY('SchemaName','coldstorage');
This causes all data in the named schema to be written to the HDFS storage location (coldstorage) instead of the local disk. You can set storage policies for a schema, a table, a partition, or the entire database.
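For example, you can target the policy more narrowly at a single table, or at a range of partition key values within a table. The following statements are a sketch only: the table name SchemaName.Sales and the partition key values are hypothetical, and the optional minimum and maximum key arguments to SET_OBJECT_STORAGE_POLICY assume your Vertica version supports partition-range policies.
=> SELECT SET_OBJECT_STORAGE_POLICY('SchemaName.Sales','coldstorage');
=> SELECT SET_OBJECT_STORAGE_POLICY('SchemaName.Sales','coldstorage','2014','2015');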
For more information, see Managing Storage Locations.
Adding HDFS Storage Locations to New Nodes
If you add nodes to your Vertica cluster, they do not automatically have access to existing HDFS storage locations. You must manually create the storage location for the new node using the CREATE LOCATION statement. Do not use the ALL NODES keywords in this statement. Instead, use the NODE keyword with the name of the new node to indicate that only that node needs to add the shared location.
Caution: You must manually create the storage location. Otherwise, the new node uses the default storage policy (usually, storage on the local Linux file system) to store data that the other nodes store in HDFS. As a result, the node can run out of disk space.
The following example shows how to add the storage location from the preceding example to a new node named v_vmart_node0004:
=> CREATE LOCATION 'webhdfs://hadoop:50070/user/dbadmin' NODE 'v_vmart_node0004' SHARED USAGE 'data' LABEL 'coldstorage';
Any standby nodes that are active in your cluster when you create an HDFS-based storage location automatically create their own instances of the location. When a standby node takes over for a down node, it uses its own instance of the location to store data for objects that use the HDFS-based storage policy. Treat standby nodes added after you create the storage location like any other new node: you must manually define the HDFS storage location for them.
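To verify which nodes have an instance of the location defined, you can query the STORAGE_LOCATIONS system table. The following query is a sketch that assumes the node_name, location_path, and location_label columns are available in your version:
=> SELECT node_name, location_path, location_label FROM storage_locations WHERE location_label = 'coldstorage';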