Creating Storage Locations

You can use CREATE LOCATION to add and configure storage locations (other than the required defaults) to provide storage for these purposes:

  • Isolating execution engine temporary files from data files.
  • Creating labeled locations to use in storage policies.
  • Creating storage locations based on predicted or measured access patterns.
  • Creating USER storage locations for specific users or user groups.

While no technical issue prevents you from using CREATE LOCATION to add one or more Network File System (NFS) storage locations, Vertica does not support NFS data or catalog storage except for MapR mount points. You will be unable to run queries against any other NFS data. When creating locations on MapR file systems, you must specify ALL NODES SHARED.

You can add a new storage location from one node to another node or from a single node to all cluster nodes. However, do not use a shared directory on one node for other cluster nodes to access.

Planning Storage Locations

Before adding a storage location, perform the following steps:

  1. Verify that the directory you plan to use for a storage location destination is an empty directory with write permissions for the Vertica process.
  2. Plan the labels to use if you want to label the location as you create it.
  3. Determine the type of information to store in the storage location:

    • DATA,TEMP (default): The storage location can store persistent and temporary DML-generated data, and data for temporary tables.
    • TEMP: A path-specified location to store DML-generated temporary data. If path is set to S3, then this location is used only when the RemoteStorageForTemp configuration parameter is set to 1, and TEMP must be qualified with ALL NODES SHARED. For details, see S3 Storage of Temporary Data in Eon Mode.
    • DATA: The storage location can only store persistent data.
    • USER: Users with READ and WRITE privileges can access data and external tables of this storage location.
    • DEPOT: The storage location is used in Eon Mode to store the depot. Only create DEPOT storage locations on local Linux file systems.

      Vertica allows a single DEPOT storage location per node. If you want to move your depot to different location (on a different file system, for example) you must first drop the old depot storage location, then create the new location.

    Storing temp and data files in different storage locations is advantageous because the two types of data have different disk I/O access patterns. Temp files are distributed across locations based on available storage space. However, data files can be stored on different storage locations, based on storage policy, to reflect predicted or measured access patterns.

If you plan to place storage locations on HDFS, see Requirements for HDFS Storage Locations for additional requirements.

Creating Unlabeled Local Storage Locations

This example shows a three-node cluster, each with a vertica/SSD directory for storage.

On each node in the cluster, create a directory where the node stores its data. For example:

$ mkdir /home/dbadmin/vertica/SSD

Vertica recommends that you create the same directory path on each node. Use this path when creating a storage location.

Use the CREATE LOCATION statement to add a storage location. Specify the following information:

  • The path on the node where Vertica stores the data.

    Vertica does not validate the path that you specify. Confirm that the path value points to a valid location.

  • The node where the location is available, or ALL NODES. If you specify ALL NODES, the statement creates the storage locations on all nodes in the cluster in a single transaction.

  • The type of information to be stored.

To give unprivileged (non-dbadmin) Linux users access to data, you must create a USER storage location. You can also use a USER storage location to give users without their own credentials access to shared file systems and object stores like HDFS and S3. See Creating a Storage Location for USER Access.

The following example shows how to add a location available on all nodes to store only data:

=> CREATE LOCATION '/home/dbadmin/vertica/SSD/' ALL NODES USAGE 'DATA';

The following example shows how to add a location that is available on the v_vmart_node0001 node to store data and temporary files:

=> CREATE LOCATION '/home/dbadmin/vertica/SSD/' NODE 'v_vmart_node0001';

Suppose you are using a storage location for data files and want to create ranked storage locations. In this ranking, columns are stored on different disks based on their measured performance. To create ranked storage locations, see Measuring Storage Performance and Setting Storage Performance.

After you create a storage location, you can alter the type of information it stores, with some restrictions. See Altering Location Use.

Storage Location Subdirectories

You cannot create a storage location in a subdirectory of an existing storage location. Doing so results in an error similar to the following:

ERROR 5615:  Location [/tmp/myloc/ssd] conflicts with existing location
[/tmp/myloc] on node v_vmart_node0001

Creating Labeled Storage Locations

You can add a storage location with a descriptive label using the CREATE LOCATION statement's LABEL keyword. You use labeled locations to set up storage policies. See Creating Storage Policies.

This example shows how to create a storage location on v_mart_node0002 with the label SSD:

=> CREATE LOCATION '/home/dbadmin/SSD/schemas' NODE 'v_vmart_node0002' 

This example shows you how to create a storage location on all nodes. Specifying the ALL NODES keyword adds the storage location to all nodes in a single transaction:

=> CREATE LOCATION '/home/dbadmin/SSD/schemas' ALL NODES 

The new storage location is listed in the DISK_STORAGE system table:

=> SELECT * FROM v_monitor.disk_storage;
-[ RECORD 7 ]-----------+-----------------------------------------------------
node_name               | v_vmart_node0002
storage_path            | /home/dbadmin/SSD/schemas
storage_usage           | DATA
rank                    | 0
throughput              | 0
latency                 | 0
storage_status          | Active
disk_block_size_bytes   | 4096
disk_space_used_blocks  | 1549437
disk_space_used_mb      | 6053
disk_space_free_blocks  | 13380004
disk_space_free_mb      | 52265
disk_space_free_percent | 89%

Creating a Storage Location for USER Access

To give unprivileged (non-dbadmin) Linux users access to data, you must create a USER storage location.

By default, Vertica uses user-provided credentials to access external file systems such as HDFS and cloud object stores. You can override this default and create a USER storage location to manage access to these locations. To override the default, set the UseServerIdentityOverUserIdentity configuration parameter.

After you create a USER storage location, you can grant one or more users access to it. USER storage locations grant access only to data files, not temp files. You cannot assign a USER storage location to a storage policy. You cannot change an existing storage location to have USER access.

The following example shows how to create a USER storage location on a specific node:

=> CREATE LOCATION '/home/dbadmin/UserStorage/BobStore' NODE 
   'v_mcdb_node0007' USAGE 'USER';

The following example shows how to grant a specific user read and write permissions to the location:

=> GRANT ALL ON LOCATION '/home/dbadmin/UserStorage/BobStore' TO Bob;

The following example shows how to use a USER storage location to grant access to locations on S3. Vertica uses the server credential to access the location:

   --- set database-level credential (once):
=> ALTER DATABASE DEFAULT SET AWSAuth = 'myaccesskeyid123456:mysecretaccesskey123456789012345678901234';
=> CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';

=> CREATE ROLE ExtUsers;						
   --- Assign users to this role using GRANT (Role).
=> GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;						

For more information about configuring user privileges, see Database Users and Privileges and the GRANT (Storage Location) and REVOKE (Storage Location) reference pages.

Shared Versus Local Storage

The SHARED keyword indicates that the location is shared by all nodes. Most remote file systems such as HDFS and S3 are shared. For these file systems, the path argument represents a single location in the remote file system where all nodes store data. Each node creates its own subdirectory in the shared storage location for its own files. Doing so prevents one node from overwriting files that belong to other nodes.

If using a remote file system, you must specify SHARED, even for one-node clusters. If the location is declared as USER, Vertica does not create subdirectories for each node. The setting of USER takes precedence over SHARED.

If you create a location and omit this keyword, the new storage location is treated as local. Each node must have unique access to the specified path. This location is usually a path in the node's own file system. Storage locations in file systems that are local to each node, such as Linux, are always local.

SHARED DATA storage locations are deprecated.

S3 Storage of Temporary Data in Eon Mode

If you are using Vertica in Eon Mode and have limited local disk space, that space might be insufficient to handle the large quantities of temporary data that some DML operations can generate. This is especially true for large load operations and refresh operations.

You can leverage S3 storage to handle temporary data, as follows:

  1. Create a remote storage location with CREATE LOCATION, where path is set to S3 as follows:
  2. Set the RemoteStorageForTemp session configuration parameter to 1:
    => ALTER SESSION SET RemoteStorageForTemp= 1;

    A temporary storage location must already exist on S3 before you set this parameter to 1; otherwise, Vertica throws an error and hint to create the storage location.

  3. Run the queries that require extra temporary storage.
  4. Reset RemoteStorageForTemp to its default value:
    => ALTER SESSION DEFAULT CLEAR RemoteStorageForTemp;

When you set RemoteStorageForTemp, Vertica redirects temporary data for all DML operations to the specified remote location. The parameter setting remains in effect until it is explicitly reset to its default value (0), or the current session ends.

Redirecting temporary data to S3 is liable to affect performance and require extra S3 API calls. Use it only for DML operations that involve large quantities of data.