Creating Storage Locations

CREATE LOCATION lets you add and configure storage locations (other than the required defaults) to provide storage for these purposes:

  • Isolating execution engine temporary files from data files.
  • Creating labeled locations to use in storage policies.
  • Creating storage locations based on predicted or measured access patterns.
  • Creating USER storage locations for specific users or user groups.

While no technical issue prevents you from using CREATE LOCATION to add one or more Network File System (NFS) storage locations, Vertica does not support NFS data or catalog storage except for MapR mount points. You will be unable to run queries against any other NFS data. When creating locations on MapR file systems, you must specify ALL NODES SHARED.

You can add a new storage location from one node to another node or from a single node to all cluster nodes. However, do not use a shared directory on one node for other cluster nodes to access.

Planning Storage Locations

Adding a storage location requires minimal planning:

  1. Verify that the directory you plan to use for a storage location destination is an empty directory with write permissions for the Vertica process.
  2. Plan the labels to use if you want to label the location as you create it.
  3. Determine the type of information to store in the storage location:

    • DATA,TEMP (default): The storage location can store persistent and temporary DML-generated data, and data for temporary tables.
    • TEMP:  A path‑specified location to store DML-generated temporary data. If path is set to S3, then this location is used only when configuration parameter RemoteStorageForTemp is set to 1, and TEMP must be qualified with ALL NODES SHARED. For details, see S3 Storage of Temporary Data.
    • DATA: The storage location can only store persistent data.
    • USER: Users with READ and WRITE privileges can access data of this storage location on the local Linux file system, on S3 communal storage, and external tables.
    • DEPOT: The storage location is used in Eon Mode to store the depot. Only create DEPOT storage locations on local Linux filesystems.

      Vertica allows a single DEPOT storage location per node. If you want to move your depot to different location (on a different file system, for example) you must first drop the old depot storage location, then create the new location.

    Storing temp and data files in different storage locations is advantageous because the two types of data have different disk I/O access patterns. Temp files are distributed across locations based on available storage space. However, data files can be stored on different storage locations, based on storage policy, to reflect predicted or measured access patterns.

If you plan to place storage locations on HDFS, see Using HDFS Storage Locations in Integrating with Apache Hadoop for additional requirements.

Creating Unlabeled Local Storage Locations

This example shows a three-node cluster, each with a vertica/SSD directory for storage.

On each node in the cluster, create a directory where the node stores its data. For example:

$ mkdir /home/dbadmin/vertica/SSD

Vertica recommends that you create the same directory path on each node. Use this path when creating a storage location.

Use the CREATE LOCATION statement to add a storage location. Specify the following information:

  • The path on the node where Vertica stores the data.
  • The node where the location is available, or ALL NODES.

    Specify the node or use the ALL NODES keyword. Otherwise, the statement creates the storage locations on all nodes in the cluster in a single transaction.

  • The type of information to be stored.

For user access (non-dbadmin users), you must create the storage location with the USER usage type. You cannot change an existing storage location to have USER access. After you create a USER storage location, you can grant one or more users access to it. User areas can store only data files, not temp files. You cannot assign a USER storage location to a storage policy.

The following example shows how to add a location available on all nodes to store only data:

CREATE LOCATION '/home/dbadmin/vertica/SSD/' ALL NODES USAGE 'DATA';

The following example shows how to add a location that is available on node v_vmart_node0001 to store data and temporary files:

CREATE LOCATION '/home/dbadmin/vertica/SSD/' NODE 'v_vmart_node0001';

Suppose you are using a storage location for data files and want to create ranked storage locations. In this ranking, columns are stored on different disks based on their measured performance. To create ranked storage locations, see the following sections:

  1. Measuring Storage Performance
  2. Setting Storage Performance

After you create a storage location, you can alter the type of information it stores, with some restrictions. See Altering Location Use.

Storage Location Subdirectories

You cannot create a storage location in a subdirectory of an existing location. Doing so results in an error similar to the following:

=> CREATE LOCATION '/tmp/myloc' ALL NODES USAGE 'TEMP';
CREATE LOCATION
=> CREATE LOCATION '/tmp/myloc/ssd' ALL NODES USAGE 'TEMP';
ERROR 5615:  Location [/tmp/myloc/ssd] conflicts with existing location
[/tmp/myloc] on node v_vmart_node0001

Creating Labeled Storage Locations

You can add a storage location with a descriptive label using the CREATE LOCATION statement's LABEL keyword. You use labeled locations to set up storage policies for your site. See Creating Storage Policies.

This example shows how to create a storage location on v_mart_node0002 with the label SSD:

=> CREATE LOCATION '/home/dbadmin/SSD/schemas' NODE 'v_vmart_node0002' 
   USAGE 'DATA' LABEL 'SSD';

This example shows you how to create a storage location on all nodes. Specifying the ALL NODES keyword adds the storage location to all nodes in a single transaction:

=> CREATE LOCATION '/home/dbadmin/SSD/schemas' ALL NODES 
   USAGE 'DATA' LABEL 'SSD';

The new storage location is listed in the v_monitor.disk_storage system table:

=> SELECT * FROM v_monitor.disk_storage;
.
.
-[ RECORD 7 ]-----------+-----------------------------------------------------
node_name               | v_vmart_node0002
storage_path            | /home/dbadmin/SSD/schemas
storage_usage           | DATA
rank                    | 0
throughput              | 0
latency                 | 0
storage_status          | Active
disk_block_size_bytes   | 4096
disk_space_used_blocks  | 1549437
disk_space_used_mb      | 6053
disk_space_free_blocks  | 13380004
disk_space_free_mb      | 52265
disk_space_free_percent | 89%
…

Creating a Storage Location for USER Access

You can create USER storage locations for a non-dbadmin user to access the storage location after being granted appropriate privileges.

The following example shows how to create a storage location, BillyBobStore, with a USER usage argument, on node v_mcdb_node0007:

=> CREATE LOCATION '/home/dbadmin/UserStorage/BillyBobStore' NODE 
   'v_mcdb_node0007' USAGE 'USER';

The following example shows how to grant a user BillyBob read and write permissions to the BillyBobStore location:

=> GRANT ALL ON LOCATION '/home/dbadmin/UserStorage/BillyBobStore' TO BillyBob;
GRANT PRIVILEGE

For more information about configuring user privileges, see Database Users and Privileges in the Administrator's Guide and the GRANT (Storage Location) and REVOKE (Storage Location) functions in the SQL Reference Manual.

Shared Versus Local Storage

The SHARED keyword indicates that the location set by the path argument is shared by all nodes. Most remote file systems such as HDFS and S3 are shared. For these file systems, the path argument represents a single location where all nodes store data. Each node creates its own subdirectory in the shared storage location for its own files. Doing so prevents one node from overwriting files that belong to other nodes.

If using a remote file system, you must specify SHARED, even for one-node clusters. If the location is declared as USER, Vertica does not create subdirectories for each node. The setting of USER takes precedence over SHARED.

If you create a location and omit this keyword, the new storage location is local. In this case, path specifies a location that is unique for each node in the cluster. This location is usually a path in the node's own file system. Storage locations in file systems that are local to each node, such as Linux, are always local.

S3 Storage of Temporary Data

If you are using Vertica in Eon mode and have limited local disk space, that space might be insufficient to handle the large quantities of temporary data that some DML operations are liable to generate. This is especially true for large load operations and refresh operations.

You can leverage S3 storage to handle temporary data, as follows:

  1. Create a remote storage location with CREATE LOCATION, where path is set to S3 as follows:
    CREATE LOCATION 's3://bucket/path' ALL NODES SHARED USAGE 'TEMP';
  2. Set session configuration parameter RemoteStorageForTemp to 1:
    ALTER SESSION SET RemoteStorageForTemp= 1;

    A temporary storage location must already exist on S3 before you set this parameter to 1; otherwise, Vertica throws an error and hint to create the storage location.

  3. Run the queries that require extra temporary storage.
  4. Reset RemoteStorageForTemp to its default value:
    ALTER SESSION DEFAULT CLEAR RemoteStorageForTemp;

When you set RemoteStorageForTemp, Vertica redirects temporary data for all DML operations to the specified remote location. The parameter setting remains in effect until it is explicitly reset to its default value (0), or the current session ends.

Redirecting temporary data to S3 is liable to affect performance and require extra S3 API calls. Use it only for DML operations that involve large quantities of data.