S3 Object Store
File systems using the S3 protocol, including AWS, Pure Storage, and MinIO.
URI Format
s3://bucket/path
For AWS, specify the region in the AWSRegion configuration parameter, not in the URI. If the region is incorrect, the load might not fail until after a delay, because Vertica retries several times before giving up. The default region is us-east-1.
Authentication
For AWS:
- To access S3, you must create an IAM role and grant that role permission to access your S3 resources. For more information about IAM roles, see the Amazon Web Services documentation.
- By default, bucket access is restricted to the communal storage bucket. Use an AWS access key to load data from non-communal storage buckets.
- Set the AWSAuth configuration parameter to provide credentials.
- You can use AWS STS temporary session tokens to load data. Because they are session tokens, do not use them for access to storage locations.
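As a minimal sketch of the access-key approach, the following statements set AWSAuth for the current session only and then load from a bucket outside communal storage. The key pair, the bucket name `otherbucket`, and the table `t` are hypothetical placeholders.

```sql
-- Placeholder credentials in AccessKeyID:SecretAccessKey form.
-- Setting them at the session level keeps them out of the database catalog.
ALTER SESSION SET AWSAuth = 'YOUR_ACCESS_KEY_ID:YOUR_SECRET_ACCESS_KEY';

-- Hypothetical non-communal bucket and target table.
COPY t FROM 's3://otherbucket/data/*' PARQUET;
```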
To allow users without superuser privileges to access data in S3, you must create a USER storage location for the S3 path (see CREATE LOCATION) and grant users access.
Configuration Parameters
The following database configuration parameters apply to the S3 file system. Some are specific to AWS. You can set parameters globally and for the current session with ALTER DATABASE…SET PARAMETER and ALTER SESSION…SET PARAMETER, respectively. For more information about these parameters, see S3 Parameters.
Parameter | Description |
---|---|
AWSAuth | An ID and secret key for authentication, which AWS calls AccessKeyID and SecretAccessKey, specified as ID:secret. For extra security, do not store credentials in the database; instead, use ALTER SESSION…SET PARAMETER to set this value for the current session only. To use admintools |
AWSCAFile | The file name of the TLS server certificate bundle to use. You must set a value when installing a CA certificate on a SUSE Linux Enterprise Server. |
AWSCAPath | The path Vertica uses to look up TLS server certificates. You must set a value when installing a CA certificate on a SUSE Linux Enterprise Server. |
AWSEnableHttps | Whether to use the HTTPS protocol when connecting to S3. You can set this parameter only at the database level, with ALTER DATABASE…SET PARAMETER. Default: 1 (enabled) |
AWSEndpoint | The endpoint host for all S3 URLs, set as follows: If not set, Vertica uses virtual-hosted request URLs. To use admintools Default: |
AWSLogLevel | The log level, one of the following: |
AWSRegion | The AWS region containing the S3 bucket from which to read files. This parameter accepts only one region at a time; to access buckets in multiple regions, change the parameter each time you switch regions. If the region is incorrect, queries might be delayed before they fail. Default: us-east-1 |
AWSSessionToken | A temporary security token generated by running the AWS STS get-session-token command (see Examples). If you use session tokens, you must set all parameters at the session level, even if some of them are also set at the database level. Use ALTER SESSION to set session parameters. |
S3EnableVirtualAddressing | Whether to rewrite S3 URLs to use virtual-hosted paths (disabled by default). This setting takes effect only if you have set AWSEndpoint. It does not affect how you specify S3 paths. As of September 30, 2020, AWS requires virtual-hosted paths for newly created buckets. |
AWSStreamingConnectionPercentage | In Eon Mode, the percentage of connections to communal storage that Vertica uses for streaming reads. In a cloud environment, this setting helps prevent streaming data from using up all available file handles. It is unnecessary for on-premises object stores because of their lower latency. |
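For an on-premises object store such as MinIO, the endpoint parameters are usually set database-wide. The following is a minimal sketch, assuming a hypothetical MinIO server at `minio.example.com` on port 9000 that does not serve TLS; the `host:port` value format is an assumption, so check S3 Parameters for the exact form your version expects.

```sql
-- Hypothetical on-premises endpoint (host:port form assumed).
ALTER DATABASE DEFAULT SET PARAMETER AWSEndpoint = 'minio.example.com:9000';

-- AWSEnableHttps can be set only at the database level; 0 turns HTTPS off.
-- Do this only for an endpoint that does not serve TLS.
ALTER DATABASE DEFAULT SET PARAMETER AWSEnableHttps = 0;
```

With AWSEndpoint set, S3EnableVirtualAddressing controls whether request URLs are rewritten to virtual-hosted paths; it remains disabled unless you change it.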
Examples
The following example sets a database-wide AWS region.
=> ALTER DATABASE DEFAULT SET PARAMETER AWSRegion='us-west-1';
The following example loads data from S3. You can use a glob if all files matched by the glob can be loaded together. In this example, the datalake bucket contains only ORC files.
=> COPY t FROM 's3://datalake/*' ORC;
You can specify a comma-separated list of S3 paths, as in the following example. All buckets must be in the same region. To load from more than one region, use separate COPY statements and change the value of AWSRegion between them.
=> COPY t FROM 's3://AWS_Data_1/sales.parquet', 's3://AWS_Data_2/sales.parquet' PARQUET;
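To load from buckets in two different regions, the statements might look like the following sketch. The bucket names and regions are hypothetical; setting AWSRegion at the session level leaves any database-wide value unchanged.

```sql
-- Load from a bucket in the first region (hypothetical bucket and region).
ALTER SESSION SET AWSRegion = 'us-west-1';
COPY t FROM 's3://west-bucket/sales.parquet' PARQUET;

-- Switch regions before loading from a bucket in a different region.
ALTER SESSION SET AWSRegion = 'eu-central-1';
COPY t FROM 's3://eu-bucket/sales.parquet' PARQUET;
```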
The following example creates a user storage location and a role, so that users without superuser privileges can read data from S3.
=> CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';
=> CREATE ROLE ExtUsers;
--- Assign users to this role using GRANT (Role).
=> GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;
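To finish the setup, grant the role to each user. The following is a minimal sketch for a hypothetical existing user `alice`. A granted role is not active until the user enables it, so the second statement makes it a default role. COPY also requires privileges on the target table, such as INSERT.

```sql
-- Grant the ExtUsers role to a hypothetical existing user.
GRANT ExtUsers TO alice;

-- Enable the role automatically at login; otherwise alice must run SET ROLE ExtUsers.
ALTER USER alice DEFAULT ROLE ExtUsers;
```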
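The following example obtains an STS temporary session token with the AWS CLI, then uses the returned credentials to set AWSAuth and AWSSessionToken for the current session.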
$ aws sts get-session-token
{
    "Credentials": {
        "AccessKeyId": "ASIAJZQNDVS727EHDHOQ",
        "SecretAccessKey": "F+xnpkHbst6UPorlLGj/ilJhO5J2n3Yo7Mp4vYvd",
        "SessionToken": "FQoDYXdzEKv//////////wEaDMWKxakEkCyuDH0UjyKsAe6/3REgW5VbWtpuYyVvSnEK1jzGPHi/jPOPNT7Kd+ftSnD3qdaQ7j28SUW9YYbD50lcXikz/HPlusPuX9sAJJb7w5oiwdg+ZasIS/+ejFgCzLeNE3kDAzLxKKsunvwuo7EhTTyqmlLkLtIWu9zFykzrR+3Tl76X7EUMOaoL31HOYsVEL5d9I9KInF0gE12ZB1yN16MsQVxpSCavOFHQsj/05zbxOQ4o0erY1gU=",
        "Expiration": "2018-07-18T05:56:33Z"
    }
}
$ vsql
=> ALTER SESSION SET AWSAuth = 'ASIAJZQNDVS727EHDHOQ:F+xnpkHbst6UPorlLGj/ilJhO5J2n3Yo7Mp4vYvd';
=> ALTER SESSION SET AWSSessionToken = 'FQoDYXdzEKv//////////wEaDMWKxakEkCyuDH0UjyKsAe6/3REgW5VbWtpuYyVvSnEK1jzGPHi/jPOPNT7Kd+ftSnD3qdaQ7j28SUW9YYbD50lcXikz/HPlusPuX9sAJJb7w5oiwdg+ZasIS/+ejFgCzLeNE3kDAzLxKKsunvwuo7EhTTyqmlLkLtIWu9zFykzrR+3Tl76X7EUMOaoL31HOYsVEL5d9I9KInF0gE12ZB1yN16MsQVxpSCavOFHQsj/05zbxOQ4o0erY1gU=';
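When you use session tokens, all parameters must be set at the session level, so a load that follows the statements above also sets AWSRegion in the session. The following is a minimal sketch, assuming the bucket is in us-west-1 and reusing table t and the datalake bucket from the earlier examples.

```sql
-- With session tokens, set AWSRegion in the session too, even if a database-wide value exists.
ALTER SESSION SET AWSRegion = 'us-west-1';
COPY t FROM 's3://datalake/*' ORC;

-- Remove the temporary credentials and region override from the session after the load.
ALTER SESSION CLEAR PARAMETER AWSAuth, AWSSessionToken, AWSRegion;
```

Clearing the parameters afterward keeps later statements in the same session from reusing the temporary credentials.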