S3 Object Store

File systems using the S3 protocol, including AWS, Pure Storage, and MinIO.

URI Format

s3://bucket/path

For AWS, specify the region in the AWSRegion configuration parameter, not the URI. If the region is incorrect, you might experience a delay before the load fails because Vertica retries several times before giving up. The default region is us-east-1.

Authentication

For AWS:

  • To access S3 you must create an IAM role and grant that role permission to access your S3 resources.
  • By default, bucket access is restricted to the communal storage bucket. Use an AWS access key to load data from non-communal storage buckets.
  • Either set the AWSAuth configuration parameter to provide credentials or create a USER storage location for the S3 path (see CREATE LOCATION) and grant users access.
  • You can use AWS STS temporary session tokens to load data. Because they are session tokens, do not use them for access to storage locations.
  • You can configure S3 buckets individually with the per-bucket parameters S3BucketConfig and S3BucketCredentials. For details, see Per-Bucket S3 Configurations.

Configuration Parameters

The following database configuration parameters apply to the S3 file system. You can set parameters at different levels with the appropriate ALTER statement, such as ALTER SESSION...SET PARAMETER. Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter. For details about these parameters, see S3 Parameters.
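As a minimal sketch of that lookup, the query below lists the S3-related parameters with the levels at which each can be set. The column names (current_value, default_value, allowed_levels) are assumed from the CONFIGURATION_PARAMETERS table as commonly documented and may differ between Vertica versions.

```sql
-- List S3-related parameters, their values, and the levels
-- (node, session, user, database) at which each can be set.
-- Column names are an assumption; check your version's system table.
SELECT parameter_name, current_value, default_value, allowed_levels
FROM configuration_parameters
WHERE parameter_name ILIKE 'AWS%'
   OR parameter_name ILIKE 'S3%'
ORDER BY parameter_name;
```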

Parameter Description
AWSAuth

An ID and secret key for authentication. AWS calls these AccessKeyID and SecretAccessKey. For extra security, do not store credentials in the database; use ALTER SESSION...SET PARAMETER to set this value for the current session only.

To use admintools create_db or revive_db for Eon Mode on-premises, set this parameter in the auth_params.conf configuration file.

AWSCAFile

The file name of the TLS server certificate bundle to use. You must set a value when installing a CA certificate on a SUSE Linux Enterprise Server.

AWSCAPath

The path Vertica uses to look up TLS server certificates. You must set a value when installing a CA certificate on a SUSE Linux Enterprise Server.

AWSEnableHttps

Boolean, specifies whether to use the HTTPS protocol when connecting to S3. Can be set only at the database level with ALTER DATABASE...SET PARAMETER.

Default: 1 (enabled)

AWSEndpoint

The endpoint host for all S3 URLs, set as follows:

  • AWS: hostname_or_IP:port. Do not include the scheme (http(s)).
  • AWS with a FIPS-compliant S3 Endpoint: Hostname of a FIPS-compliant S3 endpoint. You must also enable S3EnableVirtualAddressing.
  • On-premises/Pure: IP address of the Pure Storage server.

If not set, Vertica uses virtual-hosted request URLs.

To use admintools create_db or revive_db for Eon Mode on-premises, set this parameter in the auth_params.conf configuration file.

Default: s3.amazonaws.com
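The following sketch shows one way the endpoint might be set for an on-premises object store. The address 192.0.2.10:9000 is a placeholder, and disabling HTTPS is an assumption that applies only if the object store serves plain HTTP.

```sql
-- Point all S3 requests at an on-premises object store (placeholder address).
-- The value is host:port only, with no http:// or https:// scheme.
ALTER DATABASE DEFAULT SET AWSEndpoint = '192.0.2.10:9000';

-- Assumption: this object store serves plain HTTP, so HTTPS is turned off.
-- AWSEnableHttps can be set only at the database level.
ALTER DATABASE DEFAULT SET AWSEnableHttps = 0;
```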

AWSLogLevel

The log level, one of the following:

  • OFF
  • FATAL
  • ERROR (default)
  • WARN
  • INFO
  • DEBUG
  • TRACE

AWSRegion

The AWS region containing the S3 bucket from which to read files. This parameter can only be configured with one region at a time. If you need to access buckets in multiple regions, change the parameter each time you change regions.

Failure to set the correct region can lead to a delay before queries fail.

Default: us-east-1

AWSSessionToken

A temporary security token generated by running the get-session-token command, used to configure multi-factor authentication.

If you use session tokens, you must set all parameters at the session level, even if some of them are set at the database level. Use ALTER SESSION to set session parameters.

S3BucketConfig

Contains S3 bucket configuration information as a JSON object with the following properties. Where a property has an equivalent database-level parameter, it is shown in parentheses. If both the database-level parameter and its equivalent in S3BucketConfig are set, the S3BucketConfig value takes precedence.

Properties:

  • bucket: The name of the bucket
  • region: The name of the region (AWSRegion)
  • protocol: Specifies whether to secure the connection (AWSEnableHttps), one of the following:
    • http: Unencrypted connection
    • https: Encrypted connection
  • endpoint: The endpoint URL or IP address (AWSEndpoint)
  • enableVirtualAddressing: Boolean, specifies whether to rewrite the S3 URL to use a virtual hosted path (S3EnableVirtualAddressing)
  • requesterPays: Boolean, specifies whether the requester (instead of the bucket owner) pays the cost of accessing data on the bucket (S3RequesterPays). You must set this property to true to access S3 buckets configured as Requester Pays buckets. By setting it to true, you accept the charges for accessing data. Default: false.

The configuration properties for a given bucket may differ based on its type. For example, the following S3BucketConfig is for an AWS bucket AWSBucket and a Pure Storage bucket PureStorageBucket. AWSBucket doesn't specify an endpoint, so Vertica uses the AWSEndpoint, which defaults to s3.amazonaws.com:

ALTER DATABASE DEFAULT SET S3BucketConfig=
'[
    {
        "bucket": "AWSBucket",
        "region": "us-east-2",
        "protocol": "https",
        "requesterPays": true
    },
    {
        "bucket": "PureStorageBucket",
        "endpoint": "pure.mycorp.net:1234",
        "protocol": "http",
        "enableVirtualAddressing": false
    }
]';

S3BucketCredentials

Contains credentials for accessing an S3 bucket. Where a property has an equivalent database-level parameter, it is shown in parentheses. If both the database-level parameter and its equivalent in S3BucketCredentials are set, the S3BucketCredentials value takes precedence.

Properties:

  • bucket: The name of the bucket
  • accessKey: The access key for the bucket (the ID in AWSAuth)
  • secretAccessKey: The secret access key for the bucket (the secret in AWSAuth)
  • sessionToken: The session token, only used when S3BucketCredentials is set at the session level (AWSSessionToken)

For example, the following S3BucketCredentials is for an AWS bucket AWSBucket and a Pure Storage bucket PureStorageBucket:

ALTER SESSION SET S3BucketCredentials='
[
    {
        "bucket": "AWSBucket",
        "accessKey": "<AK0>",
        "secretAccessKey": "<SAK0>",
        "sessionToken": "1234567890"
    },
    {
        "bucket": "PureStorageBucket",
        "accessKey": "<AK1>",
        "secretAccessKey": "<SAK1>"
    }
]';

This parameter is only visible to superusers. Users can set this parameter at the session level with ALTER SESSION.

S3EnableVirtualAddressing

Boolean, specifies whether to rewrite S3 URLs to use virtual-hosted paths (disabled by default). This configuration setting takes effect only when you have specified a value for AWSEndpoint.

If you set AWSEndpoint to a FIPS-compliant S3 endpoint, you must enable S3EnableVirtualAddressing.

The value of this parameter does not affect how you specify S3 paths.

As of September 30, 2020, AWS requires virtual address paths for newly created buckets.
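A sketch of the FIPS case follows. The endpoint hostname is an example for one region, and the table t and file name are placeholders. The comments show the two request styles as S3 generally defines them; the s3:// path you write is the same in both cases.

```sql
-- Use a FIPS-compliant endpoint (example hostname for us-east-2) and
-- enable virtual addressing, which that endpoint type requires.
ALTER DATABASE DEFAULT SET AWSEndpoint = 's3-fips.us-east-2.amazonaws.com';
ALTER DATABASE DEFAULT SET S3EnableVirtualAddressing = 1;

-- The S3 path is written the same way regardless of addressing style.
COPY t FROM 's3://datalake/sales.parquet' PARQUET;

-- Request URL styles, for reference:
--   path-style:     https://<endpoint>/datalake/sales.parquet
--   virtual-hosted: https://datalake.<endpoint>/sales.parquet
```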

S3RequesterPays

Boolean, specifies whether the requester (instead of the bucket owner) pays the cost of accessing data on the bucket. When true, the bucket owner pays only for storing the data rather than for all costs associated with the bucket. You must set this parameter to true to access S3 buckets configured as Requester Pays buckets.

Default: false
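The two statements below sketch alternative ways to accept requester-pays charges: for every bucket, or for a single bucket through S3BucketConfig. The bucket name and region are reused from the S3BucketConfig example and are placeholders. Use one option or the other; per the S3BucketConfig description, a per-bucket value takes precedence over the database-level parameter.

```sql
-- Option 1: accept requester-pays charges for all buckets.
ALTER DATABASE DEFAULT SET S3RequesterPays = 1;

-- Option 2: accept charges for one bucket only (placeholder bucket name).
ALTER DATABASE DEFAULT SET S3BucketConfig =
'[
    {
        "bucket": "AWSBucket",
        "region": "us-east-2",
        "requesterPays": true
    }
]';
```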

AWSStreamingConnectionPercentage

In Eon Mode, the number of connections to the communal storage to use for streaming reads. In a cloud environment, this setting helps prevent streaming data from using up all available file handles. This setting is unnecessary when using on-premises object stores because of their lower latency.

Examples

The following example sets a database-wide AWS region and credentials:

=> ALTER DATABASE DEFAULT SET AWSRegion='us-west-1';
=> ALTER DATABASE DEFAULT SET AWSAuth = 'myaccesskeyid123456:mysecretaccesskey123456789012345678901234';

The following example loads data from S3. You can use a glob if all files in the glob can be loaded together. In the following example, the datalake bucket contains only ORC files.

=> COPY t FROM 's3://datalake/*' ORC;

You can specify a comma-separated list of S3 paths, as in the following example. All buckets must be in the same region. To load from more than one region, use separate COPY statements and change the value of AWSRegion between calls.

=> COPY t FROM 's3://AWS_Data_1/sales.parquet', 's3://AWS_Data_2/sales.parquet' PARQUET;
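If the two buckets were in different regions, the load could be split as sketched here. The region names are assumptions chosen for illustration. Setting AWSRegion at the session level keeps the change from affecting other sessions.

```sql
-- Assumption: AWS_Data_1 is in us-east-1 and AWS_Data_2 is in us-west-2.
ALTER SESSION SET AWSRegion = 'us-east-1';
COPY t FROM 's3://AWS_Data_1/sales.parquet' PARQUET;

-- Switch regions before loading from the second bucket.
ALTER SESSION SET AWSRegion = 'us-west-2';
COPY t FROM 's3://AWS_Data_2/sales.parquet' PARQUET;
```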

The following example creates a user storage location and a role, so that users without their own S3 credentials can read data from S3 using the server credential.

   --- set database-level credential (once):
=> ALTER DATABASE DEFAULT SET AWSAuth = 'myaccesskeyid123456:mysecretaccesskey123456789012345678901234';

=> CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';

=> CREATE ROLE ExtUsers;
   --- Assign users to this role using GRANT (Role).

=> GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;
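To complete the picture, the following sketch assigns the role to a user and then loads data as that user. The user name alice and the file name are hypothetical, and the sketch assumes alice already has the privileges needed to load into table t.

```sql
-- As a superuser: assign the role to a user (alice is a placeholder).
GRANT ExtUsers TO alice;

-- As alice: enable the role, then read from the S3 storage location.
-- No personal S3 credentials are set; the server credential is used.
SET ROLE ExtUsers;
COPY t FROM 's3://datalake/sales.parquet' PARQUET;
```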

The following example sets an STS temporary session token. Vertica uses the session token to access S3 with the specified credentials and bypasses checking for a USER storage location.

$ aws sts get-session-token 
  { 
    "Credentials": { 
        "AccessKeyId": "ASIAJZQNDVS727EHDHOQ", 
        "SecretAccessKey": "F+xnpkHbst6UPorlLGj/ilJhO5J2n3Yo7Mp4vYvd", 
        "SessionToken": "FQoDYXdzEKv//////////wEaDMWKxakEkCyuDH0UjyKsAe6/3REgW5VbWtpuYyVvSnEK1jzGPHi/jPOPNT7Kd+ftSnD3qdaQ7j28SUW9YYbD50lcXikz/HPlusPuX9sAJJb7w5oiwdg+ZasIS/+ejFgCzLeNE3kDAzLxKKsunvwuo7EhTTyqmlLkLtIWu9zFykzrR+3Tl76X7EUMOaoL31HOYsVEL5d9I9KInF0gE12ZB1yN16MsQVxpSCavOFHQsj/05zbxOQ4o0erY1gU=", 
        "Expiration": "2018-07-18T05:56:33Z" 
    } 
  } 
					
$ vsql
=> ALTER SESSION SET AWSAuth = 'ASIAJZQNDVS727EHDHOQ:F+xnpkHbst6UPorlLGj/ilJhO5J2n3Yo7Mp4vYvd';				
=> ALTER SESSION SET AWSSessionToken = 'FQoDYXdzEKv//////////wEaDMWKxakEkCyuDH0UjyKsAe6/3REgW5VbWtpuYyVvSnEK1jzGPHi/jPOPNT7Kd+ftSnD3qdaQ7j28SUW9YYbD50lcXikz/HPlusPuX9sAJJb7w5oiwdg+ZasIS/+ejFgCzLeNE3kDAzLxKKsunvwuo7EhTTyqmlLkLtIWu9zFykzrR+3Tl76X7EUMOaoL31HOYsVEL5d9I9KInF0gE12ZB1yN16MsQVxpSCavOFHQsj/05zbxOQ4o0erY1gU=';	
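
With the token in place, a load in the same session might look like the sketch below. It reads the session-level requirement described under AWSSessionToken as also covering AWSRegion, which is an interpretation; the region value and the table t are placeholders.

```sql
-- Assumption: AWSRegion is repeated at the session level because session
-- tokens require the S3 parameters to be set in the session.
ALTER SESSION SET AWSRegion = 'us-west-1';

-- Load using the temporary credentials set in this session.
COPY t FROM 's3://datalake/*' ORC;
```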

See Also

Per-Bucket S3 Configurations