Loading Data from an S3 Bucket

You can load data from an S3 bucket using COPY FROM. Before you can load data from S3, you must set certain configuration parameters.

When loading data from S3, the maximum key length is 1K (1024 bytes). The input url argument for the S3 source (which can contain multiple delimited URLs) can be no larger than 1MB.

Setting Parameters

To read from S3 you need to provide Vertica with certain information, such as the S3 region and a credential to use for authentication. There are two different ways of setting these parameters:

If you use ALTER SESSION SET to set AWS parameters, Vertica also sets the corresponding UDSource parameters as if you had also used ALTER SESSION SET UDPARAMETER. Therefore, a best practice when making session-level changes is to use ALTER SESSION SET with the AWS parameters instead of using ALTER SESSION SET UDPARAMETER with the AWS Library parameters, even if you are using the UDSource.

The following example shows how to set parameters using these methods.

--- set global default region
=> SELECT SET_CONFIG_PARAMETER('AWSRegion','us-west-1');
				
--- set credentials in the session only (also applies to UDSource)
=> ALTER SESSION SET AWSAuth='ID:secret';

--- alternate way to set credentials, for UDSource only (not copy from S3)
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='aws-id';
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='aws-key';				

Vertica can read from buckets in only one AWS region at a time. To load from more than one region, change the region value for the session between loads.

Using COPY

Before you can read data from S3, you must create an IAM role for your EC2 instances to use, and grant that role permission to access your S3 resources. For more information about IAM roles, see the Amazon Web Services documentation.

You must set the AWSRegion configuration parameter to your S3 bucket's region. If you do not set the correct region, you might experience a delay before the load fails because Vertica retries several times before giving up.

The following example shows how to load data from buckets in different regions.

=> SELECT SET_CONFIG_PARAMETER('AWSRegion','us-west-1');
			
=> COPY sales FROM 's3://AWS_Data_1/sales.parquet' PARQUET;
				
=> ALTER SESSION SET AWSRegion='us-east-1');
				
=> COPY sales FROM 's3://AWS_Data_2/sales.orc', 's3://AWS_Data_3/sales.orc' ORC;

For more information, see Specifying COPY FROM Options.

Verifying the Load

You can verify that ROS files were created in the communal storage location using the AWS CLI:

$ aws s3 ls S3://testdb/ 
                           PRE 024/
                           PRE 028/
                           PRE 030/
                           PRE 034/
                           PRE 036/
.
.
.

You can also use the AWS web interface by navigating to the bucket and viewing its contents.

Granting Permissions for Non-priviledged Users

By default, S3 locations are only available to database superusers. To enable other users to load data from S3 create a USER storage location based on the S3 URL, and then grant read permissions to a user or group. The following example demonstrates creatiung a storage location for the S3 bucket named datalake and granting it to a role named ExtUsers.

=> CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';

=> CREATE ROLE ExtUsers;						
   --- Assign users to this role using GRANT (Role).
						
=> GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;						

See Loading from an S3 Bucket for more information.