Exporting to S3

The S3 file system differs from other file systems in ways that affect data export. You must set additional configuration parameters for authentication and region, and some restrictions apply to the output.

Configuration Parameters

To access S3, you must create an IAM role and grant that role permission to access your S3 resources. For more information about IAM roles, see the Amazon Web Services documentation.

Vertica uses several configuration parameters related to AWS, which can be set for the database or for a session. See AWS Parameters for the complete list. These parameters include:

  • AWSRegion: the region containing the output S3 bucket. This configuration parameter affects all S3 access from the database, including reads, so if you are using S3 in other ways, set this as a session parameter to avoid conflicts.
  • AWSAuth: the access key ID (AccessKeyID) and secret access key (SecretAccessKey), formatted as 'id:secret'. Use a session parameter to avoid storing credentials in the database.
  • AWSSessionToken: a temporary token generated by AWS STS, required if you are using multi-factor authentication. You must set this as a session parameter because AWS session tokens expire.
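As a minimal sketch of setting these parameters per session, the following Python builds the statements a client would run before exporting. It assumes Vertica's `ALTER SESSION SET parameter = 'value'` form; the function names are hypothetical, and any region or credential values you pass are placeholders.

```python
def quote_sql_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_session_statements(region, access_key_id, secret_access_key, session_token=None):
    """Build ALTER SESSION statements that set the AWS parameters for this session only.

    Session-level settings avoid storing credentials in the database and avoid
    changing AWSRegion for other S3 access from the database.
    """
    statements = [
        f"ALTER SESSION SET AWSRegion = {quote_sql_literal(region)};",
        # AWSAuth takes the access key ID and secret access key as 'id:secret'.
        f"ALTER SESSION SET AWSAuth = {quote_sql_literal(access_key_id + ':' + secret_access_key)};",
    ]
    if session_token is not None:
        # Needed only with multi-factor authentication; the token expires.
        statements.append(
            f"ALTER SESSION SET AWSSessionToken = {quote_sql_literal(session_token)};"
        )
    return statements
```

Execute each returned statement through your Vertica client in the same session that runs the export. Quoting the literals keeps a secret that happens to contain `'` from breaking the statement.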

Output Restrictions

S3 limits the size of a single object to 5TB. You might need to divide very large exports.
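One way to divide a very large export is to split it on a key column and export each range to its own directory. The sketch below assumes a hypothetical table with an integer key whose rows, and bytes, are spread roughly evenly over the key range, plus an `estimated_bytes` figure you obtain yourself; the 1 TiB default per part is an arbitrary margin below the 5TB limit, not a Vertica setting.

```python
import math

TIB = 1024 ** 4  # bytes in one tebibyte


def split_export_statements(table, key_column, key_min, key_max,
                            estimated_bytes, base_dir, max_part_bytes=1 * TIB):
    """Split one large export into EXPORT TO PARQUET statements over key ranges.

    Assumes an integer key whose rows (and bytes) are spread evenly over
    [key_min, key_max]; each range then holds roughly estimated_bytes / parts.
    """
    parts = max(1, math.ceil(estimated_bytes / max_part_bytes))
    span = key_max - key_min + 1
    step = math.ceil(span / parts)  # key values per range
    base = base_dir.rstrip("/")
    statements = []
    for i, low in enumerate(range(key_min, key_max + 1, step)):
        high = min(low + step - 1, key_max)
        statements.append(
            f"EXPORT TO PARQUET(directory = '{base}/part{i}') "
            f"AS SELECT * FROM {table} WHERE {key_column} BETWEEN {low} AND {high};"
        )
    return statements
```

Each statement writes to its own `partN` subdirectory, so no two share an output path. If the key distribution is skewed, choose range boundaries from the data instead of splitting the key range evenly.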

The S3 file system does not support renaming files in place; it implements a rename as a copy followed by a delete. On other file systems, EXPORT TO PARQUET supports atomicity by writing its output into a temporary directory and renaming it when complete. On S3, that rename would copy all of the output a second time, so EXPORT TO PARQUET instead writes directly to the destination path. As a result, readers can see partially written data before the export has finished, which could lead to errors. Wait for the export to finish before using the data.
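Because nothing on S3 marks an export as finished, readers need some signal of completion. One hypothetical convention, not a Vertica feature, is for the client to write an empty marker object after the EXPORT statement returns successfully, and for readers to skip the directory until the marker exists. The sketch keeps the S3 calls abstract: `run_export`, `put_object`, and `list_keys` are placeholders for your own client code.

```python
# Hypothetical marker name; Vertica does not create or check this object.
MARKER_NAME = "_EXPORT_COMPLETE"


def marker_key(directory):
    """Return the key of the completion marker inside directory."""
    return f"{directory.rstrip('/')}/{MARKER_NAME}"


def run_export_then_mark(run_export, put_object, directory):
    """Run the (blocking) export, then write the marker only if it succeeded."""
    run_export()  # if this raises, no marker is written
    put_object(marker_key(directory), b"")


def export_is_ready(list_keys, directory):
    """Return True once the marker exists; readers skip the directory until then."""
    return marker_key(directory) in set(list_keys(directory))
```

A failed or cancelled export never gets a marker, so readers following this convention also ignore its partial output.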

On other file systems, Vertica retries a failed query if the database is K-safe. When exporting to S3, Vertica does not retry the export and instead reports an error. For example, if a node goes down during export, Vertica returns Error 4142 (Node failure during execution).
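Since Vertica reports the error instead of retrying, any retry has to happen in the client. A minimal sketch, assuming `run_export` runs the EXPORT statement and raises on failure, and `delete_partial_output` removes what the failed attempt wrote (both hypothetical callables):

```python
import time


def export_with_retry(run_export, delete_partial_output, attempts=3, delay_seconds=0.0):
    """Retry an S3 export from the client, since Vertica does not retry it.

    run_export: runs the EXPORT statement and raises on failure
                (for example, after a node failure during execution).
    delete_partial_output: removes whatever the failed attempt wrote, because
                the destination must not exist when the export is retried.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run_export()
        except Exception:
            delete_partial_output()  # clean up before retrying or giving up
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Catching `Exception` keeps the sketch driver-agnostic; in real code, catch the specific error class your Vertica client raises so that errors that cannot succeed on retry, such as a syntax error, are not retried.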

If you cancel an export or an export fails, Vertica does not clean up the partial output. On S3, that partial output costs you money, so after a cancellation or failure, delete the partial files from S3. Also, because the output path must not exist, you must delete the partial output before retrying the export with the same destination.
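To remove partial output, you can delete every object under the export's prefix. This sketch uses boto3 (the `list_objects_v2` paginator and `delete_objects`), which is an assumption about your tooling; `delete_partial_export` and `delete_batches` are hypothetical helper names. S3's DeleteObjects request accepts at most 1000 keys, so keys are sent in batches.

```python
def delete_batches(keys, batch_size=1000):
    """Group keys into DeleteObjects request bodies of at most batch_size keys."""
    keys = list(keys)
    return [
        {"Objects": [{"Key": k} for k in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]


def delete_partial_export(bucket, prefix):
    """Delete every object under prefix and return how many keys were deleted.

    Requires boto3 and credentials allowed to list the bucket and delete objects.
    """
    import boto3  # imported lazily so delete_batches works without boto3

    # A trailing slash keeps "exports/run1" from also matching "exports/run10".
    if not prefix.endswith("/"):
        prefix += "/"
    s3 = boto3.client("s3")
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    for batch in delete_batches(keys):
        response = s3.delete_objects(Bucket=bucket, Delete=batch)
        # With Quiet=True, S3 reports only the keys it failed to delete.
        if response.get("Errors"):
            raise RuntimeError(f"failed to delete {len(response['Errors'])} objects")
    return len(keys)
```

For an export written to `s3://my-bucket/exports/sales` (placeholder names), the call would be `delete_partial_export("my-bucket", "exports/sales")`.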

Avoid concurrent exports to the same output destination. Doing so is an error on any file system. On other file systems, only one of the exports succeeds; on S3, concurrent exports could produce incorrect results because of the way S3 propagates changes.
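One way to rule out two exports sharing a destination is to give every export run its own directory. The sketch below is a hypothetical helper, not something Vertica provides; it appends a UTC timestamp and a random suffix to a base path.

```python
import uuid
from datetime import datetime, timezone


def unique_export_directory(base_dir, now=None):
    """Return a fresh destination under base_dir for one export run.

    The timestamp keeps runs sortable; the random suffix separates runs
    that start in the same second.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    run_id = f"{now:%Y%m%dT%H%M%SZ}-{uuid.uuid4().hex[:8]}"
    return f"{base_dir.rstrip('/')}/{run_id}"
```

Because the generated path is new for each run, it also meets the requirement that the output path not exist. It does not remove the need to delete partial output left by a failed run.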