Exporting to S3 and GCS

Object-store file systems (S3 and GCS) differ from other file systems in ways that affect data export. You must set additional configuration parameters for authentication and region, and the output is subject to some restrictions.

S3 Configuration Parameters

To access S3, you must create an IAM role and grant that role permission to access your S3 resources. For more information about IAM roles, see the Amazon Web Services documentation.

Vertica uses several configuration parameters related to AWS, which can be set for the database or for a session. See S3 Parameters for the complete list. These parameters include:

  • AWSRegion: the region containing the output S3 bucket. This configuration parameter affects all S3 access from the database, including reads, so if you are using S3 in other ways, set this as a session parameter to avoid conflicts.
  • AWSAuth: the value is an ID (AccessKeyID) and a secret key (SecretAccessKey), formatted 'id:secret'. Use a session parameter to avoid storing credentials in the database.
  • AWSSessionToken (if you are using multi-factor authentication): the value is a token generated by AWS STS. You must use a session parameter because AWS session tokens expire.
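
As a minimal sketch of setting these parameters at the session level, the following statements use ALTER SESSION. The region, credentials, and token shown are placeholder values (assumptions for illustration), not working settings:

```sql
-- Placeholder values: substitute your own region and credentials.
ALTER SESSION SET AWSRegion = 'us-west-1';
ALTER SESSION SET AWSAuth = 'AccessKeyID:SecretAccessKey';

-- Only if you use multi-factor authentication; the token comes from AWS STS.
ALTER SESSION SET AWSSessionToken = 'token-from-sts';
```

Because these are session parameters, they apply only to the current session and are not stored in the database.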

GCS Configuration Parameters

Vertica uses the following configuration parameter related to GCS, which can be set for the database or for a session. See Google Cloud Storage Parameters for more information.

  • GCSAuth: the value is an ID (AccessKeyID) and a secret key (SecretAccessKey), formatted 'id:secret'. Use a session parameter to avoid storing credentials in the database.
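
A session-level setting for GCS might look like the following sketch. The credential string is a placeholder (an assumption for illustration):

```sql
-- Placeholder value: substitute your own ID and secret key.
ALTER SESSION SET GCSAuth = 'id:secret';
```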

Output Restrictions

Object-store file systems do not support renaming files in place; they implement a rename as a copy followed by a delete. On other file systems, EXPORT TO PARQUET supports atomicity by writing its output into a temporary directory and renaming it when complete. Such an approach is impractical for S3 and GCS, so EXPORT TO PARQUET writes directly to the destination path. It is therefore possible to begin reading the exported data before the export has finished, which could lead to errors. Wait for the export to finish before using the data.
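
For illustration, an export that writes directly to an S3 destination could look like the sketch below. The bucket, path, and table names are hypothetical, and the destination directory is assumed not to exist yet:

```sql
-- Hypothetical bucket, path, and table names.
-- Output is written directly under the destination path as the export runs.
EXPORT TO PARQUET(directory = 's3://examplebucket/exports/sales')
   AS SELECT * FROM public.sales;
```

One reasonable way to avoid premature reads is to start any consumer of s3://examplebucket/exports/sales only after this statement has returned successfully.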

On other file systems, Vertica retries a failed query if the database is K-safe. When exporting to S3 or GCS, Vertica does not retry the export and instead reports an error. For example, if a node goes down during export, Vertica returns Error 4142 (Node failure during execution).

If you cancel an export or an export fails, Vertica does not clean up the partial output. On S3 and GCS, that partial output costs you money, so after a cancellation or failure, delete the output files. Further, because the output path must not exist, you must delete the partial output before retrying the export to the same destination.

S3 limits the size of an individual object to 5TB; the bucket itself has no total size limit. You might need to divide very large exports.
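
One possible way to divide a large export is to run several smaller exports, each covering a slice of the data and writing to its own destination. The sketch below assumes a hypothetical table public.sales with a sale_date column and a hypothetical bucket; none of these names come from this document:

```sql
-- Hypothetical table, column, and bucket names.
-- Each statement exports one year of data to a separate directory.
EXPORT TO PARQUET(directory = 's3://examplebucket/exports/sales_2022')
   AS SELECT * FROM public.sales
      WHERE sale_date >= '2022-01-01' AND sale_date < '2023-01-01';

EXPORT TO PARQUET(directory = 's3://examplebucket/exports/sales_2023')
   AS SELECT * FROM public.sales
      WHERE sale_date >= '2023-01-01' AND sale_date < '2024-01-01';
```

Each destination directory must not already exist, and a failed slice can be cleaned up and retried without affecting the others.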