Export Data to Amazon S3 From Vertica

After you configure the library for Amazon Web Services (AWS), you can export Vertica data to Amazon S3 by calling the S3EXPORT() transform function. S3EXPORT() writes data to files based on the URL you provide. Vertica performs all communication over HTTPS, regardless of the URL type you use. Vertica does not support virtual-host-style URLs; if you use HTTPS URL constructions, you must use path-style URLs.
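For example, a path-style URL places the bucket name in the URL path rather than in the host name (the bucket and object names below are placeholders):

   https://s3.amazonaws.com/exampleBucket/object    -- path style (supported)
   https://exampleBucket.s3.amazonaws.com/object    -- virtual host style (not supported)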

If your S3 bucket contains a period in its path, set the prepend_hash parameter to True.
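For instance, a call for a bucket whose path contains a period might look like the following. This is a sketch only; the bucket, object, and table names are placeholders:

=> SELECT S3EXPORT( * USING PARAMETERS url='s3://example.bucket/object',
                                       prepend_hash=true) OVER()
   FROM exampleTable;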

You can control the output of S3EXPORT() in the following ways:

Adjust the Query Provided to S3EXPORT

By adjusting the query given to S3EXPORT(), you can export anything from tables to reporting queries.

This example exports a whole table:

=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST) 
   FROM exampleTable;
 rows | url
------+------------------------------
  606 | https://exampleBucket/object
(1 row)

This example exports the results of a query:

=> SELECT S3EXPORT(customer_name, annual_income USING PARAMETERS url='s3://exampleBucket/object') OVER()
    FROM public.customer_dimension
      WHERE (customer_gender, annual_income) IN
        (SELECT customer_gender, MAX(annual_income) 
         FROM public.customer_dimension
         GROUP BY customer_gender);

 rows | url
------+------------------------------
   25 | https://exampleBucket/object
(1 row)

Adjust the Partition of Your Result Set with the OVER Clause

Use the OVER clause to control your export partitions. Using the OVER() clause without qualification results in a single partition processed by the initiator for all of the query data. This example shows how to call the function with an unqualified OVER() clause:

=> SELECT S3EXPORT(name, company USING PARAMETERS url='s3://exampleBucket/object',
                                                  delimiter=',') OVER()
     FROM exampleTable WHERE company='Vertica';
 rows | url
------+------------------------------
   10 | https://exampleBucket/object
(1 row)

You can also use window clauses, such as window partition clauses and window order clauses, to manage exported objects.

This example shows how you can use a window partition clause to partition S3 objects based on company values:

=> SELECT S3EXPORT(name, company
                    USING PARAMETERS url='s3://exampleBucket/object',
                                     delimiter=',') OVER(PARTITION BY company)
      FROM exampleTable;
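You can combine a window order clause with the partition to control the order in which rows are written within each exported object. This hedged sketch assumes the same exampleTable used above:

=> SELECT S3EXPORT(name, company
                    USING PARAMETERS url='s3://exampleBucket/object',
                                     delimiter=',')
       OVER(PARTITION BY company ORDER BY name)
   FROM exampleTable;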

Adjusting the Export Chunk Size for Wide Tables

You may encounter the following error when exporting extremely wide tables or tables with long data types such as LONG VARCHAR or LONG VARBINARY:

=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST) 
   FROM veryWideTable; 
ERROR 5861: Error calling setup() in User Function s3export 
at [/data/.../S3.cpp:787],
error code: 0, message: The specified buffer of 10485760 bytesRead is too small, 
it should be at least 11279701 bytesRead.

Vertica returns this error if the data for a single row overflows the buffer storing the data before export. By default, this buffer is 10MB. You can increase the size of this buffer using the chunksize parameter, which sets the size of the buffer in bytes. This example sets it to around 60MB:

=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object', chunksize=60485760) 
   OVER(PARTITION BEST) FROM veryWideTable;
 rows | url
------+------------------------------
  606 | https://exampleBucket/object
(1 row)

See Also