Browsing S3 Data Using External Tables

You might want to browse the data in a large data lake that is already stored on S3. Instead of loading data into Vertica, you can read it in place at query time using external tables.

You can create external tables to query Parquet, ORC, text, and delimited data stored in S3 buckets. External tables work the same way in Eon Mode as they do in Enterprise Mode, with one exception: users who are not superusers must have read access to an S3 storage location to run queries. You can create a USER storage location on S3 for this purpose.

S3 Authentication for External Tables

AWS supports two authentication modes for S3: IAM roles and legacy id and secret authentication. Which method you use for your external table depends on whether the IAM role you assigned to your Vertica instance has access to the S3 bucket you want to access:

  • If the IAM role has access (or you can grant it access) to the S3 bucket, then you do not have to take additional steps to configure authentication. Vertica uses the IAM role to authenticate with S3 by default. Whenever possible, use IAM roles, as it centralizes and simplifies authentication.
  • If the IAM role does not have access to the S3 bucket and you cannot grant it access, then you must use an id and secret for authentication. You set the credentials in the AWSAuth session parameter.

    The alternate credentials you use to access the S3 bucket must also have access read and write to the S3 storage locations that Vertica uses for communal and catalog storage. When you set the AWSAuth session parameter, Vertica uses the credentials it contains for all S3 access for your session.

Example

The following example illustrates the steps to create a USER storage location, give a user access to it, and create an external table (as that user).

  1. If the Vertica IAM role does not have access to the S3 bucket you want to browse, you must use alternate credentials. Define the AWS credentials associated with the S3 bucket where you will create the storage location:

    => ALTER SESSION SET AWSAuth='ID:secret';
  2. As a superuser, create a USER storage location on the same bucket that holds the data.

    => CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';

    The only storage locations you can create on S3 are USER locations. You must create it as a shared location.

  3. As a superuser, give access to the roles or users who will run queries against external tables.

       --- If you have not already created a role:
    => CREATE ROLE ExtUsers;						
       --- Assign users to this role using GRANT (Role).
    						
    => GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;
  4. As a user who has that role, create the external table in the usual way.

    => CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT) 
    	AS COPY FROM 's3://datalake/sales/*.parquet' PARQUET;

See Creating External Tables for more information about creating and using external tables. If your data is in ORC or Parquet format, you can take advantage of partitioning to limit the data that Vertica reads during query execution. See Using Partition Columns.