Browsing S3 Data Using External Tables

You might want to browse the data in a large data lake that is already stored on S3. Instead of loading data into Vertica, you can read it in place at query time using external tables.

You can create external tables to query Parquet, ORC, text, and delimited data stored in S3 buckets. External tables work the same way in Eon Mode as they do in Enterprise Mode, with one exception: users who are not superusers must have read access to an S3 storage location to run queries. You can create a USER storage location on S3 for this purpose.

The following example illustrates the steps to create a USER storage location, give a user access to it, and create an external table (as that user).

  1. Define the AWS credentials associated with the S3 bucket where you will create the storage location:

    => ALTER SESSION SET AWSAuth='ID:secret';
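       --- If the bucket is not in the default AWS region, you might also need to set
       --- AWSRegion (the region value shown here is only an example):
    => ALTER SESSION SET AWSRegion='us-west-1';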
  2. As a superuser, create a USER storage location on the same bucket that holds the data.

    => CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';

    The only type of storage location you can create on S3 is a USER location, and you must create it as a shared location.

  3. As a superuser, grant read access on the location to the roles or users who will run queries against external tables.

       --- If you have not already created a role:
    => CREATE ROLE ExtUsers;
       --- Assign users to this role using GRANT (Role).

    => GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;
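       --- For example, to assign the role to a hypothetical user named Bob:
    => GRANT ExtUsers TO Bob;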
  4. As a user who has that role, create the external table in the usual way.

    => CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT)
        AS COPY FROM 's3://datalake/sales/*.parquet' PARQUET;
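
Once the external table exists, that user can browse the data in place with ordinary queries. For example (the predicate is only illustrative):

    => SELECT COUNT(*) FROM sales WHERE price > 100.00;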

See Creating External Tables for more information about creating and using external tables. If your data is in ORC or Parquet format, you can take advantage of partitioning to limit the data that Vertica reads during query execution. See Using Partition Columns.
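
For example, if the Parquet files used a Hive-style layout with an additional partition column named region (a hypothetical layout, not the one shown above), the table definition might look like the following, with one glob level per partition directory:

    => CREATE EXTERNAL TABLE sales_by_region (itemID INT, date DATE, price FLOAT, region VARCHAR)
        AS COPY FROM 's3://datalake/sales/*/*.parquet'
        PARQUET(hive_partition_cols='region');

Queries that filter on region can then skip partitions that Vertica does not need to read.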