CREATE EXTERNAL TABLE AS COPY

CREATE EXTERNAL TABLE AS COPY creates a table definition for data external to your Vertica database. This statement is a combination of the CREATE TABLE and COPY statements, supporting a subset of each statement's parameters.

Canceling a CREATE EXTERNAL TABLE AS COPY statement can cause unpredictable results. If you need to make a change, allow the statement to complete, drop the table, and then retry.

You can use ALTER TABLE to change the data types of columns instead of dropping and recreating the table.
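For example, the following minimal sketch (assuming an existing external table named sales with a price column; the names are hypothetical) changes a column's data type in place:

=> ALTER TABLE sales ALTER COLUMN price SET DATA TYPE NUMERIC(10,2);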

You can use CREATE EXTERNAL TABLE AS COPY with any types except types from the Place package.

Vertica does not create superprojections for external tables, since external tables are not stored in the database.

Syntax

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] [[database.]schema.]table-name 
    ( column-definition[,…] ) 
[{INCLUDE | EXCLUDE} [SCHEMA] PRIVILEGES] 
AS COPY 
    [ ( { column-as-expression | column }
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FILLER datatype ]
       [ FORMAT 'format' ] 
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
    [,…] ) ]
    [ COLUMN OPTION ( column 
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FORMAT 'format' ]
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
    [,…] ) ]
FROM {
      { 'path-to-data' [ ON { nodename | (nodeset) | ANY NODE } ] [ input-format ] }[,…]
      | [ WITH ] SOURCE source( [arg=value[,…] ] )
}
      [ NATIVE 
      | FIXEDWIDTH COLSIZES {( integer )[,…]}
      | NATIVE VARCHAR
      | ORC
      | PARQUET
      ]
   [ ABORT ON ERROR ]
   [ DELIMITER [ AS ] 'char' ]
   [ ENCLOSED BY 'char' [ AND 'char' ] ]
   [ ENFORCELENGTH ]
   [ ERROR TOLERANCE ]
   [ ESCAPE AS 'char' | NO ESCAPE ]
   [ EXCEPTIONS 'path' [ ON nodename ] [,…] ]
   [ [ WITH ] FILTER filter( [ arg=value[,…] ] ) ] 
   [ NULL [ AS ] 'string' ]
   [ [ WITH ] PARSER parser([arg=value [,…] ]) ]
   [ RECORD TERMINATOR 'string' ]
   [ REJECTED DATA 'path' [ ON nodename ] [,…] ]
   [ REJECTMAX integer ]
   [ SKIP integer ]
   [ SKIP BYTES integer ]
   [ TRAILING NULLCOLS ]
   [ TRIM 'byte' ]

Parameters

For all supported parameters, see the CREATE TABLE and COPY statements. For information on using this statement with UDLs, see Load (UDLs).

For additional guidance on using COPY parameters, see Specifying COPY FROM Options.
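For example, several of the load options shown in the syntax can be combined in one statement. The following sketch (the table, columns, and path are hypothetical) reads a comma-delimited file, treats 'NA' as NULL, and skips a header row:

=> CREATE EXTERNAL TABLE readings (sensor INT, reading FLOAT)
    AS COPY FROM '/tmp/readings.csv' DELIMITER ',' NULL AS 'NA' SKIP 1;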

Privileges

Superuser, or non-superuser with the following privileges:

  • READ privileges on the USER-accessible storage location; see GRANT (Storage Location)
  • Full access (including SELECT) to an external table that the user has privileges to create

ORC and Parquet Data

When using the ORC or Parquet format, Vertica supports some additional options in the COPY statement and additional data structures for columns. See Reading ORC and Parquet Formats.

If ORC or Parquet data is partitioned, Vertica expects Hive-style partitioning. If you see unexpected results when reading data, verify that globs in your file paths correctly align with the partition structure. See Troubleshooting Reads from ORC and Parquet Files.
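For example, data partitioned on columns named created and region is typically laid out in Hive-style paths such as /data/orders/created=2021-01-01/region=east/file.orc, so a glob such as /data/orders/*/*/* must supply one wildcard per partition directory level (the path and column names here are hypothetical).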

Examples

The following example defines an external table for data stored in HDFS:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT) 
    AS COPY FROM 'hdfs:///dat/ext1.csv' DELIMITER ',';

The following example uses data in the Parquet format that is stored in S3:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT) 
    AS COPY FROM 's3://datalake/sales/*.parquet' PARQUET;

The following example creates an external table using partitioned data in the ORC format. The table includes four columns. Two columns, "id" and "name", are in the data files. The other two, "created" and "region", are partition columns. For more about partition columns, see Using Partition Columns.

=> CREATE EXTERNAL TABLE t (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'hdfs:///path/*/*/*' 
   ORC(hive_partition_cols='created,region');

The following example creates an external table from data in Google Cloud Storage:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT) 
    AS COPY FROM 'gs://data/sales/*.csv';

The following example creates an external table for data containing structs, and specifies that null struct values should be treated as null values of the individual struct fields instead of rejected rows. See Using Structs.

=> CREATE EXTERNAL TABLE customers_expanded (name VARCHAR, street VARCHAR, 
    city VARCHAR, postalcode VARCHAR, account INT)
    AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='True');

The following examples create external tables from data in the local file system:

=> CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat' DELIMITER ',';
=> CREATE EXTERNAL TABLE ext2 (x integer) AS COPY FROM '/tmp/ext2.dat.bz2' BZIP DELIMITER ',';
=> CREATE EXTERNAL TABLE ext3 (x integer, y integer) AS COPY (x as '5', y) FROM '/tmp/ext3.dat.bz2' BZIP DELIMITER ',';
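The following sketch (hypothetical table, columns, and path) uses the COLUMN OPTION clause to apply a date format to a single column while the other column uses the default settings:

=> CREATE EXTERNAL TABLE orders (orderID INT, shipped DATE)
    AS COPY COLUMN OPTION (shipped FORMAT 'YYYY-MM-DD')
    FROM '/tmp/orders.dat' DELIMITER '|';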

To allow users without superuser access to use external tables with data on the local file system, S3, or GCS, create a location for 'user' usage and grant access to it. This example shows granting access to a user named Bob to any external table whose data is located under /tmp (including in subdirectories to any depth):

=> CREATE LOCATION '/tmp' ALL NODES USAGE 'user';  
=> GRANT ALL ON LOCATION '/tmp' to Bob;

The following example shows CREATE EXTERNAL TABLE using a user-defined source:

=> CREATE SOURCE curl AS LANGUAGE 'C++' NAME 'CurlSourceFactory' LIBRARY curllib;
=> CREATE EXTERNAL TABLE curl_table1 AS COPY SOURCE CurlSourceFactory;

See Also

Creating External Tables in the Administrator's Guide