Using External Tables

External tables let you query data stored in files accessible to the Vertica database, but not managed by it. Creating external tables supplies read-only access through SELECT queries. You cannot modify external tables through DML commands, such as INSERT, UPDATE, DELETE, and MERGE.

Using CREATE EXTERNAL TABLE AS COPY Statement

You create external tables with the CREATE EXTERNAL TABLE AS COPY... statement, shown in this basic example:

CREATE EXTERNAL TABLE tbl(i INT) AS COPY (i) FROM 'path1' ON v_vmart_node0001, 'path2' ON v_vmart_node0002;

For more details on the supported options to create an external table, see the CREATE EXTERNAL TABLE statement in the SQL Reference Manual.

The data you specify in the FROM clause of a CREATE EXTERNAL TABLE AS COPY statement can reside in one or more files or directories, and on one or more nodes. After successfully creating an external table, Vertica stores the table name and its COPY definition. Each time a select query references the external table, Vertica parses the COPY statement definition again to access the data. Here is a sample select statement:

SELECT * FROM tbl WHERE i > 10;

Storing Vertica Data in External Tables

While there are many requirements for you to use external table data, one reason is to store infrequently-accessed Vertica data on low-cost external media. If external storage is a goal at your site, the process to accomplish that requires exporting the older data to a text file, creating a bzip or gzip file of the export data, and saving the compressed file on an NFS disk. You can then create an external table to access the data any time it is required.

Calculating Exact Row Count for External Tables

To calculate the exact number of rows in an external table, use ANALYZE_EXTERNAL_ROW_COUNT. The Optimizer uses this count to optimize for queries that access external tables.

In particular, if an external table participates in a join, the Optimizer can now identify the smaller table to be used as the inner input to the join, resulting in better query performance.

Using External Tables with User-Defined Load (UDL) Functions

You can also use external tables in conjunction with the UDL functions that you create. For more information about using UDLs, see User Defined Load (UDL) in Extending Vertica.

Organizing External Table Data

If the data you store in external tables changes regularly (for instance, each month in the case of storing recent historical data), your COPY definition statement can use wildcards to make parsing the stored COPY statement definition more dynamic. For instance, if you store monthly data on an NFS mount, you could organize monthly files within a top-level directory for a calendar year, such as:

/2012/monthly_archived_data/

In this case, the external table COPY statement will include a wildcard definition such as the following:

CREATE TABLE archive_data (...) AS COPY FROM '/nfs_name/2012/monthly_archived_data/*'

Whenever a Vertica query references the external table months, and Vertica parses the COPY statement, all stored data tables in the top-level monthly_archived_data directory are made accessible to the query.