INFER_EXTERNAL_TABLE_DDL

Inspects a file in Parquet format and returns a CREATE EXTERNAL TABLE AS COPY statement that can be used to read the file. This statement might be incomplete. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits (see System Limits). Always inspect the output and address any issues before using it to create a table.

A Parquet file contains insufficient information to infer the type of partition columns, so this function shows these columns with a data type of "UNKNOWN" and emits a warning.

The function handles most data types found in Parquet data, including complex types. If a Parquet type is not supported in Vertica, the function emits a warning.

Syntax

INFER_EXTERNAL_TABLE_DDL( 'path', 'table_name' )

Arguments

path
      

Path to a file or directory using Parquet format. Any path that is valid for COPY is valid for this function. This function does not operate on files in other formats.

table_name Name of the external table to create. This name does not depend on file contents.

Privileges

Superuser, or non-superuser with READ privileges on the USER-accessible storage location (see GRANT (Storage Location)).

Examples

In the following example, the Parquet file contains data for a table with two INT columns. The table definition can be fully inferred, and you can use the returned SQL statement as-is.

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.parquet', 'orders');
 
				INFER_EXTERNAL_TABLE_DDL
--------------------------------------------------------------------------------------------------
create external table orders (id int, quantity int) as copy from '/data/orders/*.parquet' Parquet;
(1 row)

The following example uses partition columns. Types of partition column cannot be determined from the data.

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/sales/*/*/*, 'sales');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types. 
Fix these data types before creating the table
				INFER_EXTERNAL_TABLE_DDL                                                                                          
------------------------------------------------------------------------
 create external table "sales"("tx_id" int, "date" UNKNOWN, "region" UNKNOWN) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)		

For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the Parquet data is longer, using this table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the Parquet file contains columns of these types:

WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80