INFER_EXTERNAL_TABLE_DDL

Inspects a file in Parquet format and returns a CREATE EXTERNAL TABLE AS COPY statement that can be used to read the file. This statement might be incomplete. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits (see System Limits). Always inspect the output and address any issues before using it to create a table.

A Parquet file contains insufficient information to infer the type of partition columns, so this function shows these columns with a data type of "UNKNOWN" and emits a warning.

The function handles most data types found in Parquet data, including complex types. If a Parquet type is not supported in Vertica, the function emits a warning.

By default, the function fully defines complex types, including handling nested types. You can instead treat the column as a single LONG VARBINARY value, which is useful if your data contains combinations of nesting that Vertica does not support.

Syntax

INFER_EXTERNAL_TABLE_DDL( 'path', 'table_name, ['vertica_type_for_complex_type'] )

Arguments

path
      

Path to a file or directory using Parquet format. Any path that is valid for COPY is valid for this function. This function does not operate on files in other formats.

table_name Name of the external table to create. This name does not depend on file contents.
vertica_type_for_complex_type Type to use to represent all columns of complex types, if you do not want to expand them fully. The only supported value is 'long varbinary'. For more information, see Using Flexible Complex Types.

Privileges

Superuser, or non-superuser with READ privileges on the USER-accessible storage location (see GRANT (Storage Location)).

Examples

In the following example, the Parquet file contains data for a table with two INT columns. The table definition can be fully inferred, and you can use the returned SQL statement as-is.

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.parquet', 'orders');
 
				INFER_EXTERNAL_TABLE_DDL
--------------------------------------------------------------------------------------------------
create external table orders (
  id int, 
  quantity int
) as copy from '/data/orders/*.parquet' parquet;
(1 row)

The following example shows output from complex types.

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/restaurant.parquet', 'restaurants');
                                      INFER_EXTERNAL_TABLE_DDL                                                                                                  
--------------------------------------------------------------------------------------------------------------------------
 create external table "restaurants"(
  "cuisine" varchar,
  "location_city" Array[varchar],
  "menu" Array[
    ROW(
      "item" varchar,
      "price" varchar
    )
  ],
  "name" varchar
 ) as copy from '/data/restaurant.parquet' parquet;
(1 row)

In the previous example, the data contains an array of structs (rows) and the function generates a definition for it. However, arrays can contain only primitive types, so you cannot actually define this table as shown. You can, however, still define the table using flexible types (see Using Flexible Complex Types). Use the optional vertica_type_for_complex_type argument to treat complex types as flexible types:

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/restaurant.parquet', 'restaurants', 'long varbinary');
                                      INFER_EXTERNAL_TABLE_DDL                                              
--------------------------------------------------------------------------------------------------------------------------
 create external table "restaurants"(
  "cuisine" varchar,
  "location_city" long varbinary,
  "menu" long varbinary,
  "name" varchar
 ) as copy from '/data/restaurant.parquet' parquet(allow_long_varbinary_match_complex_type='True');
(1 row)

The following example uses partition columns. Types of partition column cannot be determined from the data.

=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/sales/*/*/*, 'sales');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types. 
Fix these data types before creating the table
				INFER_EXTERNAL_TABLE_DDL                                                                                          
------------------------------------------------------------------------
 create external table "sales"(
  "tx_id" int, 
  "date" UNKNOWN, 
  "region" UNKNOWN
) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)		

For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the Parquet data is longer, using this table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the Parquet file contains columns of these types:

WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80