PARQUET

Use the PARQUET clause with the COPY statement to load data in the Parquet format. You can read all primitive types, UUIDs, structs, and arrays.

The PARQUET clause can be used alone or with optional parameters.

Parameters

hive_partition_cols Comma-separated list of columns that are partition columns in the data. See Using Partition Columns.

flatten_complex_type_null

Whether to flatten a null struct value to null values for all of its fields (true) or reject a row containing a null struct value (false, default). See Reading Structs.

allow_no_match

Whether to accept a path containing a glob with no matching files and report zero rows in query results. If this parameter is not set, Vertica returns an error if the path in the FROM clause does not match at least one file.
skip_strong_schema_match When defining a ROW in an external table, whether to allow the ROW in the Vertica table definition and the struct in the Parquet data to have different structures. By default, Vertica requires that a ROW contain all and only the fields and nesting structure found in the Parquet data definition (schema). Set this parameter to true to override this behavior. Regardless of the value of this parameter, the primitive types must match for all fields.

 

Examples

Use the PARQUET clause without parameters if your data is not partitioned.

=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;

In the following example, the "id" and "name" columns are included in the data and the "created" and "region" columns are partition columns. Partition columns must be listed last when defining columns.

=> CREATE EXTERNAL TABLE t (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'hdfs:///path/*/*/*' 
   PARQUET(hive_partition_cols='created,region');

In the following example, the data contains structs and a null value should not reject the row.

=> CREATE EXTERNAL TABLE customers_expanded (...)
    AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='True');

In the following example, the data directory contains no files.

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM 'hdfs:///data/*.parquet' PARQUET;
=> SELECT COUNT(*) FROM customers;
ERROR 7869: No files match when expanding glob: [hdfs:///data/*.parquet]

To read zero rows instead of producing an error, use the allow_no_match parameter with PARQUET:

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM 'hdfs:///data/*.parquet' PARQUET(allow_no_match='true');
=> SELECT COUNT(*) FROM customers;
 count 
------- 
     0 
(1 row)