PARQUET (Parser)
Use the PARQUET clause with the COPY statement to load data in the Parquet format. When loading data into Vertica, you can read all primitive types, UUIDs, and arrays of primitive types. When creating an external table, you can additionally read structs and multi-dimensional arrays (arrays of arrays). See Loading Complex Types for further information on using arrays.
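For instance, an external table definition can combine a struct (ROW) with a nested array. The following is an illustrative sketch; the table name, column names, and path are hypothetical:

```sql
=> CREATE EXTERNAL TABLE readings (
     sensor ROW(name VARCHAR, location VARCHAR),  -- struct in the Parquet data
     samples ARRAY[ARRAY[FLOAT]]                  -- multi-dimensional array
   ) AS COPY FROM 's3://AWS_DataLake/readings/*.parquet' PARQUET;
```

Structs and multi-dimensional arrays such as these can be read only through external tables, not loaded directly into Vertica-managed tables.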
The PARQUET clause can be used alone or with optional parameters.
Vertica does not support Parquet files that were written using the DATA_PAGE_V2 page type.
Parameters
Parameter | Description |
---|---|
hive_partition_cols | Comma-separated list of columns that are partition columns in the data. See Using Partition Columns. |
flatten_complex_type_nulls | Whether to flatten a null struct value to null values for all of its fields (true) or reject a row containing a null struct value (false, default). See Reading Structs. |
allow_no_match | Whether to accept a path containing a glob with no matching files and report zero rows in query results. If this parameter is not set, Vertica returns an error if the path in the FROM clause does not match at least one file. |
skip_strong_schema_match | Deprecated; this parameter will be removed in a future release. See Using Flexible Complex Types instead. When defining a ROW in an external table, whether to allow the ROW in the Vertica table definition and the struct in the Parquet data to have different structures. By default, Vertica requires that a ROW contain all and only the fields and nesting structure found in the Parquet data definition (schema). Set this parameter to true to override this behavior. Regardless of the value of this parameter, the primitive types of all fields must match. |
allow_long_varbinary_match_complex_type | Whether to enable flexible column types (see Using Flexible Complex Types). If true, the Parquet parser allows a complex type in the data to match a table column defined as LONG VARBINARY. If false, the Parquet parser requires strong typing of complex types. With the parameter set to true, you can still use strong typing. Set this parameter to false if you want use of flexible columns to be treated as an error. |
Loading Complex Types
Parquet data containing one-dimensional arrays (arrays of primitive types) can be loaded into Vertica-managed tables. The native array type is distinct from the external-table array type; that is, the types have different OIDs. Syntactically they are identical, both in how you define them in tables and in how you query them.
The native array type does not support multi-dimensional arrays. If your data contains such arrays, you can read them directly using external tables. Alternatively, you can create the external table and then use CREATE TABLE AS SELECT to create a new Vertica-managed table, extracting each nested array as its own column.
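As a sketch of that CREATE TABLE AS SELECT approach, assume a hypothetical external table whose `counts` column is an `ARRAY[ARRAY[INT]]`; each inner array can be materialized as its own native-array column (all names and the path are illustrative):

```sql
=> CREATE EXTERNAL TABLE inventory_ext (sku VARCHAR, counts ARRAY[ARRAY[INT]])
   AS COPY FROM 'hdfs:///inventory/*.parquet' PARQUET;

=> CREATE TABLE inventory AS
   SELECT sku,
          counts[0] AS warehouse_counts,  -- first nested array
          counts[1] AS store_counts       -- second nested array
   FROM inventory_ext;
```

The resulting `inventory` table contains only one-dimensional arrays and therefore can be a Vertica-managed table.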
Vertica supports only 3-level-encoded arrays, not 2-level-encoded.
Vertica does not support the required mode for complex types, only optional.
Data Types
The Parquet loader maps Parquet data types to Vertica data types as follows.
Parquet Logical Type | Vertica Data Type |
---|---|
StringLogicalType | VARCHAR |
MapLogicalType | MAP |
ListLogicalType | ARRAY/SET |
IntLogicalType | INT/NUMERIC |
DecimalLogicalType | NUMERIC |
DateLogicalType | DATE |
TimeLogicalType | TIME |
TimestampLogicalType | TIMESTAMP |
UUIDLogicalType | UUID |
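To illustrate a few of these mappings, a Parquet file with string, list, and decimal logical types could be read with a definition like the following (the table name, column names, and path are hypothetical):

```sql
=> CREATE EXTERNAL TABLE products (
     name VARCHAR,          -- StringLogicalType
     tags ARRAY[VARCHAR],   -- ListLogicalType
     price NUMERIC(10,2)    -- DecimalLogicalType
   ) AS COPY FROM '/data/products*.parquet' PARQUET;
```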
The following logical types are not supported:

- EnumLogicalType
- IntervalLogicalType
- JSONLogicalType
- BSONLogicalType
- UnknownLogicalType
Parquet Physical Type | Vertica Data Type |
---|---|
BOOLEAN | BOOLEAN |
INT32/INT64 | INT |
INT96 | Not supported |
FLOAT | DOUBLE |
DOUBLE | DOUBLE |
BYTE_ARRAY | VARBINARY |
FIXED_LEN_BYTE_ARRAY | BINARY |
Examples
Use the PARQUET clause without parameters if your data is not partitioned.
=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;
In the following example, the "id" and "name" columns are included in the data and the "created" and "region" columns are partition columns. Partition columns must be listed last when defining columns.
=> CREATE EXTERNAL TABLE t (id int, name varchar(50), created date, region varchar(50)) AS COPY FROM 'hdfs:///path/*/*/*' PARQUET(hive_partition_cols='created,region');
In the following example, the data contains structs, and a null struct value should not cause the row to be rejected.
=> CREATE EXTERNAL TABLE customers_expanded (...) AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='True');
In the following example, the data directory contains no files.
=> CREATE EXTERNAL TABLE customers (...) AS COPY FROM 'hdfs:///data/*.parquet' PARQUET;
=> SELECT COUNT(*) FROM customers;
ERROR 7869: No files match when expanding glob: [hdfs:///data/*.parquet]
To read zero rows instead of producing an error, use the allow_no_match parameter with PARQUET:
=> CREATE EXTERNAL TABLE customers (...) AS COPY FROM 'hdfs:///data/*.parquet' PARQUET(allow_no_match='true');
=> SELECT COUNT(*) FROM customers;
 count
-------
     0
(1 row)
To allow reading a complex type (menu, in this example) as a flexible column type, use the allow_long_varbinary_match_complex_type parameter:
=> CREATE EXTERNAL TABLE restaurants(name VARCHAR, cuisine VARCHAR, location_city ARRAY[VARCHAR], menu LONG VARBINARY) AS COPY FROM '/data/rest*.parquet' PARQUET(allow_long_varbinary_match_complex_type='True');
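A flexible column such as menu can then be inspected as a string. This sketch assumes the MAPTOSTRING function discussed in Using Flexible Complex Types; confirm its availability for your version before relying on it:

```sql
=> SELECT name, MAPTOSTRING(menu) FROM restaurants;
```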