Handling Null Struct Values

Struct fields, like columns of primitive types, can have null values. As with primitive types, a null struct field produces a null value for the corresponding column in the external table. In the following example, the last row in the data has an address struct with a null value for the street.

=> CREATE EXTERNAL TABLE customers (name varchar, street varchar, 
    city varchar, zipcode int, accountID int) 
    as copy from '...' parquet;
CREATE TABLE
				
=> SELECT * FROM customers ORDER BY accountID;
        name        |       street        |   city   | zipcode | accountID
--------------------+---------------------+----------+---------+-----------
 Missy Cooper       | 911 San Marcos St   | Austin   |   73344 |        17
 Sheldon Cooper     | 100 Main St Apt 4B  | Pasadena |   91001 |       139
 Leonard Hofstadter | 100 Main St Apt 4A  | Pasadena |   91001 |       142
 Leslie Winkle      | 23 Fifth Ave Apt 8C | Pasadena |   91001 |       198
 Raj Koothrappali   |                     | Pasadena |   91001 |       294
(5 rows)		

Null field values appear when a struct is present but is missing field values. If the entire struct is null in the source data, by default Vertica rejects the row. A null struct is different from a struct where all fields are null.

The data used in the previous example contains a sixth row with a null address struct. To expand null structs instead of rejecting the row, use the flatten_complex_type_nulls parameter to the PARQUET() or ORC() clause:

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='True');

With this parameter set, Vertica reports null values for all fields when the struct itself is null. It does not report a null value in the struct column. If you use this parameter, note that you cannot tell from the query results whether the struct column was null or a struct with no values was present in the data.

With this table definition, the query returns all six rows:

=> CREATE EXTERNAL TABLE customers (name varchar, street varchar, 
    city varchar, zipcode int, accountID int) 
    as copy from '...' parquet(flatten_complex_type_nulls='True');
CREATE TABLE
				
=> SELECT * FROM customers ORDER BY accountID;
        name        |       street        |   city   | zipcode | accountID
--------------------+---------------------+----------+---------+-----------
 Missy Cooper       | 911 San Marcos St   | Austin   |   73344 |        17
 Sheldon Cooper     | 100 Main St Apt 4B  | Pasadena |   91001 |       139
 Leonard Hofstadter | 100 Main St Apt 4A  | Pasadena |   91001 |       142
 Leslie Winkle      | 23 Fifth Ave Apt 8C | Pasadena |   91001 |       198
 Raj Koothrappali   |                     | Pasadena |   91001 |       294
 Stuart Bloom       |                     |          |         |       482
(6 rows)					

This behavior is the same whether you use expanded columns or inline structs:

=> CREATE EXTERNAL TABLE customers (name varchar, 
    address row(street varchar, city varchar, zipcode int), accountID int) 
    as copy from '...' parquet(flatten_complex_type_nulls='True');
CREATE TABLE
				
=> SELECT * FROM customers ORDER BY accountID;
        name        |                              address                               | accountID
--------------------+--------------------------------------------------------------------+-----------
 Missy Cooper       | {"street":"911 San Marcos St","city":"Austin","zipcode":73344}     |        17
 Sheldon Cooper     | {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001}  |       139
 Leonard Hofstadter | {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001}  |       142
 Leslie Winkle      | {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001} |       198
 Raj Koothrappali   | {"street":"","city":"Pasadena","zipcode":91001}                    |       294
 Stuart Bloom       | {"street":"","city":null,"zipcode":null}                           |       482
(6 rows)