Handling Null Struct Values
Struct fields, like columns of primitive types, can have null values. As with primitive types, a null struct field produces a null value for the corresponding column in the external table. In the following example, the last row in the data has an address struct with a null value for the street.
=> CREATE EXTERNAL TABLE customers (name varchar, street varchar, city varchar, zipcode int, accountID int) as copy from '...' parquet; CREATE TABLE => SELECT * FROM customers ORDER BY accountID; name | street | city | zipcode | accountID --------------------+---------------------+----------+---------+----------- Missy Cooper | 911 San Marcos St | Austin | 73344 | 17 Sheldon Cooper | 100 Main St Apt 4B | Pasadena | 91001 | 139 Leonard Hofstadter | 100 Main St Apt 4A | Pasadena | 91001 | 142 Leslie Winkle | 23 Fifth Ave Apt 8C | Pasadena | 91001 | 198 Raj Koothrappali | | Pasadena | 91001 | 294 (5 rows)
Null field values appear when a struct is present but is missing field values. If the entire struct is null in the source data, by default Vertica rejects the row. A null struct is different from a struct where all fields are null.
The data used in the previous example contains a sixth row with a null address struct. To expand null structs instead of rejecting the row, use the flatten_complex_type_nulls
parameter to the PARQUET() or ORC() clause:
=> CREATE EXTERNAL TABLE customers (...) AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='True');
With this parameter set, Vertica reports null values for all fields when the struct itself is null. It does not report a null value in the struct column. If you use this parameter, note that you cannot tell from the query results whether the struct column was null or a struct with no values was present in the data.
With this table definition, the query returns all six rows:
=> CREATE EXTERNAL TABLE customers (name varchar, street varchar, city varchar, zipcode int, accountID int) as copy from '...' parquet(flatten_complex_type_nulls='True'); CREATE TABLE => SELECT * FROM customers ORDER BY accountID; name | street | city | zipcode | accountID --------------------+---------------------+----------+---------+----------- Missy Cooper | 911 San Marcos St | Austin | 73344 | 17 Sheldon Cooper | 100 Main St Apt 4B | Pasadena | 91001 | 139 Leonard Hofstadter | 100 Main St Apt 4A | Pasadena | 91001 | 142 Leslie Winkle | 23 Fifth Ave Apt 8C | Pasadena | 91001 | 198 Raj Koothrappali | | Pasadena | 91001 | 294 Stuart Bloom | | | | 482 (6 rows)
This behavior is the same whether you use expanded columns or inline structs:
=> CREATE EXTERNAL TABLE customers (name varchar, address row(street varchar, city varchar, zipcode int), accountID int) as copy from '...' parquet(flatten_complex_type_nulls='True'); CREATE TABLE => SELECT * FROM customers ORDER BY accountID; name | address | accountID --------------------+--------------------------------------------------------------------+----------- Missy Cooper | {"street":"911 San Marcos St","city":"Austin","zipcode":73344} | 17 Sheldon Cooper | {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001} | 139 Leonard Hofstadter | {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001} | 142 Leslie Winkle | {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001} | 198 Raj Koothrappali | {"street":"","city":"Pasadena","zipcode":91001} | 294 Stuart Bloom | {"street":"","city":null,"zipcode":null} | 482 (6 rows)