Reading Combinations of Structs and Arrays

Columns in the Parquet format can contain structs, represented in Vertica by the ROW type, and arrays, represented by the ARRAY type. For external tables and the Parquet format only, you can define a column for heterogeneous combinations of these two types: a struct containing array fields or an array of structs. These types can be nested up to the maximum nesting depth of 100.

Querying Columns

The following table definition includes a struct that contains an array:

=> CREATE EXTERNAL TABLE customers(
  name VARCHAR,
  contact ROW(
    street VARCHAR,
    city VARCHAR,
    zipcode INT,
    email ARRAY[VARCHAR]
  ),
  accountid INT ) 
    AS COPY FROM '...' PARQUET;

Output is formatted in JSON for these types:

=> SELECT name, contact FROM customers;
        name        |                                                        contact                                                        
--------------------+-----------------------------------------------------------------------------------------------------------------------
 Missy Cooper       | {"street":"911 San Marcos St","city":"Austin","zipcode":73344,"email":["missy@mit.edu","mcooper@cern.gov"]}
 Sheldon Cooper     | {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001,"email":["shelly@meemaw.name","cooper@caltech.edu"]}
 Leonard Hofstadter | {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001,"email":["hofstadter@caltech.edu"]}
 Leslie Winkle      | {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001,"email":[]}
 Raj Koothrappali   | {"street":null,"city":"Pasadena","zipcode":91001,"email":["raj@available.com"]}
 Stuart Bloom       | 
(6 rows)

You can address individual fields in a query:

=> SELECT name, contact.email FROM customers;
        name        |                    email                    
--------------------+---------------------------------------------
 Missy Cooper       | ["missy@mit.edu","mcooper@cern.gov"]
 Sheldon Cooper     | ["shelly@meemaw.name","cooper@caltech.edu"]
 Leonard Hofstadter | ["hofstadter@caltech.edu"]
 Leslie Winkle      | []
 Raj Koothrappali   | ["raj@available.com"]
 Stuart Bloom       | 
(6 rows)

The following table definition includes an array that contains a struct with a nested struct:

=> CREATE EXTERNAL TABLE orders(
  orderid INT,
  accountid INT,
  shipments ARRAY[
    ROW(
      shipid INT,
      address ROW(
        street VARCHAR,
        city VARCHAR,
        zip INT
        ),
      shipdate DATE
    )
  ]
 ) AS COPY FROM '...' PARQUET;

Some orders consist of more than one shipment. (Line breaks have been inserted into the following output for legibility.)

=> SELECT * FROM orders;
 orderid | accountid |                          shipments                                                                                                          
---------+-----------+---------------------------------------------------------------------------------------------------------------
   99123 |        17 | [{"shipid":1,"address":{"street":"911 San Marcos St","city":"Austin","zip":73344},"shipdate":"2020-11-05"},
			{"shipid":2,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-06"}]
   99149 |       139 | [{"shipid":3,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-06"}]
   99162 |       139 | [{"shipid":4,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-04"},
			{"shipid":5,"address":{"street":"100 Main St Apt 4A","city":"Pasadena","zip":91001},"shipdate":"2020-11-11"}]
(3 rows)

You can use array indexing and struct field selection together in queries:

=> SELECT orderid, shipments[0].shipdate AS ship1, shipments[1].shipdate AS ship2 FROM orders;
 orderid |   ship1    |   ship2    
---------+------------+------------
   99123 | 2020-11-05 | 2020-11-06
   99149 | 2020-11-06 | 
   99162 | 2020-11-04 | 2020-11-11
(3 rows)

This example selects specific array indices. To access all entries, use EXPLODE.

You cannot use mixed columns in CREATE TABLE AS SELECT (CTAS) or in views. This restriction applies for the entire column or for field selection within it.

Using Complex Columns in Other Parts of the Query

You cannot compare and sort mixed complex types, like the shipments column in the previous example. You can, however, compare and sort on primitive types, one-dimensional arrays, or rows contained therein. For example, you cannot order by shipments, an array of structs, but you can order by an address struct:

=> SELECT orderid, shipments[0] FROM orders ORDER BY shipments[0].address;
 orderid |                                                  shipments
---------+--------------------------------------------------------------------------------------------------------------
   99162 | {"shipid":4,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-04"}
   99149 | {"shipid":3,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-06"}
   99123 | {"shipid":1,"address":{"street":"911 San Marcos St","city":"Austin","zip":73344},"shipdate":"2020-11-05"}
(3 rows)	

You can also use non-mixed values in a WHERE clause:

=> SELECT name, contact FROM customers 
    WHERE contact.email = ARRAY['missy@mit.edu', 'mcooper@cern.gov'];
     name     |                                                   contact
--------------+-------------------------------------------------------------------------------------------------------------
 Missy Cooper | {"street":"911 San Marcos St","city":"Austin","zipcode":73344,"email":["missy@mit.edu","mcooper@cern.gov"]}
(1 row)