Creating Tables Using Structs

Use the ROW expression to define a struct column. In the following example, the data has columns for customer name, address, and account number, and the address is a struct in the Parquet data. The types you declare in Vertica must be compatible with the types in the Parquet data.

=> CREATE EXTERNAL TABLE customers (
    name VARCHAR, 
    address ROW(street VARCHAR, city VARCHAR, zipcode INT), 
    accountID INT)
  AS COPY FROM '...' PARQUET;	

Within the ROW, you specify the fields and their data types using the same syntax as for columns. Vertica treats the ROW as a single column for purposes of queries.

Structs can contain other structs. In the following example, employees have various personal information, including an address which is itself a struct.

=> CREATE EXTERNAL TABLE employees(
    employeeID INT, 
    personal ROW(
      name VARCHAR, 
      address ROW(street VARCHAR, city VARCHAR, zipcode INT),
      taxID INT), 
    department VARCHAR)
  AS COPY FROM '...' PARQUET;

Structs can contain arrays of primitive types, arrays, or structs.

=> CREATE EXTERNAL TABLE customers(
  name VARCHAR,
  contact ROW(
    street VARCHAR,
    city VARCHAR,
    zipcode INT,
    email ARRAY[VARCHAR]
  ),
  accountid INT ) 
    AS COPY FROM '...' PARQUET;

For more information about combining structs and arrays, see Reading Combinations of Structs and Arrays.

If a struct contains fields with null values, those fields have null values in Vertica. If the struct value itself is null, Vertica treats it as a ROW with all fields being null. Note that you cannot tell from the query results whether the struct column was null or a struct with no values was present in the data. To instead reject the row, use the flatten_complex_type_nulls parameter to the PARQUET() clause:

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM '...' PARQUET(flatten_complex_type_nulls='False');

Vertica requires the definition of the external table to match the schema of the external data. For example, with the data used in the previous employees example, the following definition is an error:

=> CREATE EXTERNAL TABLE employees(
    employeeID INT, 
    personal ROW(
      name VARCHAR, 
      address ROW(street VARCHAR, city VARCHAR), 
      zipcode INT,
      taxID INT), 
    department VARCHAR)
  AS COPY FROM '...' PARQUET;
ERROR 9151: Datatype mismatch [...]

The data contains an address struct with three fields (street, city, zipcode), so the external table must also use a ROW with three fields. Changing the ROW to have two fields and promoting one of the fields to the parent ROW is a mismatch. Each ROW must match and, if structs are nested in the data, the complete structure must match.

To load data without strong schema matching, see Using Flexible Complex Types. The other Parquet support for loose schema matching does not apply to ROW data.

Restrictions

ROW columns have several restrictions:

  • The maximum nesting depth is 100.
  • Vertica tables support up to 9800 columns and fields. The ROW itself is not counted, only its fields.
  • ROW columns cannot use any constraints (such as NOT NULL) or defaults.
  • ROW fields cannot be auto_increment or setof.
  • A ROW definition must include at least one field.
  • "Row" is a reserved keyword within a ROW definition, but is permitted as the name of a table or column.
  • ROW columns cannot be modified using ALTER TABLE...ALTER COLUMN. You must drop and recreate the external table to change a ROW column.
  • External tables containing ROW columns cannot also contain identity, auto-increment, or sequence columns.