ARRAY

Represents array data. There are two types of arrays in Vertica:

  • Native array: a one-dimensional array of a primitive type. Native arrays are tracked in the TYPES system table.
  • Inlined array: an array that contains a nested array, sometimes called a multi-dimensional array. Inlined complex types, including inlined arrays, are tracked in the COMPLEX_TYPES system table.

Vertica-managed tables support native arrays but not inlined arrays.  In other words, arrays stored in Vertica must be one-dimensional arrays of primitive types. External tables backed by Parquet data support both types of arrays.

Both types of arrays operate in the same way, but they have different OIDs.

Arrays do not support LONG types (like LONG VARBINARY or LONG VARCHAR) or user-defined types (like Geometry).

Syntax for Direct Construction

Use the ARRAY keyword to construct an array type. For example, to create an array of integer values, you would do the following:

=> SELECT ARRAY[1,2,3];
 array
-------
 [1,2,3]
(1 row)

You can nest an array inside another array to create an inlined (two-dimensional) array, as in the following example.

=> SELECT ARRAY[ARRAY[1],ARRAY[2]];
   array
-----------
 [[1],[2]]
(1 row)

You cannot use a nested array as one of the values when creating an array, as in the following example.

--- error:
=> SELECT ARRAY[ARRAY[1.0], ARRAY[ARRAY[2.0]]];

Array Input Format for Column Definition

Use "ARRAY[data_type]" to declare an array column in a table. The following example defines an external table for customers (abbreviated as cust):

=> CREATE EXTERNAL TABLE cust 
(
cust_custkey int, 
cust_custname varchar(50), 
cust_custstaddress ARRAY[varchar(100)], 
cust_custaddressln2 ARRAY[varchar(100)],
cust_custcity ARRAY[varchar(50)], 
cust_custstate ARRAY[char(2)],
cust_custzip ARRAY[int],
cust_email varchar(50), 
cust_phone varchar(30)
) 
AS COPY FROM '...' PARQUET;

The following example defines a Vertica-managed table:

=> CREATE TABLE customers (cust_id INT, cust_name VARCHAR, cust_email ARRAY[VARCHAR]);

To declare a multi-dimensional array, use nesting. For example, ARRAY[ARRAY[int]] specifies a two-dimensional array. Nested arrays are supported for external tables only, not for Vertica-managed tables.

Array Output Format

Queries of array columns return JSON format, with the values shown in comma-separated lists in brackets. The following example shows a query that includes array columns.

=> SELECT cust_custkey,cust_custstaddress,cust_custcity,cust_custstate from cust;
cust_custkey |               cust_custstaddress                      |                cust_custcity                | cust_custstate
-------------+-------  ----------------------------------------------+---------------------------------------------+----------------
      342176 | ["668 SW New Lane","518 Main Ave","7040 Campfire Dr"] | ["Winchester","New Hyde Park","Massapequa"] | ["VA","NY","NY"]
      342799 | ["2400 Hearst Avenue","3 Cypress Street"]             | ["Berkeley","San Antonio"]                  | ["CA","TX"]
      342845 | ["336 Boylston Street","180 Clarkhill Rd"]            | ["Boston","Amherst"]                        | ["MA","MA"]
      342321 | ["95 Fawn Drive"]                                     | ["Allen Park"]                              | ["MI"]
      342989 | ["5 Thompson St"]                                     | ["Massillon"]                               | ["OH"]
(5 rows)

Note that JSON format escapes some characters that would not be escaped in native VARCHARs. For example, if you insert "c:\users\data" into an array, the JSON output for that value is "c:\\users\\data".

Element Access

You can access (dereference) elements from an array by index (zero-based):

=> SELECT (ARRAY['a','b','c','d','e'])[1];
array
-------
    "b"
(1 row)

To specify a range, use the format start:end. The end of the range is non-inclusive.

=> SELECT(ARRAY['a','b','c','d','e','f','g'])[1:4];
array
---------
["b","c","d"]
(1 row)

To dereference an element from a multi-dimensional array, put each index in brackets:

=> SELECT(ARRAY[ARRAY[1,2],ARRAY[3,4]])[0][0];
 array
-------
 1
(1 row)		

Out-of-bound index references return NULL.

Restrictions

  • Arrays support only data of primitive types, for example, int, UUID, and so on.
  • Nested arrays are supported only for external tables using Parquet data.
  • Selected parsers support using COPY to load one-dimensional arrays into ROS. See the documentation of individual parsers for more information.
  • Arrays are 0-indexed. The first element's ordinal position is 0, second is 1, and so on.
  • Array dimensionality is enforced. A column cannot contain arrays of varying dimensions. For example, a column that contains a three-dimensional array can only contain other three-dimensional arrays; it cannot simultaneously include a one-dimensional array. However, the arrays in a column can vary in size, where one array can contain four elements while another contains ten.
  • Out-of-bound indexes into arrays return NULL.

Null Handling

Null semantics for collections are consistent with normal columns. See NULL Sort Order for more information on null-handling.

Casting

Casting an array casts each element of the array. You can therefore cast between data types following the same rules as for casts of scalar values.

You can cast both literal arrays and array columns explicitly:

=> SELECT ARRAY['1','2','3']::ARRAY[INT];
  array
---------
[1,2,3]
(1 row)

=> CREATE TABLE transactions (tid INT, prod_ids ARRAY[VARCHAR], quantities ARRAY[VARCHAR(32)]);

=> INSERT INTO transactions VALUES (12345, ARRAY['p1265', 'p4515'], ARRAY['15','2']);
 
=> SELECT quantities :: ARRAY[INT] FROM transactions;
 quantities
------------
   [15,2]
(1 row)

Assignment casts and implicit casts work the same way as for scalars:

=> CREATE TABLE txreport (prod_ids ARRAY[VARCHAR], quants ARRAY[INT]);

--- transactions.quantities is an array of varchar, cast here to int
=> INSERT INTO txreport SELECT prod_ids, quants FROM transactions;
				
=> SELECT APPLY_SUM(quantities) FROM transactions;
				

You can perform explicit casts, but not implicit casts, between the ARRAY and SET types.

You cannot cast from an array to an array with a different dimensionality, for example from a two-dimensional array to a one-dimensional array.

Casting can increase the storage needed for a column. For example, if you cast an array of INT to an array of VARCHAR(50), each element takes more space and thus the array takes more space. If the difference is extreme or the array has many elements, this could mean that the array no longer fits within the space allotted for the column. In this case the operation reports an error and fails.

Functions and Operators

See Collection Functions for a comprehensive list of functions that can be used to manipulate arrays and sets.

Collections support equality (=), inequality (<>), and comparison operators (<, <=, >, >=) between collections of the same type (arrays or sets). Comparisons follow these rules:

  • A null collection is ordered last.
  • Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.
  • If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.
  • If all elements in both collections are equal and the collections are of equal length, the collections are equal.

Collections can be used in the following ways:

Collections cannot be used as partition columns when creating tables. Collections cannot be used with analyze_statistics() or TopK projections.