ARRAY

Represents array data. There are two types of arrays in Vertica:

  • Native array: a one-dimensional array of a primitive type. Native arrays are tracked in the TYPES system table and used in native tables.
  • Non-native array: all other supported arrays, including arrays that contain other arrays (multi-dimensional arrays) or structs (ROWs). Non-native arrays have some usage restrictions. Non-native arrays are tracked in the COMPLEX_TYPES system table.

Both types of arrays operate in the same way, but they have different OIDs.

Arrays do not support LONG types (like LONG VARBINARY or LONG VARCHAR) or user-defined types (like Geometry).

Arrays can be bounded, meaning they specify a maximum element count, or unbounded. Unbounded arrays have a maximum binary size, which can be set explicitly or defaulted. See Limits on Element Count and Collection Size.

Restrictions

  • Native arrays support only data of primitive types, for example, int, UUID, and so on.
  • Selected parsers support using COPY to load arrays. See the documentation of individual parsers for more information.
  • Arrays are 0-indexed. The first element's ordinal position is 0, second is 1, and so on.
  • Array dimensionality is enforced. A column cannot contain arrays of varying dimensions. For example, a column that contains a three-dimensional array can only contain other three-dimensional arrays; it cannot simultaneously include a one-dimensional array. However, the arrays in a column can vary in size, where one array can contain four elements while another contains ten.
  • When referencing elements from multi-dimensional non-native arrays, you must cast to the correct data type before performing operations on the result.
  • Out-of-bound indexes into arrays return NULL.
  • Array bounds, if specified, are enforced for all operations that load or alter data. Unbounded arrays may have as many elements as will fit in the allotted binary size.
  • An array has a maximum binary size. If this size is not set when the array is defined, a default value is used.

Syntax for Direct Construction (Literals)

Use the ARRAY keyword to construct an array value. The following example creates an array of integer values.

=> SELECT ARRAY[1,2,3];
 array
-------
 [1,2,3]
(1 row)

You can nest an array inside another array, as in the following example.

=> SELECT ARRAY[ARRAY[1],ARRAY[2]];
   array
-----------
 [[1],[2]]
(1 row)

Array literals cannot contain any other complex types.

Because the elements are known at the time you directly construct an array, these arrays are implicitly bounded.

Syntax for Column Definition

Arrays used in column definitions can be either bounded or unbounded. Bounded arrays must specify a maximum number of elements. Unbounded arrays can specify a maximum binary size (in bytes) for the array, or the value of DefaultArrayBinarySize is used. You can specify a bound or a binary size but not both. For more information about these values, see Limits on Element Count and Collection Size.

Type Syntax Semantics
Bounded array
ARRAY[data_type, max_elements]

Example:

ARRAY[VARCHAR(50),100]

Can contain no more than max_elements elements. Attempting to add more is an error.

Has a binary size of the size of the data type multiplied by the maximum number of elements (possibly rounded up).

Unbounded array with maximum binary size
ARRAY[data_type](max_size)

Example:

ARRAY[VARCHAR(50)](32000)

Can contain as many elements as fit in max_size. Ignores the value of DefaultArrayBinarySize.

Unbounded array with default binary size
ARRAY[data_type]

Example:

ARRAY[VARCHAR(50)]

Can contain as many elements as fit in the default binary size.

Equivalent to:

ARRAY[data_type](DefaultArrayBinarySize)

The following example defines a table for customers using an unbounded array:

=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50)]);

The following example uses a bounded array for customer email addresses and an unbounded array for order history:

=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50),5], orders ARRAY[INT]);

The following example uses an array that has ROW elements:

=> CREATE TABLE orders(
  orderid INT,
  accountid INT,
  shipments ARRAY[
    ROW(
      shipid INT,
      address ROW(
        street VARCHAR,
        city VARCHAR,
        zip INT
        ),
      shipdate DATE
    )
  ]
 );

To declare a multi-dimensional array, use nesting. For example, ARRAY[ARRAY[int]] specifies a two-dimensional array.

Array Output Format

Queries of array columns return JSON format, with the values shown in comma-separated lists in brackets. The following example shows a query that includes array columns.

=> SELECT cust_custkey,cust_custstaddress,cust_custcity,cust_custstate from cust;
cust_custkey |               cust_custstaddress                      |                cust_custcity                | cust_custstate
-------------+-------  ----------------------------------------------+---------------------------------------------+----------------
      342176 | ["668 SW New Lane","518 Main Ave","7040 Campfire Dr"] | ["Winchester","New Hyde Park","Massapequa"] | ["VA","NY","NY"]
      342799 | ["2400 Hearst Avenue","3 Cypress Street"]             | ["Berkeley","San Antonio"]                  | ["CA","TX"]
      342845 | ["336 Boylston Street","180 Clarkhill Rd"]            | ["Boston","Amherst"]                        | ["MA","MA"]
      342321 | ["95 Fawn Drive"]                                     | ["Allen Park"]                              | ["MI"]
      342989 | ["5 Thompson St"]                                     | ["Massillon"]                               | ["OH"]
(5 rows)

Note that JSON format escapes some characters that would not be escaped in native VARCHARs. For example, if you insert "c:\users\data" into an array, the JSON output for that value is "c:\\users\\data".

Element Access

You can access (dereference) elements from an array by index (zero-based):

=> SELECT (ARRAY['a','b','c','d','e'])[1];
array
-------
b
(1 row)

To specify a range, use the format start:end. The end of the range is non-inclusive.

=> SELECT(ARRAY['a','b','c','d','e','f','g'])[1:4];
array
---------
["b","c","d"]
(1 row)

To dereference an element from a multi-dimensional array, put each index in brackets:

=> SELECT(ARRAY[ARRAY[1,2],ARRAY[3,4]])[0][0];
 array
-------
 1
(1 row)		

Out-of-bound index references return NULL.

To perform operations on elements from multi-dimensional arrays, first cast the array to the correct type:

=> SELECT ((ARRAY[ARRAY[0,1]])[0]::ARRAY[INT])[0] + 
	((ARRAY[ARRAY[2,3]])[0]::ARRAY[INT])[0];
 ?column?
----------
        2
(1 row)

Limits on Element Count and Collection Size

When declaring a collection type for a table column, you can limit either the number of elements or the total binary size of the collection. During query processing, Vertica always reserves the maximum memory needed for the column, based on either the element count or the binary size. If this size is much larger than your data actually requires, setting one of these limits can improve query performance by reducing the amount of memory that must be reserved for the column.

A bounded collection specifies a maximum element count. A value in a bounded collection column may contain fewer elements, but it may not contain more. Any attempt to insert more elements into a bounded collection than the declared maximum is an error. A bounded collection has a binary size that is the product of the data-type size and the maximum number of elements, possibly rounded up.

An unbounded collection specifies a binary size in bytes, explicitly or implicitly. It may contain as many elements as can fit in that binary size.

If a nested array or set specifies bounds for all dimensions, Vertica sets a single bound that is the product of the bounds. In the following example, the inner and outer arrays each have a bound of 10, but only a total element count of 100 is enforced.

ARRAY[ARRAY[INT,10],10]

If a nested collection specifies a bound for only the outer collection, it is treated as the total bound. The previous example is equivalent to the following:

ARRAY[ARRAY[INT],100]

You must either specify bounds for all nested collections or specify a bound only for the outer one. For any other distribution of bounds, Vertica treats the collection as unbounded.

You can change the bounds of a collection, including changing between bounded and unbounded collections, by casting. See Casting.

Instead of specifying a bound, you can specify a maximum binary size for an unbounded collection. The binary size acts as an absolute limit, regardless of how many elements the collection contains. Collections that do not specify a maximum binary size use the value of DefaultArrayBinarySize. This size is set at the time the collection is defined and is not affected by later changes to the value of DefaultArrayBinarySize.

You can change the bounds or the binary size of an array column using ALTER TABLE as in the following example:

=> ALTER TABLE cust ALTER COLUMN orders SET DATA TYPE ARRAY[INTEGER](100);

If the change reduces the size of the collection and would result in data loss, the change fails.

You cannot set a maximum binary size for a bounded collection, only an unbounded one.

Comparisons

All collections support equality (=), inequality (<>), and null-safe equality (<=>). 1D collections also support comparison operators (<, <=, >, >=) between collections of the same type (arrays or sets). Comparisons follow these rules:

  • A null collection is ordered last.
  • Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.
  • If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.
  • If all elements in both collections are equal and the collections are of equal length, the collections are equal.

Null Handling

Null semantics for collections are consistent with normal columns in most regards. See NULL Sort Order for more information on null-handling.

The null-safe equality operator (<=>) behaves differently from equality (=) when the collection is null rather than empty. Comparing a collection to NULL strictly is undefined.

=> SELECT ARRAY[1,3] = NULL;
?column?
----------

(1 row)

=> SELECT ARRAY[1,3] <=> NULL;
 ?column?
----------
 f
(1 row)

In the following example, the grants column in the table is null for employee 99.

=> SELECT grants = NULL FROM employees WHERE id=99;
 ?column?
----------

(1 row)

=> SELECT grants <=> NULL FROM employees WHERE id=99;
 ?column?
----------
 t
(1 row)

Empty collections are not null and behave as expected.

=> SELECT ARRAY[]::ARRAY[INT] = ARRAY[]::ARRAY[INT];
 ?column?
----------
 t
(1 row)

Collections are compared element by element. If a comparison depends on a null element, the result is unknown (null), not false. For example, ARRAY[1,2,null]=ARRAY[1,2,null] and ARRAY[1,2,null]=ARRAY[1,2,3] both return null, but ARRAY[1,2,null]=ARRAY[1,4,null] returns false because the second elements do not match.

Casting

Casting an array casts each element of the array. You can therefore cast between data types following the same rules as for casts of scalar values.

You can cast both literal arrays and array columns explicitly:

=> SELECT ARRAY['1','2','3']::ARRAY[INT];
  array
---------
[1,2,3]
(1 row)

=> CREATE TABLE transactions (tid INT, prod_ids ARRAY[VARCHAR], quantities ARRAY[VARCHAR(32)]);

=> INSERT INTO transactions VALUES (12345, ARRAY['p1265', 'p4515'], ARRAY['15','2']);
 
=> SELECT quantities :: ARRAY[INT] FROM transactions;
 quantities
------------
   [15,2]
(1 row)

Assignment casts and implicit casts work the same way as for scalars:

=> CREATE TABLE txreport (prod_ids ARRAY[VARCHAR], quants ARRAY[INT]);

--- transactions.quantities is an array of varchar, cast here to int
=> INSERT INTO txreport SELECT prod_ids, quants FROM transactions;
				
=> SELECT APPLY_SUM(quantities) FROM transactions;
				

You can perform explicit casts, but not implicit casts, between the ARRAY and SET types. If the collection is unbounded and the data type does not change, the binary size is preserved. For example, if you cast an ARRAY[INT] to a SET[INT], the set has the same binary size as the array.

You can change the bound of an array or set by casting. Casting a bounded array to a smaller bound, or casting an unbounded array to a bounded array, can truncate data from the result:

=> CREATE TABLE cust (id INT, orders ARRAY[INT]);

=> SELECT orders :: ARRAY[INT,4] from cust; --- elements after the fourth are omitted

If you cast from one element type to another, the resulting collection uses the default binary size. If this would cause the data not to fit, the cast fails.

You cannot cast from an array to an array with a different dimensionality, for example from a two-dimensional array to a one-dimensional array.

Functions and Operators

See Collection Functions for a comprehensive list of functions that can be used to manipulate arrays and sets.

Collections can be used in the following ways:

Collections cannot be used in the following ways:

  • As part of an IN or NOT IN expression.
  • As partition columns when creating tables.
  • With ANALYZE_STATISTICS or TopK projections.
  • Non-native arrays only: ORDER BY, PARTITION BY, DEFAULT, SET USING, or constraints.