SET

Represents a collection of unordered, unique elements. Sets may contain only primitive types. In sets, unlike in arrays, element position is not meaningful.

If you populate a set from an array, Vertica sorts the values and removes duplicate elements. If you do not care about element position and plan to run queries that check for the presence of specific elements (find, contains), using a set could improve query performance.

Syntax for Direct Construction

You can use the keyword SET to construct a set type. Literal set values are contained in brackets. For example, to create a set of INT, you would do the following:

=> SELECT SET[1,2,3];
  set
-------
 [1,2,3]
(1 row)

You can explicitly convert an array to a set by casting, as in the following example:

=> SELECT ARRAY[1, 5, 2, 6, 3, 0, 6, 4]::SET[INT];
     set
-----------------
[0,1,2,3,4,5,6]
(1 row)

Notice that duplicate elements have been removed and the elements have been sorted.

Set Input Format for Column Definition

Use "SET[data_type]" to declare a set column in a table, as in the following example.

=> CREATE TABLE users
(
user_id INTEGER, 
display_name VARCHAR,
email_addrs SET[VARCHAR]
);

When you load array data into a column defined as a set, the array data is automatically converted to a set.

Set Output Format

Sets are shown in a JSON-like format, with comma-separated elements contained in brackets (like arrays). In the following example, the email_addrs column is a set.

=> SELECT custkey,email_addrs FROM customers LIMIT 4;
 custkey |                           email_addrs
---------+------------------------------------------------------------------------
 342176  | ["joe.smith@example.com"]
 342799  | ["bob@example,com","robert.jones@example.com"]
 342845  | ["br92@cs.example.edu"]
 342321  | ["789123@example-isp.com","sjohnson@eng.example.com","sara@johnson.example.name"]

Restrictions

Sets support only data of primitive types, for example, int, UUID, and so on.

Null Handling

Null semantics for collections are consistent with normal columns. See NULL Sort Order for more information on null-handling.

Casting

Casting a set casts each element of the set. You can therefore cast between data types following the same rules as for casts of scalar values.

You can cast both literal sets and set columns explicitly:

=> SELECT SET['1','2','3']::SET[INT];
   set
---------
[1,2,3]
(1 row)

=> CREATE TABLE transactions (tid INT, prod_ids SET[VARCHAR], quantities SET[VARCHAR(32)]);

=> INSERT INTO transactions VALUES (12345, SET['p1265', 'p4515'], SET['15','2']);
 
=> SELECT quantities :: SET[INT] FROM transactions;
 quantities
------------
   [15,2]
(1 row)

Assignment casts and implicit casts work the same way as for scalars.

You can perform explicit casts, but not implicit casts, between ARRAY and SET types. When casting an array to a set, Vertica first casts each element and then sorts the set and removes duplicates. If two source values are cast to the same target value, one of them will be removed. For example, if you cast an array of FLOAT to a set of INT, two values in the array might be rounded to the same integer and then be treated as duplicates. This also happens if the array contains more than one value that is cast to NULL.

Casting can increase the storage needed for a column. For example, if you cast an array of INT to an array of VARCHAR(50), each element takes more space and thus the array takes more space. If the difference is extreme or the array has many elements, this could mean that the array no longer fits within the space allotted for the column. In this case the operation reports an error and fails.

Functions and Operators

See Collection Functions for a comprehensive list of functions that can be used to manipulate arrays and sets.

Collections support equality (=), inequality (<>), and comparison operators (<, <=, >, >=) between collections of the same type (arrays or sets). Comparisons follow these rules:

  • A null collection is ordered last.
  • Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.
  • If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.
  • If all elements in both collections are equal and the collections are of equal length, the collections are equal.

Collections can be used in the following ways:

Collections cannot be used as partition columns when creating tables. Collections cannot be used with analyze_statistics() or TopK projections.