SET USING versus DEFAULT

Columns in a flattened table can query other tables with constraints SET USING and DEFAULT. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as described below.

DEFAULT Columns

Vertica executes DEFAULT queries only on new rows when they are added to the flattened table, through load operations such as INSERT and COPY. Thereafter, changes in the original data sources have no effect on the flattened table.

SET USING Columns

Vertica executes SET USING queries only when you invoke the function REFRESH_COLUMNS. Load operations set SET USING columns in new rows to NULL. After the load, you must call REFRESH_COLUMNS to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.

One exception applies: when you use ALTER TABLE…ALTER COLUMN to apply SET USING to an existing column, or modify an existing SET USING expression. In this case, the DDL operation automatically invokes REFRESH_COLUMNS on the column, using REBUILD mode. After the refresh operation is complete, the DDL operation auto-commits the updates and returns. If the refresh operation fails, Vertica rolls back the entire DDL operation. Execution time can be significant if the refresh operation involves a large data set.

It might be more efficient to drop the column and add a new one. ALTER TABLE…ADD COLUMN does not call REFRESH_COLUMNS when you add a SET USING column to a table.

SET USING is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.

Partition-based Rebuild Operations

If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact is defined with SET USING column cust_name. This table is partitioned on column order_date, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:

=> SELECT REFRESH_COLUMNS ('public.orderFact', 
                        'cust_name',
                        'REBUILD', 
                        TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
                        TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
      REFRESH_COLUMNS
---------------------------
 refresh_columns completed
(1 row)

Combining DEFAULT and SET USING Constraints

A column can specify both DEFAULT and SET USING constraints, as follows:

column-name data-type DEFAULT default-expr SET USING using-expr

Typically, both constraints specify the same expression. In this case, you can define the column as follows:

column-name data-type DEFAULT USING expression

DEFAULT USING columns support the same expressions as SET USING columns, and are subject to the same restrictions.

Example

The following SQL illustrates differences between SET USING and DEFAULT constraints. The examples use the custDim and orderFact tables described in Flattened Table Example.

The following UPDATE statement updates the custDim table:

=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT

Changes are not propagated to flattened table orderFact, which includes SET USING and DEFAULT columns cust_name and cust_gender, respectively:

=> SELECT * FROM custDim ORDER BY cid;
 cid | name  | age | gender
-----+-------+-----+--------
   1 | Alice |  25 | F
   2 | Roz   |  30 | F
   3 | Eva   |  32 | F
(3 rows)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
(5 rows)

The following INSERT statement invokes the cust_gender column's DEFAULT query and sets that column to F. The load operation does not invoke the cust_name column's SET USING query, so cust_name is set to null:

=> INSERT INTO orderFact(order_id, cid, amount)  VALUES(500, 3, 750);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 |           | F           |  750.00
(6 rows)

To set a value in cust_name, invoke its SET USING query by calling REFRESH_COLUMNS:

=> SELECT REFRESH_COLUMNS ('orderFact','');
      REFRESH_COLUMNS 
---------------------------
 refresh_columns completed
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)

REFRESH_COLUMNS executes cust_name's SET USING query: it queries the name column in table custDim and updates cust_name with the following values:

  • Sets cust_name in the new row to Eva.
  • Returns updated values for cid=2, and changes Boz to Roz.

REFRESH_COLUMNS only affects the values in column cust_name. Values in column gender are unchanged, so settings for rows where cid=2 (Roz) remain set to M. To repopulate orderFact.cust_gender with default values from custDim.gender, call UPDATE on orderFact:

=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz';
 OUTPUT
--------
      2
(1 row)

=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | F           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | F           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)