DEFAULT versus SET USING

Columns in a flattened table can query other tables with constraints DEFAULT and SET USING. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as follows:

DEFAULT Columns

Vertica typically executes DEFAULT queries on new rows when they are added to the flattened table, through load operations such as INSERT and COPY, or when you create or alter a table with a new column that specifies a DEFAULT expression. In all cases, values in existing rows, including other columns with DEFAULT expressions, remain unchanged.

To refresh a column's default values, you must explicitly call UPDATE on that column as follows:

=> UPDATE table-name SET column-name=DEFAULT;

SET USING Columns

Vertica executes SET USING queries only when you invoke the function REFRESH_COLUMNS. Load operations set SET USING columns in new rows to NULL. After the load, you must call REFRESH_COLUMNS to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.

SET USING is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.

You can control the scope of the refresh operation by calling REFRESH _COLUMNS in one of two modes:

  • UPDATE : Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.
  • REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.

If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For details, see Partition-based REBUILD Operations.

Examples

Examples use the custDim and orderFact tables described in Flattened Table Example.

The following UPDATE statement updates the custDim table:

=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT

Changes are not propagated to flattened table orderFact, which includes SET USING and DEFAULT columns cust_name and cust_gender, respectively:

=> SELECT * FROM custDim ORDER BY cid;
 cid | name  | age | gender
-----+-------+-----+--------
   1 | Alice |  25 | F
   2 | Roz   |  30 | F
   3 | Eva   |  32 | F
(3 rows)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
(5 rows)

The following INSERT statement invokes the cust_gender column's DEFAULT query and sets that column to F. The load operation does not invoke the cust_name column's SET USING query, so cust_name is set to null:

=> INSERT INTO orderFact(order_id, cid, amount)  VALUES(500, 3, 750);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 |           | F           |  750.00
(6 rows)

To update the values in cust_name, invoke its SET USING query by calling REFRESH_COLUMNS. REFRESH_COLUMNS executes cust_name's SET USING query: it queries the name column in table custDim and updates cust_name with the following values:

  • Sets cust_name in the new row to Eva.
  • Returns updated values for cid=2, and changes Boz to Roz.
=> SELECT REFRESH_COLUMNS ('orderFact','');
      REFRESH_COLUMNS 
---------------------------
 refresh_columns completed
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)

REFRESH_COLUMNS only affects the values in column cust_name. Values in column gender are unchanged, so settings for rows where cid=2 (Roz) remain set to M. To repopulate orderFact.cust_gender with default values from custDim.gender, call UPDATE on orderFact:

=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz';
 OUTPUT
--------
      2
(1 row)

=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | F           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | F           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)