SET USING versus DEFAULT
Columns in a flattened table can query other tables with constraints SET USING
and DEFAULT
. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as described below.
DEFAULT Columns
Vertica executes DEFAULT
queries only on new rows when they are added to the flattened table, through load operations such as INSERT
and COPY
. Thereafter, changes in the original data sources have no effect on the flattened table.
SET USING Columns
Vertica executes SET USING
queries only when you invoke the function REFRESH_COLUMNS
. Load operations set SET USING
columns in new rows to NULL
. After the load, you must call REFRESH_COLUMNS
to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.
It might be more efficient to drop the column and add a new one. ALTER TABLE…ADD COLUMN
does not call REFRESH_COLUMNS
when you add a SET USING
column to a table.
SET USING
is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING
columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.
Partition-based Rebuild Operations
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS
in REBUILD
mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact
is defined with SET USING
column cust_name
. This table is partitioned on column order_date
, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY
. Thus, you can call REFRESH_COLUMNS
on specific time-delimited partitions of this table—in this case, on orders over the last two months:
=> SELECT REFRESH_COLUMNS ('public.orderFact', 'cust_name', 'REBUILD', TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01', TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1)))); REFRESH_COLUMNS --------------------------- refresh_columns completed (1 row)
Combining DEFAULT and SET USING Constraints
A column can specify both DEFAULT and SET USING constraints, as follows:
column-name data-type DEFAULT default-expr SET USING using-expr
Typically, both constraints specify the same expression. In this case, you can define the column as follows:
column-name data-type DEFAULT USING expression
DEFAULT USING columns support the same expressions as SET USING columns, and are subject to the same restrictions.
Example
The following SQL illustrates differences between SET USING
and DEFAULT
constraints. The examples use the custDim
and orderFact
tables described in Flattened Table Example.
The following UPDATE
statement updates the custDim
table:
=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2; OUTPUT -------- 1 (1 row) => COMMIT; COMMIT
Changes are not propagated to flattened table orderFact
, which includes SET USING
and DEFAULT
columns cust_name
and cust_gender
, respectively:
=> SELECT * FROM custDim ORDER BY cid; cid | name | age | gender -----+-------+-----+-------- 1 | Alice | 25 | F 2 | Roz | 30 | F 3 | Eva | 32 | F (3 rows) => SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid; order_id | order_date | cid | cust_name | cust_gender | amount ----------+------------+-----+-----------+-------------+--------- 100 | 2018-12-31 | 1 | Alice | F | 15.00 200 | 2018-12-31 | 1 | Alice | F | 1000.00 300 | 2018-12-31 | 2 | Boz | M | -50.00 500 | 2018-12-31 | 2 | Boz | M | 200.00 400 | 2018-12-31 | 3 | Eva | F | 100.00 (5 rows)
The following INSERT
statement invokes the cust_gender
column's DEFAULT
query and sets that column to F
. The load operation does not invoke the cust_name
column's SET USING
query, so cust_name
is set to null:
=> INSERT INTO orderFact(order_id, cid, amount) VALUES(500, 3, 750); OUTPUT -------- 1 (1 row) => COMMIT; COMMIT => SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid; order_id | order_date | cid | cust_name | cust_gender | amount ----------+------------+-----+-----------+-------------+--------- 100 | 2018-12-31 | 1 | Alice | F | 15.00 200 | 2018-12-31 | 1 | Alice | F | 1000.00 300 | 2018-12-31 | 2 | Boz | M | -50.00 500 | 2018-12-31 | 2 | Boz | M | 200.00 400 | 2018-12-31 | 3 | Eva | F | 100.00 500 | 2018-12-31 | 3 | | F | 750.00 (6 rows)
To set a value in cust_name
, invoke its SET USING
query by calling REFRESH_COLUMNS
:
=> SELECT REFRESH_COLUMNS ('orderFact',''); REFRESH_COLUMNS --------------------------- refresh_columns completed (1 row) => COMMIT; COMMIT => SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid; order_id | order_date | cid | cust_name | cust_gender | amount ----------+------------+-----+-----------+-------------+--------- 100 | 2018-12-31 | 1 | Alice | F | 15.00 200 | 2018-12-31 | 1 | Alice | F | 1000.00 300 | 2018-12-31 | 2 | Roz | M | -50.00 500 | 2018-12-31 | 2 | Roz | M | 200.00 400 | 2018-12-31 | 3 | Eva | F | 100.00 500 | 2018-12-31 | 3 | Eva | F | 750.00 (6 rows)
REFRESH_COLUMNS
executes cust_name
's SET USING
query: it queries the name
column in table custDim
and updates cust_name
with the following values:
- Sets
cust_name
in the new row toEva
. - Returns updated values for
cid=2
, and changesBoz
toRoz
.
REFRESH_COLUMNS
only affects the values in column cust_name
. Values in column gender
are unchanged, so settings for rows where cid=2
(Roz
) remain set to M
. To repopulate orderFact.cust_gender
with default values from custDim.gender
, call UPDATE
on orderFact
:
=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz'; OUTPUT -------- 2 (1 row) => COMMIT; COMMIT => SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid; order_id | order_date | cid | cust_name | cust_gender | amount ----------+------------+-----+-----------+-------------+--------- 100 | 2018-12-31 | 1 | Alice | F | 15.00 200 | 2018-12-31 | 1 | Alice | F | 1000.00 300 | 2018-12-31 | 2 | Roz | F | -50.00 500 | 2018-12-31 | 2 | Roz | F | 200.00 400 | 2018-12-31 | 3 | Eva | F | 100.00 500 | 2018-12-31 | 3 | Eva | F | 750.00 (6 rows)