
Before release 8.1., Vertica users could denormalize their data by combining all fact and dimension table columns in a single ‘fat’ table. These tables facilitated faster query execution. However, this approach required users to maintain redundant sets of normalized and denormalized data, which incurred its own overhead.
Vertica 8.1 addressed this issue by introducing flattened tables. Flattened tables include columns that get their values by querying other tables. Operations on the source tables and flattened table are decoupled: changes in one are not automatically propagated to the other. Instead, columns are populated with queried data only when the column is created, or when you explicitly refresh it. This approach minimizes the overhead that is otherwise typical of conventional denormalized tables.
Example
Flattened tables are typically fact tables where one or more columns query other tables for their values, using DEFAULT or SET USING constraints.The following example uses two normalized tables:
• Dimension table custDim:
=> SELECT * FROM custDim;
cid | name | age | gender
-----+-------+-----+--------
1 | Alice | 25 | F
2 | Bob | 30 | M
3 | Eve | 28 | F
(3 rows)
• Fact table orderFact:
=> SELECT * FROM orderFact ORDER BY order_id;
order_id | cid | amount
----------+-----+---------
100 | 1 | 15.00
200 | 1 | 1000.00
300 | 2 | -50.00
400 | 3 | 100.00
500 | 2 | 200.00
(5 rows)
You can create a flattened table that combines data from both tables, as follows: 1. Convert the orderFact table to a flattened table. Add a column with DEFAULT constraint that queries customer names from the custDim table:
=> ALTER TABLE orderFact ADD COLUMN cust_name VARCHAR(20)
DEFAULT (SELECT name FROM custDim
WHERE orderFact.cid = custDim.cid);
ALTER TABLE
=> SELECT * FROM orderFact ORDER BY order_id;
=> SELECT * FROM orderFact ORDER BY order_id;
order_id | cid | amount | cust_name
----------+-----+---------+-----------
100 | 1 | 15.00 | Alice
200 | 1 | 1000.00 | Alice
300 | 2 | -50.00 | Bob
400 | 3 | 100.00 | Eve
500 | 2 | 200.00 | Bob
(5 rows)
2. Next, insert a new row of data into orderFact:
=> INSERT INTO orderFact VALUES (600, 3, 10.00, DEFAULT);
OUTPUT
--------
1
(1 row)
=> SELECT * FROM orderFact ORDER BY order_id;
order_id | cid | amount | cust_name
----------+-----+---------+-----------
100 | 1 | 15.00 | Alice
200 | 1 | 1000.00 | Alice
300 | 2 | -50.00 | Bob
400 | 3 | 100.00 | Eve
500 | 2 | 200.00 | Bob
600 | 3 | 10.00 | Eve
(6 rows)
The value Eve is inserted automatically into the cust_name column. 3. After you insert values, update custDim with a name change:
=> UPDATE custDim SET name ='Charlene' WHERE cid = 3;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * from custDim;
cid | name | age | gender
-----+---------+-----+--------
1 | Alice | 25 | F
2 | Bob | 30 | M
3 | Charlene | 28 | F
(3 rows)
4. Propagate the name change to the orderFact table:
=> UPDATE orderFact SET cust_name = DEFAULT WHERE cid = 3;
OUTPUT
--------
2
=> SELECT * FROM orderFact ORDER BY order_id;
order_id | cid | amount | cust_name
----------+-----+---------+-----------
100 | 1 | 15.00 | Alice
200 | 1 | 1000.00 | Alice
300 | 2 | -50.00 | Bob
400 | 3 | 100.00 | Charlene
500 | 2 | 200.00 | Bob
600 | 3 | 10.00 | Charlene
(6 rows)
5. Now add another column to orderFact that queries orderDim with a SET USING constraint:
=> ALTER TABLE orderFact ADD COLUMN cust_gender VARCHAR(1)
SET USING (SELECT gender FROM custDim
WHERE orderFact.cid = custDim.cid);
ALTER TABLE
Initially, the new column is empty:
=> SELECT * FROM orderFact ORDER BY cid;
order_id | cid | amount | cust_name | cust_gender
----------+-----+---------+-----------+-------------
100 | 1 | 15.00 | Alice |
200 | 1 | 1000.00 | Alice |
300 | 2 | -50.00 | Bob |
500 | 2 | 200.00 | Bob |
400 | 3 | 100.00 | Charlene |
600 | 3 | 10.00 | Charlene |
(6 rows)
6. Populate the SET USING column by calling the Vertica function REFRESH_COLUMNS:
=> SELECT REFRESH_COLUMNS ('orderFact', 'cust_gender');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
=> SELECT * FROM orderFact ORDER BY order_id;
order_id | cid | amount | cust_name | cust_gender
----------+-----+---------+-----------+-------------
100 | 1 | 15.00 | Alice | F
200 | 1 | 1000.00 | Alice | F
300 | 2 | -50.00 | Bob | M
400 | 3 | 100.00 | Charlene | F
500 | 2 | 200.00 | Bob | M
600 | 3 | 10.00 | Charlene | F
(6 rows)
DEFAULT versus SET USING
The following table outlines the differences between DEFAULT and SET USING constraints:DEFAULT | SET USING | |
---|---|---|
Initialize from source table | -During load (COPY, INSERT) -ALTER COLUMN…ADD COLUMN DEFAULT |
-REFRESH_COLUMNS -ALTER COLUMN…ALTER COLUMN SET USING |
Refresh from modified source table | -UPDATE flattened table | -REFRESH_COLUMNS |
Note: A column can have both DEFAULT and SET USING constraints.