GROUPED Clause
Groups two or more columns into a single disk file. This minimizes file I/O for work loads that:
- Read a large percentage of the columns in a table.
- Perform single row look-ups.
- Query against many small columns.
- Frequently update data in these columns.
If you have data that is always accessed together and it is not used in predicates, you can increase query performance by grouping these columns. Once grouped, queries can no longer independently retrieve from disk all records for an individual column independent of the other columns within the group.
RLE encoding is reduced when an RLE column is grouped with one or more non-RLE columns.
When grouping columns you can:
-
Group some of the columns:
(a, GROUPED(b, c), d)
-
Group all of the columns:
(GROUPED(a, b, c, d))
-
Create multiple groupings in the same projection:
(GROUPED(a, b), GROUPED(c, d))
Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.
Grouping Correlated Columns
The following example shows how to group highly correlated columns bid
and ask
. The stock
column is stored separately.
=> CREATE TABLE trades (stock CHAR(5), bid INT, ask INT); => CREATE PROJECTION tradeproj (stock ENCODING RLE, GROUPED(bid ENCODING DELTAVAL, ask)) AS (SELECT * FROM trades) KSAFE 1;
The following example show how to create a projection that uses expressions in the column definition. The projection contains two integer columns a
and b
, and a third column product_value
that stores the product of a
and b
:
=> CREATE TABLE values (a INT, b INT=> CREATE PROJECTION product (a, b, product_value) AS SELECT a, b, a*b FROM values ORDER BY a KSAFE;