Aggregating Data Through Expressions

You can create projections where one or more columns are defined by expressions. An expression can reference one or more anchor table columns. For example, the following table contains two integer columns, a and b:

=> CREATE TABLE values (a INT, b INT);

You can create a projection with an expression that calculates the value of column c as the product of a and b:

=> CREATE PROJECTION values_product (a, b, c)
   AS SELECT a, b, a*b FROM values SEGMENTED BY HASH(a) ALL NODES KSAFE;

When you load data into this projection, Vertica resolves the expression a*b in column c. You can then query the projection instead of the anchor table. Vertica returns the pre-calculated data and avoids the overhead otherwise incurred by resource-intensive computations.

Using expressions in projections also lets you sort or segment data on the calculated results of an expression instead of sorting on single column values.

If a projection with expressions also includes aggregate functions such as SUM or COUNT, Vertica treats it like a live aggregate projection.

Support for User-Defined Scalar Functions

Currently, support for pre-aggregating UDSF results is limited to C++.

Vertica treats user-defined scalar functions (UDSFs) like other expressions. On each load operation, the UDSF is invoked and returns its results. Vertica stores these results on disk, and returns them when you query the projection directly.

In the following example, the projection points_p1 specifies the UDSF zorder, which is invoked whenever data is loaded in the anchor table points. When data is loaded into the projection, Vertica invokes this function and stores its results for fast access by future queries.

=> CREATE TABLE points(point_id INTEGER, lat NUMERIC(12,9), long NUMERIC(12,9));

=> CREATE PROJECTION points_p1
     AS SELECT point_id, lat, long, zorder(lat, long) zorder FROM points
     ORDER BY zorder(lat, long) SEGMENTED BY hash(point_id) ALL NODES;

Requirements

  • Any ORDER BY expression must be in the SELECT list.
  • All projection columns must be named.

Restrictions

  • DML operations on anchor tables are faster if the tables' load methods are set to DIRECT.
  • MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.
  • Unlike live aggregate projections, Vertica does not redirect queries with expressions to an equivalent existing projection.
  • Projection expressions must be immutable—that is, they must always return the same result. For example, a projection cannot include expressions that use TO CHAR (depends on locale) or RANDOM (returns different value at each invocation).
  • Projection expressions cannot include Vertica meta-functions such as ADVANCE_EPOCH, ANALYZE_STATISTICS, EXPORT_TABLES, or START_REFRESH.