Aggregate Projections

This blog post was authored by Curtis Bennett.

Vertica stores physical data for tables in objects known as projections. Unlike traditional RDBMS’s, Vertica does not rely on indexes for performance. Instead, Vertica stores the physical data (either all or some of the columns) in whatever sort order is required for optimal query processing. This can be faster than indexing the data, because indexes require two-phase passes: once to get the index location, and a second to look up the data. You could have multiple projections on a table, depending on query requirements. Normally, each of these projections would all have the same number of tuples.

Vertica provides three types of projections for storing data that is returned from aggregate functions or expressions:

• Projections with expressions
• Live aggregation projections
• Top-K projections

Aggregate projections are most useful for queries against large sets of data. Because you cannot update or delete data in tables that use aggregate projections, you should use aggregate projections for tables with data that is loaded and does not require updates or deletions.

Projections with Expressions

You can create a projection with expressions. This is a special kind of projection that includes expression logic built into the projection definition. These expressions are calculated at the time of load and immediately available at query execution time. => CREATE PROJECTION values_product (a, b, c) AS SELECT a, b, a*b FROM values SEGMENTED BY HASH(a) ALL NODES KSAFE;

Live Aggregate Projections

A live aggregate projection (LAP) contains columns with values that are aggregated from columns in its table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads – for example through INSERT or COPY – Vertica recalculates aggregations with the new data and updates the projection with real-time (or ‘live’) values.

Vertica can aggregate results in live aggregate projections from the following aggregate functions:


Additionally, live aggregate projections can support queries that include aggregate functions qualified with the keyword DISTINCT.

Live aggregate projections provide a very fast way of running aggregation queries against very large data sets because the values are computed and stored as the data is loaded into the table. With appropriate live aggregate projections in place, it is possible to run these kind of queries against tables with billions or even trillions of rows and receive sub-second results. Additionally, special queries need not be written – the Vertica optimizer is intelligent enough to recognize whether there is a live aggregate projection in place and will use it if possible for any given aggregate query. => CREATE PROJECTION clicks_agg AS SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks GROUP BY page_id, click_time::DATE KSAFE 1;

Top-K Projections

A special kind of live aggregate projection is a Top-K projection. A Top-K query returns the top k rows from partitions of selected rows. Top-K projections can significantly improve performance of Top-K queries. For example, you can define a table that stores gas meter readings with three columns: gas meter ID, time of meter reading, and the read value. The following Top-K query returns the five most recent meter readings for a given meter: => SELECT meter_id, reading_date, reading_value FROM readings LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC); To improve the performance of this query, you can create a Top-K projection: => CREATE PROJECTION readings_topk (meter_id, recent_date, recent_value) AS SELECT meter_id, reading_date, reading_value FROM readings LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC); Like other cases, the Vertica optimizer will direct applicable queries to the appropriate projection if they exist and can adequately solve the request.

For more information, see Pre-Aggregating Data in Projections in the Vertica documentation.