Pre-Aggregating Data in Projections

Queries that use aggregate functions such as SUM and COUNT can perform more efficiently when they use projections that already contain the aggregated data. This improved efficiency is especially true for queries on large quantities of data.

For example, a power grid company reads 30 million smart meters that provide data at five-minute intervals. The company records each reading in a database table. Over a given year, three trillion records are added to this table.

The power grid company can analyze these records with queries that include aggregate functions to perform the following tasks:

  • Establish usage patterns.
  • Detect fraud.
  • Measure correlation to external events such as weather patterns or pricing changes.

To optimize query response time, you can create an aggregate projection, which stores the data is stored after it is aggregated.

Aggregate Projections

Vertica provides three types of projections for storing data that is returned from aggregate functions or expressions:

Recommended Use

  • Aggregate projections are most useful for queries against large sets of data.
  • For optimal query performance, the size of LAP projections should be a small subset of the anchor table—ideally, between 1 and 10 percent of the anchor table, or smaller, if possible.
  • You cannot update or delete data in tables that have aggregate projections. To change anchor table content, you must first drop all live aggregate and Top-K projections that are associated with it, using DROP PROJECTION. Thus, you should use aggregate projections for tables whose data is cumulative and require only infrequent updates and deletions.

Requirements

In the event of manual recovery from an unclean database shutdown, live aggregate projections might require some time to refresh.

In This Section