Pre-Aggregating Data in Projections
Queries that use aggregate functions such as SUM and COUNT can perform more efficiently when they use projections that already contain the aggregated data. This improved efficiency is especially true for queries on large quantities of data.
For example, a power grid company reads 30 million smart meters that provide data at five-minute intervals. The company records each reading in a database table. Over a given year, three trillion records are added to this table.
The power grid company can analyze these records with queries that include aggregate functions to perform the following tasks:
- Establish usage patterns.
- Detect fraud.
- Measure correlation to external events such as weather patterns or pricing changes.
To optimize query response time, you can create an aggregate projection, which stores the data is stored after it is aggregated.
Vertica provides three types of projections for storing data that is returned from aggregate functions or expressions:
- Projection that contains expressions: Projection with columns whose values are calculated from anchor table columns.
- Live aggregate projection: Projection that contains columns with values that are aggregated from columns in its anchor table. You can also define live aggregate projections that include user-defined transform functions.
- Top-K projection: Type of live aggregate projection that returns the top k rows from a partition of selected rows. Create a Top-K projection that satisfies the criteria for a Top-K query.
- Aggregate projections are most useful for queries against large sets of data.
- For optimal query performance, the size of LAP projections should be a small subset of the anchor table—ideally, between 1 and 10 percent of the anchor table, or smaller, if possible.
- You cannot update or delete data in tables that have aggregate projections. To change anchor table content, you must first drop all live aggregate and Top-K projections that are associated with it, using
DROP PROJECTION. Thus, you should use aggregate projections for tables whose data is cumulative and require only infrequent updates and deletions.
In the event of manual recovery from an unclean database shutdown, live aggregate projections might require some time to refresh.