Thanks to its sophisticated optimizer and extremely efficient query engine Vertica can process data aggregations order of magnitudes faster than traditional Database Management Systems.
To further boost data aggregation performance, we can use Vertica’s Live Aggregate Projections (LAP from now on). The concept is very simple: pre-aggregate data once during load operations rather than doing it again and again at query run time. So, for example, if we load 1,000,000 values in in batches of 1,000 INSERTs (each consisting of 1,000 elements), using LAPs, we will have to aggregate only one thousand (partial) aggregates at query run-time rather than one million values.
In this article we will show how to extend the standard LAP functionalities by coding your own pre-pass (or batch) UDTF using easy-to-follow examples.
Using Vertica’s LAPs is quite simple. Suppose we have a “customers” table like this:
On top of the standard projections we can create an additional Live Aggregate Projection where Vertica will store pre-aggregated data; for example we might want to pre-aggregate SUM(amount) by customer id in the “lap_cust” LAP:
Now, when we load data into the base table:
Vertica will automatically pre-aggregate data in our “lap_cust”. It is important to understand LAPs contain partial aggregates at INSERT (or COPY) level. The final aggregation will be executed when we run the query. The more data we load for each INSERT/COPY, the more effective the LAP will be!
The Vertica optimizer can automatically redirect the query to the LAP containing the “right aggregates”. We can check Vertica is going to use a specific LAP by explaining the query:
The functions you can use with standard LAPs are SUM/MIN/MAX/COUNT and they are applied on the fly to all rows being loaded. However… you can create custom LAP invoking User Defined Transform Function.
I am going to show an easy-to-code example… create an additional LAP to keep SUM(account) by customer only for the ones having “flag = 1”. This is the code for the Transfom Function:
And this is the “Factory Function” to describe the function interface to Vertica:
Now we just have to compile our UDx:
and deploy it to Vertica:
We can now define a new LAP based on our new UDx which is identified in the PARTITION CLAUSE as a pre-pass UTDF:
That’s all! From now on each INSERT/COPY in the base table will automatically add new data to the LAP. For example, after the previous INSERT, our new LAP will contain:
Happy playing 🙂