How to Use UDx to Extend Vertica Live Aggregate Projections

A key with a key hole and Unlock Vertica Potential in text at the bottom

Thanks to its sophisticated optimizer and extremely efficient query engine Vertica can process data aggregations order of magnitudes faster than traditional Database Management Systems.

To further boost data aggregation performance, we can use Vertica’s Live Aggregate Projections (LAP from now on). The concept is very simple: pre-aggregate data once during load operations rather than doing it again and again at query run time. So, for example, if we load 1,000,000 values in in batches of 1,000 INSERTs (each consisting of 1,000 elements), using LAPs, we will have to aggregate only one thousand (partial) aggregates at query run-time rather than one million values.

In this article we will show how to extend the standard LAP functionalities by coding your own pre-pass (or batch) UDTF using easy-to-follow examples.

Using Vertica’s LAPs is quite simple. Suppose we have a “customers” table like this:

On top of the standard projections we can create an additional Live Aggregate Projection where Vertica will store pre-aggregated data; for example we might want to pre-aggregate SUM(amount) by customer id in the “lap_cust” LAP:

Now, when we load data into the base table:

Vertica will automatically pre-aggregate data in our “lap_cust”. It is important to understand LAPs contain partial aggregates at INSERT (or COPY) level. The final aggregation will be executed when we run the query. The more data we load for each INSERT/COPY, the more effective the LAP will be!

The Vertica optimizer can automatically redirect the query to the LAP containing the “right aggregates”. We can check Vertica is going to use a specific LAP by explaining the query:

The functions you can use with standard LAPs are SUM/MIN/MAX/COUNT and they are applied on the fly to all rows being loaded. However… you can create custom LAP invoking User Defined Transform Function.

I am going to show an easy-to-code example… create an additional LAP to keep SUM(account) by customer only for the ones having “flag = 1”. This is the code for the Transfom Function:

And this is the “Factory Function” to describe the function interface to Vertica:

Now we just have to compile our UDx:

and deploy it to Vertica:

We can now define a new LAP based on our new UDx which is identified in the PARTITION CLAUSE as a pre-pass UTDF:

That’s all! From now on each INSERT/COPY in the base table will automatically add new data to the LAP. For example, after the previous INSERT, our new LAP will contain:

Happy playing 🙂

About the Author

Maurizio Felici
Field Chief Technologist

Maurizio started writing complex code (in Fortran) in 1985 during his Master’s Degree in Physics when he built sensors and software to capture and analyse gravitational waves signals.

He started working in 1986 coding Unix device drivers. In 1992 he started with databases and implemented his first large Data Warehouse in 1998. In 2006 Maurizio joined Hewlett-Packard working with large MPP databases; he begun working with Columnar Databases (Vertica) in 2011.

Maurizio knows several databases, many programming languages and different Data Warehouse Architectures. He has a very long experience with complex Data Warehouse implementations and best practices about all data processing phases: from acquisition/integration to advanced data analysis.

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

How to Use UDx to Extend Vertica Live Aggregate Projections

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

Product Overview

Vertica Announces Vertica 12 for Future-Proof Analytics

Harness the Internet of Things (IoT)

Support & Services

Partners

Vertica Inside – Embedded Analytics at Scale

Resources

About Vertica

Stay Informed

How to Use UDx to Extend Vertica Live Aggregate Projections

About the Author

Search The Blog

Explore Popular Topics

Subscribe For Email Updates

See More Under the Hood Posts