Tech Support Series: Optimizing Projections

Posted December 10, 2014 by Sarah Lemaire, Manager, Vertica Documentation

Welcome to another installment of our Top Tech Support Questions Answered blog series. In our first blog, we discussed ways to optimize your database for deletes. In this installment, we’?ll talk about optimizing projections.

People often ask, ?”How can I optimize my projections for maximum query performance??” Like with many questions, the answer is ?”It depends.?” This is because every database has a different structure and uses data in very different ways.

But fear not, we do have some general guidelines you can use to optimize your projections.

Your first step should always be to run Database Designer. The Vertica Database Designer creates optimized projections based on sample queries and sample data that you provide. However, you may find that you want to create your own projections as well.

If you feel you must create your own projections, focus on three characteristics:

  • Sort Order
  • Segmentation
  • Encoding

Sort Order

Choosing a good sort order can help you achieve maximum query performance. If you have queries that contain GROUP BY clauses, joins, or other predicates, it’?s good practice to place the columns specified in those clauses early in the sort order. If you have no other criteria on how to sort your data, the fastest way to access the data is to first sort on the columns with the smallest number of distinct values (lowest cardinality) before the high-cardinality columns.

Segmentation

Also consider creating segmented projections on large tables to spread the query execution workload across multiple nodes. Projection segmentation also provides high availability and recovery, and optimizes query execution. Therefore, it?’s important to determine which columns to use to segment a projection. For Vertica, hash segmentation is the preferred method of segmentation. Primary key columns that have a large number of unique data values (high cardinality) and acceptable skew in their data distribution are an excellent choice for hash segmentation.

Encoding

Database Designer implements optimum encoding for the data you provide. Likewise, when creating your own projections, make sure you specify the encoding on your projection columns to optimize query performance. With appropriate encoding, you can reduce your database footprint and improve query performance. Read more about the encoding types Vertica supports here.

So there you have it– three main characteristics to consider when creating your own projections. As mentioned before, all databases are different, so you may find that leveraging one approach over another is more beneficial for you. But focusing on these three things can make the whole process a little less daunting.

Stay tuned for more tech support blogs!

To learn about optimizing your projections using Database Designer, see our documentation.