Understanding the Vertica Query Optimizer

Posted April 26, 2018 by Soniya Shah, Information Developer

white cloud in vault type room representing cloud computing
This blog post was authored by Soniya Shah. The Vertica query optimizer uses statistics about the data to create a query plan, which contains a set of operations to compute the requested result. Depending on the properties of the projections defined in your database, the optimizer can choose faster and more efficient operations. This is helpful because there are many ways to answer a query. The optimizer finds the best one, based on tasks such as: • Joins and predicates can be applied in any order • Aggregations can be applied before or after joins • Subqueries can be rewritten into joins • Projection sort, segmentation, encoding, and aggregation The following diagram depicts an overview of the optimizer workflow: The parser converts tokens to structures, and then the analyzer looks up those structures, such as tables, to rewrite them into queries. For example, suppose you have the following select statement: SELECT * FROM fact; The parser understands that this is a select statement. Then, the analyzer takes the select statement structure and performs rule-based rewriters on the query. The output is a modified query. For example, if you have the following view expansion: Let view1 = SELECT a,b,c FROM foo WHERE b = 5; SELECT a,b FROM view1 WHERE a = 5; After the rewriter completes, this becomes: SELECT a, b FROM (SELECT a,b,c FROM foo WHERE b = 5) T where a = 5; Next, the join graph chooses the best path. There are many ways to perform a join, so the join path chooses the optimal one. The path structure is based on the following: • The tree of query operations, such as SELECT, GROUP, and JOIN • The physical properties, such as distribution and sortedness • The cost to the CPU, network and disk The cost model takes a look at cost aspects such as how much data must be processed, the amount of memory consumed, the cost of reading the data, and how many nodes are involved. The optimizer builds multiple paths from the bottom up, keeping the one with the most advantageous costs. The path structure chooses the following: • Projections • Join orders, including inner and outer joins • Join algorithms, such as hash joins or merge joins • GROUP BY algorithms, such as GROUP BY pipe or GROUP BY hash The path structure will also apply ORDER BY and analytics. Each node in the query plan becomes an execution engine operator that processes the input rows and produces the output rows. For more information, see Query Optimization in the Vertica documentation.