As we bring our 4.0 release to market, we are starting a series of educational blog posts to provide a in-depth look at Vertica’s core technology. We start with one of our crown jewels – the Vertica Query Optimizer.
The goal of query optimizers in general is to allow users to get the maximal performance from their database without worrying about details of how it gets done. At Vertica, we take this goal to heart in everything that we build. From the first day starting out, the Vertica Optimizer team has focused on creating a product that reduces the need for manual tuning as much as possible. This lets users focus on their business needs rather than tuning our technology.
Before we dive into the unique innovations within our optimizer, let’s get a few simple facts straight:
- The Vertica Optimizer is not limited to classic Star and Snowflake Schemas – it hasn’t been since version 2.5. Many of our 130+ customers in production today are using non-star schemas with great success. In fact, our Optimizer easily handles very complicated queries – from workloads as simple as TPCH, containing only relatively simple Star queries with a few tables, to complex queries containing hundreds of joins with mixes of INNER/OUTER joins and a variety of predicates and sub-queries.
- It is not common and certainly not necessary to have one projection per query to get great performance from Vertica. While the Optimizer understands and chooses the optimal plan in the presence of several choices, few customers have found it necessary to do custom tuning for individual queries except in very unusual circumstances. It is far more typical to have great performance without such tuning at all.
- The Vertica Optimizer is the only true columnar optimizer developed from scratch to make best use of a column store engine. Unlike some other column store vendors, we do not use any part of the Postgres optimizer.
Why? Because fundamentally, we believe that no amount of retrofitting can turn a row-oriented optimizer into a column-oriented one.
For the optimizer geeks out there, here are some of the capabilities that we believe give the Vertica Optimizer that special edge over others, even mature ones:
- The entire Optimizer is designed as a set of extensible modules so that we can change the brains of the optimizer without rewriting much of the code. This means we can incorporate knowledge gleaned from end-user experiences into the Optimizer, without a lot of engineering effort. After all, when you build a system from scratch, you can build it smarter and better!
- Unlike standard optimizers that determine the optimal single-node plan and then introduce parallelizing operators into it as an after thought, our patent-pending optimizer algorithms account for data distribution during the join order enumeration phase of the optimizer. We use sophisticated heuristics based on knowledge of physical properties of the available projections to control the explosion in search space.
- Unlike standard optimizers that restrict the join search space to left-deep plans, the Vertica Optimizer considers bushy plans very naturally.
- The Vertica Optimizer is cost-based with a cost-model based not just on I/O but also CPU and Network transfer costs and takes into account the unique details of our columnar operators and runtime environment.
- The Vertica Optimizer employs many techniques that take advantage of the specifics of our sorted columnar storage and compression – for example, late materialization, compression aware costing and planning, stream aggregation, sort elimination, merge joins, etc.
- The Vertica Database designer works hand-in-glove with the optimizer by producing a physical design that can take advantage of the many clever optimizations available to the optimizer.
While innovating on the core algorithms, we have also incorporated many of the best practices developed over the past 30 years of optimizer research, such as:
- Using histograms to calculate selectivity.
- Optimizing queries to favor co-located joins where possible. Note that optimizer can handle physical designs with arbitrary distribution properties and uses distribution techniques such as re-segmented or broadcast joins.
- Transformations such as converting outer joins to inner joins, taking advantage of primary/foreign key and null constraints, sub-query de-correlation, view flattening, introducing transitive predicates based on join keys and automatically pruning out unnecessary parts of the query.
As a testament to the quality of our optimizer, we are proud to say that customers rarely override the plans produced by our optimizer. This removes an entire class of management from the DBA and letting our algorithms take full advantage of our ever-improving execution engine. That being said, we believe that performance and ease-of-use speak for themselves and so we invite you to Test Drive the Vertica Database on your schema, your queries and your data!