The Life of a Query, According to Henry Ford

Posted January 10, 2017 by Vertica Technical Team

Rear view of two partially unrecognizable men sitting in front of a computer. One of them is pointing at a screen where are several lines of computer code.

While Henry Ford did not in fact develop or even patent the modern assembly line (that credit goes to Ransom E. Olds), he relied heavily on the process for automobile production.

The goal of an assembly line is to create products efficiently, accurately, and with high throughput. The same can be said for queries. You want fast, accurate results, and you want to be able to produce many at a time. Thus, we can think of an executing query as an assembly line for data.

So what goes on during a query?’s ?assembly? process?

The parts

There are lots of parts at play during query execution. Let?’s start with the most obvious: the query itself. In Vertica, queries are operations that retrieve data from one or more tables or views. Queries can range from simple to incredibly complex. For query execution, Vertica relies on the following:

  • Initiator node: Node the client connects to when submitting a query.
  • Optimizer: The component that evaluates different strategies for running a query and picks the best one.
  • Explain plan: A map, or plan, of the steps the query will perform.
  • Executor node(s): Any node that participates in executing a specific SQL statement. The initiator node can, and usually does, also function as an executor node.

Sample query plan

The process

Query submission

When you connect to a node to issue a query, that node becomes the initiator node. All other nodes in the cluster become executor nodes. You submit a query by issuing a SQL statement. Think of this as the directive to build a product.

Planning

The query optimizer grabs your query and considers different ways of executing it. Often, Vertica can execute a single query in different ways. While all paths will lead to the same result, some paths are more optimal than others. To design the best plan, the optimizer considers:

  • Different query rewrites
  • Combinations of projections
  • Order in which to apply predicates
  • Order in which to execute JOINS and GROUPBYS

Then, to choose the best query plan, the optimizer performs a statistical analysis. It assigns a cost to each operator, based on the above considerations, and selects the least costly plan as the execution plan.

With our assembly line analogy, the optimizer analyzes the product idea and designs the assembly line to build it.

You can use the EXPLAIN keyword in front of a query to view the optimizer?’s plan without executing the query. You can also query the QUERY_PROFILES and PROJECTION_USAGE system tables for more information.

Distribution

Once the optimizer settles on a plan of attack, it breaks the query plan into smaller plans and the initiator node distributes these plans to the executor nodes (including itself). In a sense, the optimizer delegates tasks to different parts on the assembly line.

Results

With the plans, the product can be designed (the query can execute). All nodes execute their portion of the overall query plan locally. Each executor node sends its results back to the initiator node, which in turn aggregates the results and returns the final result to you. All of this happens in record time.

Learn more

To learn more about queries and execution in Vertica, read the documentation and community material below:

Writing Queries

Reading Query Plans

Troubleshooting Query Performance with Vertica System Tables

Looking Under the Hood at Vertica Queries