You run your newly crafted query and patiently wait for the results to appear on the terminal. You stare at your clock, waiting. 1 minute, 2 minutes, then 5, then 10. Your heart sinks. Why is it taking so long,? The query should be done by now, you tell yourself. You built your projections to optimize the joins, you’re sure there is enough memory to avoid spilling to disk. You start to doubt yourself at this point, so you’ll check to make sure.
You decide to run EXPLAIN to see if there’s anything obvious that the optimizer did incorrectly. You open a separate VSQL window and run EXPLAIN. You can see that there’s a hash-join at Path ID 4-that’s not good. You wonder, why isn’t this a merge-join? And, you could have sworn you were joining on sorted columns. You’d better check the sort order on the columns for your projections. What’s the query for that, again, you wonder. Well, since that may not be the bottleneck anyway; you decide to check the profile information for the query. You try to remember– which table stores profile information? EXECUTION_ENGINE_PROFILES, or QUERY_PLAN_PROFILES?”? What columns? Probably should select on all of them and see which columns I need.
And once you do find the columns you need, you may realize that trying to understand VSQL profile-metric outputs is not how you want to spend your afternoon.
But that doesn’t mean you are forever doomed to wade through dense text to get your answers…
Welcome to Management Console Query Plan Visualizer!
In the HP Vertica Analytics Platform 7., Management Console (MC) offers a simple interface, the Query Plan Visualizer, for getting plan or profile information on the your query. The Query Plan Visualizer provides a graphical view of the details of your query, allowing you to quickly identify and understand problem areas.
Let’s examine the same query mentioned previously using MC’s Query Plan Visualizer. Just paste in the query text and click Explain Plan . The results are shown here:
MC’s EXPLAIN output maintains the structure of the plan, and also highlights important information such as “No Statistics,” while linking to relevant metadata for the projections used and columns materialized. For example, we can see that Path ID 3 is a hash join, but now we can actually find out why.
So now we know why there was a hash-join instead of a merge-join. But how do we see how the query was actually executed? We can get the profile metrics for your query using either of these methods:
- We can click “Profile Query.” MC then executes the query and displays profile information for it once it completes. However, our query takes a while to run. This option may take a bit of time…
- We can also examine past query activity via the Activity tab located at the bottom of the screen. Using the Activity tab, we can retrieve profile information for queries we’ve already run.
In this case, we’ll choose the second option.
To do so:
- Go to the Activity tab,
- Select Queries from the dropdown menu for our chart type. This chart shows us a graph of number of queries run in the past.
Because we know our query was run recently, we’ll see it at the right side of the graph. Clicking that location brings us a table of query activity from the past few minutes. Sorting the queries by Elapsed brings our long-running query to the top.
Clicking Explain/Profile on the far right of the table brings us back to the Query Plan Visualizer page and requests the profile information from the HP Vertica database.
The screen above shows a collapsed view of the profile information, which hides projection and column information. Metric information for each path appears to the right of the plan. We can measure 5 types of metrics for each path: disk usage, memory usage, data sent, data received, and time spent. Each blue bar represents the relative usage of a metric among all other paths. For example, in the Time column, we can see that the row of Path ID 3 has the largest blue bar (at about 35% fullness). This means, that out of all the paths, Path ID 3 took 35% of the total execution time. Now we can easily see that it was indeed our hash-join that took the most amount of time. Additionally, we can see that the disk-read on Path ID 6 was also responsible for a significant portion of the execution time.
So what about that pie chart? The pie chart shows how long the query took in each of its phases. As the query runs, it goes through multiple phases before it completes. Ideally, the query will spend most of its time in the “execution phase,” as the other phases should happen relatively quickly. So if your pie chart is mostly green, that’s good. Think of the chart as a sanity check that validates whether your query spent most of its time where it should.
Additionally, if you want to track the progress of a long running query, you can profile it with “Enable Monitoring” checked. With monitoring enabled, the counter values on the right hand side update at the set interval time, as well as show how much they increased or decreased by since the previous update. So rather than waiting for the query to complete profiling before you can see profile metric information, you can get the latest information on what paths are currently being processed at your set update-interval.
By removing the need to know the specific queries required for getting profile information, and by making relevant data (projection metadata, query events) just a click away, the MC Query Plan Visualizer can greatly simplify the process of getting and understanding profiling information. If you’re still using version pre-7.0 version of MC, be sure to upgrade to a new Vertica 7.0 and give this a whirl