Initial Process for Improving Query Performance
To optimize query performance, begin by performing the following tasks:
Run Database Designer
Database Designer creates a physical schema for your database that provides optimal query performance. The first time you run Database Designer, you should create a comprehensive design that includes relevant sample queries and data on which to base the design. If you develop performance issues later, consider loading additional queries that you run frequently and then rerunning Database Designer to create an incremental design.
When you run Database Designer, choose the option, Update Statistics. The Vertica query optimizer uses statistics about the data to create a query plan. Statistics help the optimizer determine:
- Multiple eligible projections to answer the query
- The best order in which to perform joins
- Data distribution algorithms, such as broadcast and resegmentation
If your statistics become out of date, run the Vertica function ANALYZE_STATISTICS
function to update statistics for a schema, table, or columns. For more information, see Collecting Database Statistics.
Check Query Events Proactively
The QUERY_EVENTS system table returns information on query planning, optimization, and execution events.
The EVENT_TYPE
column provides various event types:
- Some event types are informational.
- Others you should review for possible corrective action.
- Several are most important to address.
The following table lists informational event types.
Event Type | Description |
---|---|
SIP_FALLBACK |
This optimization did not apply to this query type. |
SMALL_MERGE_REPLACED |
Vertica has chosen a more efficient way to access the data by replacing a merge. |
STORAGE_CONTAINERS_ELIMINATED |
Vertica has performed partition pruning for the purpose of optimization. |
GROUPBY PUSHDOWN |
(Message is Internal to Vertica.) |
NO GROUPBY PUSHDOWN |
(Message is Internal to Vertica.) |
VALUE_TRUNCATED |
A character value is too long. |
GROUP_BY_PREPASS_FALLBACK |
Vertica could not run an optimization. In-memory prepass is disabled. The projection may not be optimal. |
MERGE_CONVERTED_TO_UNION |
Vertica has converted a merge operator to a union operator due to the sort order of the multi-threaded storage access stream. |
TRANSITIVE PREDICATE |
Vertica has optimized by adding predicates to joins where it makes logical sense to do so. For example, for the statement, |
NODE PRUNING |
Vertica has performed node pruning, which is similar to partition pruning, but at the node level. |
SEQUENCE CACHE REFILLED |
Vertica has refilled sequence cache. |
OUTER OVERRIDE USED |
For efficiency and optimization, Vertica has swapped the inner/outer tables in a join. Vertica used the smaller table as the inner table. |
OUTER OVERRIDE NOT USED |
Vertica found swapping inner/outer tables in a join unnecessary because the inner/outer tables were in good order. (For example, a smaller table was used in an inner join.) |
EXTERNAL_PREDICATE_PUSHDOWN_NOT_SUPPORTED |
Predicate pushdown for older Hive versions may not be supported. For more information, see Improving Query Performance for Data Stored in HDFS. |
LibHDFS++ UNSUPPORTED OPERATION |
Vertica accessed HDFS using the |
LibHDFS++ FAILOVER RETRY |
Vertica attempted to contact a Name Node on an HDFS cluster that uses High Availability Name Node and did not receive a response. Vertica retried with a different Name Node. |
The following table lists event types that you should review for possible corrective action.
Event Type | Description | Action |
---|---|---|
GROUP_BY_SPILLED |
This event type is typically related to a specific type of query, which you may need to adjust. |
Identify the type of query and make adjustments accordingly. You may need to adjust resource pools, projections, or the amount of RAM available. Try running the query on a cluster with no additional workload. |
RESEGMENTED_MANY_ROWS |
This event type is typically related to a specific type of query, which you may need to adjust. |
Do projections need to be segmented in a different way to allow for join locality? Can you rewrite the query to filter out more rows at storage access time? (Typically, Vertica does so automatically through predicate pushdown.) Review your explain plan. |
RLE_OVERRIDDEN |
The average run counts are not large enough for Run Length Encoding (RLE). This event occurs with queries where the filtered results for certain columns do not work with RLE because cardinality is less than 10. |
Review and rewrite your query, if necessary. |
PREDICATE OUTSIDE HISTOGRAM |
A predicate value you are trying to match does not exist in a set of possible values for a specific column. For example, you try to match a VARCHAR value WHERE mystring = "ABC<newline>". In this case, the newline character throws off the predicate matching optimizations. |
Review and rewrite your query, if necessary. |
The following table lists more critical event types, and recommends actions you can take to resolve issues.
Event Type | Description | Action |
---|---|---|
WOS_SPILL |
WOS ran out of memory, and began spilling to ROS. |
Try one or more of the following:
|
NO HISTOGRAM |
Indicates a table does not have an updated column histogram. |
Running the function ANALYZE_STATISTICS most often corrects this issue. |
MEMORY LIMIT HIT |
Indicates query complexity or, possibly, lack of available system memory. |
Consider adjusting the MAXMEMORYSIZE and PLANNEDCONCURRENCY resource pools so that the optimizer has sufficient memory. On a heavily used system, this event may occur more frequently. |
DELETE WITH NON OPTIMIZED PROJECTION |
One or more projections do not have your delete filter column in their sort order,causing Vertica difficulty identifying rows to mark as deleted. |
Add the delete filter column to the end of every projection sort order for your target delete table. |
JOIN_SPILLED |
Vertica has spilled a join to disk. A join spill event slows down the subject query and all other queries as it consumes resources while using disk as virtual memory. |
Try the following (in sequence):
|
Review the Query Plan
A query plan is a sequence of step-like paths that the Verticaquery optimizer selects to access or alter information in your Verticadatabase. There are two ways to get information about the query plan:
- Run the EXPLAIN command. Each step (path) represents a single operation that the optimizer uses for its execution strategy.
- Query the QUERY_PLAN_PROFILES system table. This table provides detailed execution status for currently running queries. Output from the QUERY_PLAN_PROFILES table shows the real-time flow of data and the time and resources consumed for each path in each query plan.