Initial Process for Improving Query Performance

To optimize query performance, begin by performing the following tasks:

  1. Run Database Designer.
  2. Check query events proactively.
  3. Review the query plan.

Run Database Designer

Database Designer creates a physical schema for your database that provides optimal query performance. The first time you run Database Designer, you should create a comprehensive design that includes relevant sample queries and data on which to base the design. If you develop performance issues later, consider loading additional queries that you run frequently and then rerunning Database Designer to create an incremental design.

When you run Database Designer, choose the option, Update Statistics. The Vertica query optimizer uses statistics about the data to create a query plan. Statistics help the optimizer determine:

If your statistics become out of date, run the Vertica function ANALYZE_STATISTICS function to update statistics for a schema, table, or columns. For more information, see Collecting Database Statistics.

Check Query Events Proactively

The QUERY_EVENTS system table returns information on query planning, optimization, and execution events.

The EVENT_TYPE column provides various event types:

The following table lists informational event types.

Event Type Description
SIP_FALLBACK

This optimization did not apply to this query type.

SMALL_MERGE_REPLACED

Vertica has chosen a more efficient way to access the data by replacing a merge.

STORAGE_CONTAINERS_ELIMINATED
Vertica has performed partition pruning for the purpose of optimization.
GROUPBY PUSHDOWN
(Message is Internal to Vertica.)
NO GROUPBY PUSHDOWN
(Message is Internal to Vertica.)
VALUE_TRUNCATED
A character value is too long.
GROUP_BY_PREPASS_FALLBACK
Vertica could not run an optimization. In-memory prepass is disabled. The projection may not be optimal.
MERGE_CONVERTED_TO_UNION
Vertica has converted a merge operator to a union operator due to the sort order of the multi-threaded storage access stream.
TRANSITIVE PREDICATE

Vertica has optimized by adding predicates to joins where it makes logical sense to do so.

For example, for the statement, SELECT * FROM A, B WHERE A.a = B.a AND A.a = 1;Vertica may add a predicate B a = 1 as a filter for better storage access of table B.

NODE PRUNING
Vertica has performed node pruning, which is similar to partition pruning, but at the node level.
SEQUENCE CACHE REFILLED
Vertica has refilled sequence cache.
OUTER OVERRIDE USED
For efficiency and optimization, Vertica has swapped the inner/outer tables in a join. Vertica used the smaller table as the inner table.
OUTER OVERRIDE NOT USED
Vertica found swapping inner/outer tables in a join unnecessary because the inner/outer tables were in good order. (For example, a smaller table was used in an inner join.)
EXTERNAL_PREDICATE_PUSHDOWN_NOT_SUPPORTED

Predicate pushdown for older Hive versions may not be supported.  For more information, see Improving Query Performance for Data Stored in HDFS.

LibHDFS++ UNSUPPORTED OPERATION

Vertica accessed HDFS using the hdfs URL scheme, but the HDFS cluster uses an unsupported feature such as wire encryption or HTTPS_ONLY. Vertica fell back to WebHDFS.

LibHDFS++ FAILOVER RETRY

Vertica attempted to contact a Name Node on an HDFS cluster that uses High Availability Name Node and did not receive a response. Vertica retried with a different Name Node.

The following table lists event types that you should review for possible corrective action.

Event Type Description Action
GROUP_BY_SPILLED

This event type is typically related to a specific type of query, which you may need to adjust.

Identify the type of query and make adjustments accordingly. You may need to adjust resource pools, projections, or the amount of RAM available. Try running the query on a cluster with no additional workload.
RESEGMENTED_MANY_ROWS

This event type is typically related to a specific type of query, which you may need to adjust.

Do projections need to be segmented in a different way to allow for join locality? Can you rewrite the query to filter out more rows at storage access time? (Typically, Vertica does so automatically through predicate pushdown.) Review your explain plan.
RLE_OVERRIDDEN

The average run counts are not large enough for Run Length Encoding (RLE). This event occurs with queries where the filtered results for certain columns do not work with RLE because cardinality is less than 10.

Review and rewrite your query, if necessary.
PREDICATE OUTSIDE HISTOGRAM

A predicate value you are trying to match does not exist in a set of possible values for a specific column. For example, you try to match a VARCHAR value WHERE mystring = "ABC<newline>". In this case, the newline character throws off the predicate matching optimizations.

Review and rewrite your query, if necessary.

The following table lists more critical event types, and recommends actions you can take to resolve issues.

Event Type Description Action
WOS_SPILL

WOS ran out of memory, and began spilling to ROS.

Try one or more of the following:

  • Adjust the load process to run bigger batches less frequently.
  • Give the WOS resource pool more memory.
  • Increase how often the Tuple Mover moveout and mergeout processes run for high ingest rates. (Adjustment of the moveout process helps avoid ROS issues.)
NO HISTOGRAM 

Indicates a table does not have an updated column histogram.

Running the function ANALYZE_STATISTICS most often corrects this issue.
MEMORY LIMIT HIT 

Indicates query complexity or, possibly, lack of available system memory.

Consider adjusting the MAXMEMORYSIZE and PLANNEDCONCURRENCY resource pools so that the optimizer has sufficient memory. On a heavily used system, this event may occur more frequently.
DELETE WITH NON OPTIMIZED PROJECTION

One or more projections do not have your delete filter column in their sort order,causing Vertica difficulty identifying rows to mark as deleted.

Add the delete filter column to the end of every projection sort order for your target delete table.
JOIN_SPILLED

Vertica has spilled a join to disk. A join spill event slows down the subject query and all other queries as it consumes resources while using disk as virtual memory.

Try the following (in sequence):

  • Review the explain plan. The query could be too ambitious, for example, cross joining two large tables.
  • Consider adding the query to a lower priority pool to reduce impact on other queries.
  • Create projections that allow for a merge join instead of a hash join.
  • Adjust the PLANNEDCONCURRENCYresource pool so that queries have more memory to execute.

Review the Query Plan

A query plan is a sequence of step-like paths that the Verticaquery optimizer selects to access or alter information in your Verticadatabase. There are two ways to get information about the query plan: