Manually Choosing an ECS Strategy

When a subcluster has more nodes than there are shards in the database, the Vertica query optimizer uses the Elastic Crunch Scaling (ECS) feature to involve all of nodes in processing queries. See Using Elastic Crunch Scaling to Improve Query Performance for an overview of this feature.

In a subcluster that has more nodes than there are shards in the database, each shard has multiple subscribing nodes. The optimizer splits the responsibility for processing the shard's data among these subscribers. It has two strategies it can use to split up the data in the shards:

  • I/O-optimized strategy: The optimizer divides the list of ROS containers in the shard among the subscribing nodes. This strategy is the best to use when the nodes must fetch the data for the query from communal storage, rather than the depot. The nodes only fetch the ROS containers they need to resolve the query from communal storage, reducing the amount of data each needs to transfer from communal storage. Due to the arbitrary division of data among the nodes, this strategy does not support query optimizations that rely on data segmentation.
  • Compute-optimized strategy: The optimizer uses data segmentation to assign portions to each subscribing node. The nodes scan the entire shard, but use sub-segment filtering to find their assigned segments of the data. This strategy works best when most of the data for the query is in the depot, because the nodes must scan the entire contents of the shard. Because this strategy uses data segmentation, it supports optimizations such as local joins that the I/O-optimized strategy cannot.

The optimizer automatically chooses a strategy based on whether the query can take advantage of data segmentation. You can tell which strategy the optimizer chooses for a query by using EXPLAIN. The top of the plan explanation states whether ECS is preserving segmentation. For example, this simple query on a single table does not need to use segmentation (and therefore uses the I/O-optimized strategy):

=> EXPLAIN SELECT employee_last_name,
            employee_first_name,employee_age 
            FROM employee_dimension 
            ORDER BY employee_age DESC;
 
                                 QUERY PLAN      
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes. 
 Crunch scaling strategy does not preserve data segmentation
 ------------------------------
. . . 

A more complex query using a JOIN results in ECS preserving data segmentation by using the compute-optimized strategy. The query plan tells you that segmentation is preserved:

=> EXPLAIN SELECT sales_quantity, sales_dollar_amount, transaction_type, cc_name
    FROM online_sales.online_sales_fact
    INNER JOIN online_sales.call_center_dimension 
    ON (online_sales.online_sales_fact.call_center_key 
        = online_sales.call_center_dimension.call_center_key
        AND sale_date_key = 156)
    ORDER BY sales_dollar_amount DESC;
                                                                                                            
                                     QUERY PLAN         
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes.  
 Crunch scaling strategy preserves data segmentation
 ------------------------------
 . . . 

In most cases, the optimizer chooses the best strategy to use to split the data among the nodes subscribing to the same shard. However, you may find that some queries perform poorly. In these cases, you can manually choose which strategy to use, or even disable ECS entirely.

Setting the ECS Strategy for Individual Queries

You can force the optimizer to choose an ECS strategy (or disable ECS entirely) for a single query using the ECSMode hint. See Hints for more information about using hints in queries. The ECSMode hint takes a single argument that specifies the strategy to use:

  • AUTO—tells the optimizer to determine the strategy to use automatically. Only useful if you have set the ECS mode at the session level (see Setting the ECS Strategy for the Session or Database).
  • COMPUTE_OPTIMIZED—Force the use of the compute-optimized strategy.
  • IO_OPTIMIZED—force the use of the I/O-optimized strategy.
  • NONE—disable the use of ECS for this query. Only the participating nodes will take part in the query. The collaborating nodes will not take part.

The following example shows the query plan for a simple single-table query that is forced to use the compute-optimized strategy:

=> EXPLAIN SELECT /*+ECSMode(COMPUTE_OPTIMIZED)*/ employee_last_name,
             employee_first_name,employee_age 
             FROM employee_dimension 
             ORDER BY employee_age DESC;

                                   QUERY PLAN 
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes.  
 Crunch scaling strategy preserves data segmentation
 ------------------------------
. . .

This example disable ECS in a six-node cluster in a three-shard database:

=> EXPLAIN SELECT /*+ECSMode(NONE)*/ employee_last_name,
             employee_first_name,employee_age 
             FROM employee_dimension 
             ORDER BY employee_age DESC;
 
                                     QUERY PLAN   
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 ------------------------------
 
 EXPLAIN SELECT /*+ECSMode(NONE)*/ employee_last_name,
             employee_first_name,employee_age 
             FROM employee_dimension 
             ORDER BY employee_age DESC;
 
 Access Path:
 +-SORT [Cost: 243, Rows: 10K] (PATH ID: 1)
 |  Order: employee_dimension.employee_age DESC
 |  Execute on: v_verticadb_node0007, v_verticadb_node0004, v_verticadb_node0005
 | +---> STORAGE ACCESS for employee_dimension [Cost: 71, Rows: 10K] (PATH ID: 2)
 | |      Projection: public.employee_dimension_DBD_8_seg_vmart_b0
 | |      Materialize: employee_dimension.employee_first_name, 
 | |      employee_dimension.employee_last_name, employee_dimension.employee_age
 | |      Execute on: v_verticadb_node0007, v_verticadb_node0004, 
 | |          v_verticadb_node0005
 . . .

Note that this query plan lacks the "this query involves non-participating nodes" statement, indicating that it does not use ECS. It also lists just three participating nodes. These are the nodes marked as participating in the V_CATALOG.SESSION_SUBSCRIPTIONS system table.

Setting the ECS Strategy for the Session or Database

You can use the ECSMode configuration parameter to set the ECS strategy for the current session. The values this parameter accepts are the same as the values the ECSMode hint accepts, with the exception of NONE. You cannot set the ECS mode to NONE at the session or database level, only at the individual query level.

The following example demonstrates using the configuration parameter to force a simple query to use the COMPUTE_OPTIMIZED strategy. It then sets the parameter back to its default value of AUTO:

=> EXPLAIN SELECT employee_first_name,employee_age 
    FROM employee_dimension ORDER BY employee_age DESC;
                                                                             
                                 QUERY PLAN 
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes. 
 Crunch scaling strategy does not preserve data segmentation
 ------------------------------
. . .
 
=> ALTER SESSION SET ECSMode = 'COMPUTE_OPTIMIZED';
ALTER SESSION
=> EXPLAIN SELECT employee_first_name,employee_age 
    FROM employee_dimension ORDER BY employee_age DESC;
                                                                                
                                                     
                                  QUERY PLAN 
--------------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes. 
 Crunch scaling strategy preserves data segmentation
 ------------------------------
 . . .

dbadmin=> ALTER SESSION SET ECSMode = 'AUTO';
ALTER SESSION

Individual query hints override the session-level settings. This example sets the session default to use COMPUTE_OPTIMIZED, then restores the default behavior for a query by using the ECSMode hint with the value AUTO:

=> ALTER SESSION SET ECSMode = 'COMPUTE_OPTIMIZED';
ALTER SESSION
=> EXPLAIN SELECT /*+ECSMode(AUTO)*/ employee_first_name,employee_age 
   FROM employee_dimension ORDER BY employee_age DESC;
                                                                                
                                                                                 
                                  QUERY PLAN                  
-----------------------------------------------------------------------------
 ------------------------------ 
 QUERY PLAN DESCRIPTION: 
 The execution of this query involves non-participating nodes. 
 Crunch scaling strategy does not preserve data segmentation
 ------------------------------

Note that setting the ECSMode hint to AUTO let the optimizer pick the I/O-optimized strategy (which does not preserve segmentation) instead of using the compute-optimized strategy set at the session level.

You can also set the ECS strategy at the database level using ALTER DATABASE. However, doing so overrides the Vertica optimizer's settings for all users in all subclusters that use ECS. Before setting the ECS strategy at the database level, verify that the majority of the queries run by all users of the ECS-enabled subclusters must have the optimizer's default behavior overridden. If not, then use the session or query-level settings to override the optimizer for just the queries that benefit from a specific strategy.