Optimizing for Merge Join

Posted October 22, 2014 by NorbertK


In an earlier post, join operations were introduced, followed by hash join operations. The other join operator, merge join, can help in situations where a hash join spills to disk and wastes resources. In this post, a query is optimized and tested for merge join using sorted subqueries and a specific projection.

Background

The purpose of optimizing for a merge join is to avoid situations where a join spills to disk. These events can be identified by looking at execution events for the JOIN_SPILLED event type. A merge join will never spill to disk because the query will have been allocated memory sufficient for the matched values to be streamed through memory. It will always be more memory-efficient than a hash join, but not necessarily faster: on a small data set, a hash join may process faster, although it will still use more memory.
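Join spills can be spotted in the QUERY_EVENTS system table. A minimal check might look like the following (the table and event type names are as documented for Vertica; column availability can vary by version):

```sql
-- Find recent queries whose joins spilled to disk
SELECT event_timestamp, session_id, event_description
FROM v_monitor.query_events
WHERE event_type = 'JOIN_SPILLED'
ORDER BY event_timestamp DESC;
```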

To process a merge join, the participating projections or subqueries must be sorted on the join key. This allows matched values to be streamed through memory rather than built out as a hash table. In addition, the join must be an INNER JOIN.

The sorting can be accomplished through a sorted projection or a sorted subquery. In a projection, however, placing the join key first in the ORDER BY may impact the sorting and encoding of the other columns in the projection. In a subquery, the data is sorted first and then passed to the join.
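As a sketch, a projection sorted on the join key could be declared as follows (the projection name and trimmed column list are illustrative; the table and column names follow the VMart schema):

```sql
-- Hypothetical projection with the join key first in the sort order
CREATE PROJECTION online_sales.online_sales_fact_by_page_key AS
SELECT online_page_key, sale_date_key, sales_quantity, sales_dollar_amount
FROM online_sales.online_sales_fact
ORDER BY online_page_key
SEGMENTED BY HASH(online_page_key) ALL NODES;
```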

With both data sets sorted, the matched values are streamed through memory. As the sides are compared and a match is found, a tuple is created. After moving through the sorted data and exhausting all possibilities on one side of the join, no further matching is required.

To further optimize a merge join, if there is an equality predicate (direct comparison) in the query, that predicate is applied before the join, provided the predicate column precedes the join key in the sort order. This optimization, predicate pushdown, is also possible in a hash join. The goal is to apply the equality predicate on the involved tables so that less data is passed into the join.
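For example, an equality predicate on the dimension table can be applied before the join, shrinking the inner input (the page_type value here is illustrative, not taken from the VMart data):

```sql
-- The equality predicate on opd.page_type is applied before the join,
-- so fewer dimension rows reach the join operator.
SELECT *
FROM online_sales.online_sales_fact osf
JOIN online_sales.online_page_dimension opd
  ON osf.online_page_key = opd.online_page_key
WHERE opd.page_type = 'cart';
```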

For the join to be processed locally, one of the involved projections must be replicated on all nodes, or the projections must be identically segmented. With a replicated projection, a full copy of the projection is available on each node, so each record finds its match in the associated segmented projection without data movement.
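A replicated dimension projection is declared with UNSEGMENTED ALL NODES; a sketch against the VMart dimension table (projection name made up) might look like:

```sql
-- Hypothetical replicated (unsegmented) projection of the dimension table,
-- giving every node a full local copy for a local merge join
CREATE PROJECTION online_sales.online_page_dimension_rep AS
SELECT online_page_key, start_date, end_date,
       page_number, page_description, page_type
FROM online_sales.online_page_dimension
ORDER BY online_page_key
UNSEGMENTED ALL NODES;
```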

Test

This test will use the VMart example database and join the online_sales_fact table to online_page_dimension using online_page_key.

The superprojections were replaced by running a comprehensive Database Designer design. The projections deployed by Database Designer are not sorted on the join key.
SELECT *
FROM online_sales.online_sales_fact osf
JOIN online_sales.online_page_dimension opd
ON osf.online_page_key = opd.online_page_key;

The query plan with base projections looked like:
Access Path:
+-JOIN HASH [Cost: 44K, Rows: 5M] (PATH ID: 1)
| Join Cond: (osf.online_page_key = opd.online_page_key)
| Materialize at Output: osf.sale_date_key, osf.ship_date_key, osf.product_key, osf.product_version, osf.customer_key, osf.call_center_key, osf.shipping_key, osf.warehouse_key, osf.promotion_key, osf.pos_transaction_number, osf.sales_quantity, osf.sales_dollar_amount, osf.ship_dollar_amount, osf.net_dollar_amount, osf.cost_dollar_amount, osf.gross_profit_dollar_amount, osf.transaction_type
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for osf [Cost: 4K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_3_seg_os_base_b0
| | Materialize: osf.online_page_key
| | Execute on: All Nodes
| | Runtime Filter: (SIP1(HashJoin): osf.online_page_key)
| +-- Inner -> STORAGE ACCESS for opd [Cost: 102, Rows: 1K] (PATH ID: 3)
| | Projection: online_sales.online_page_dimension_DBD_1_rep_os_base_node0002
| | Materialize: opd.page_type, opd.page_description, opd.online_page_key, opd.start_date, opd.end_date, opd.page_number
| | Execute on: All Nodes
The test platform is a 3-node VM cluster. Statistics were updated after projection creation, output was directed to /dev/null, and timing was enabled. Only the "All rows formatted" time was recorded.
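In vsql, that harness amounts to a few meta-commands (standard vsql; the query is the test query from above):

```sql
-- Enable per-query timing and discard result rows
\timing
\o /dev/null
SELECT *
FROM online_sales.online_sales_fact osf
JOIN online_sales.online_page_dimension opd
ON osf.online_page_key = opd.online_page_key;
\o
```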

Quick & Dirty

The quick & dirty approach is to sort both the fact and dimension tables on the join key in subqueries:
SELECT *
FROM (SELECT *
FROM online_sales.online_sales_fact
ORDER BY online_page_key) osf
JOIN (SELECT *
FROM online_sales.online_page_dimension
ORDER BY online_page_key) opd
ON osf.online_page_key = opd.online_page_key;
There are additional steps to complete the sort, but Vertica will be able to perform a merge join:

Access Path:
+-JOIN MERGEJOIN(inputs presorted) [Cost: 210K, Rows: 5M] (PATH ID: 1)
| Join Cond: (osf.online_page_key = opd.online_page_key)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 203K, Rows: 5M] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> SORT [Cost: 203K, Rows: 5M] (PATH ID: 3)
| | | Order: online_sales_fact.online_page_key ASC
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(MergeJoin): osf.online_page_key)
| | | +---> STORAGE ACCESS for online_sales_fact [Cost: 52K, Rows: 5M] (PATH ID: 4)
| | | | Projection: online_sales.online_sales_fact_DBD_3_seg_os_base_b0
| | | | Materialize: online_sales_fact.transaction_type, online_sales_fact.product_version, online_sales_fact.sales_quantity, online_sales_fact.ship_dollar_amount, online_sales_fact.shipping_key, online_sales_fact.pos_transaction_number, online_sales_fact.sale_date_key, online_sales_fact.ship_date_key, online_sales_fact.product_key, online_sales_fact.customer_key, online_sales_fact.call_center_key, online_sales_fact.online_page_key, online_sales_fact.warehouse_key, online_sales_fact.promotion_key, online_sales_fact.sales_dollar_amount, online_sales_fact.net_dollar_amount, online_sales_fact.cost_dollar_amount, online_sales_fact.gross_profit_dollar_amount
| | | | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 171, Rows: 1K] (PATH ID: 5)
| | Execute on: All Nodes
| | +---> SORT [Cost: 171, Rows: 1K] (PATH ID: 6)
| | | Order: online_page_dimension.online_page_key ASC
Specific Projection

Typically, when optimizing for a merge join, the join key should be the first column in the projection's sort order. Vertica can also perform a merge join if the join key is second in the sort order, immediately following the column used in an equality predicate [1].
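That second case can be sketched as a projection whose sort order leads with the equality-predicate column (projection name and the WHERE value are illustrative):

```sql
-- Sketch: join key second in the sort order, after the equality-predicate
-- column. A query with WHERE opd.page_type = '...' can then still use a
-- merge join on online_page_key within each page_type group.
CREATE PROJECTION online_sales.online_page_dimension_by_type AS
SELECT page_type, online_page_key, page_description
FROM online_sales.online_page_dimension
ORDER BY page_type, online_page_key
UNSEGMENTED ALL NODES;
```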

For this specific projection, the Database Designer was used with the base example query. The proposed projections would both be sorted on the join key (online_page_key). The fact table would be segmented, while the dimension table would be unsegmented.

The plan with a specific projection looked like:
Access Path:
+-JOIN HASH [Cost: 44K, Rows: 5M] (PATH ID: 1)
| Join Cond: (osf.online_page_key = opd.online_page_key)
| Materialize at Output: osf.sale_date_key, osf.ship_date_key, osf.product_key, osf.product_version, osf.customer_key, osf.call_center_key, osf.shipping_key, osf.warehouse_key, osf.promotion_key, osf.pos_transaction_number, osf.sales_quantity, osf.sales_dollar_amount, osf.ship_dollar_amount, osf.net_dollar_amount, osf.cost_dollar_amount, osf.gross_profit_dollar_amount, osf.transaction_type
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for osf [Cost: 4K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_3_seg_os_base_b0
| | Materialize: osf.online_page_key
| | Execute on: All Nodes
| | Runtime Filter: (SIP1(HashJoin): osf.online_page_key)
| +-- Inner -> STORAGE ACCESS for opd [Cost: 102, Rows: 1K] (PATH ID: 3)
| | Projection: online_sales.online_page_dimension_DBD_1_rep_os_base_node0002
| | Materialize: opd.page_type, opd.page_description, opd.online_page_key, opd.start_date, opd.end_date, opd.page_number
| | Execute on: All Nodes
In this test, the optimizer chose a hash join as the perceived most efficient option. To force a merge join, the optimizer was directed to ignore the base projections:
Optimizer Directives
----------------------
AvoidUsingProjections=online_sales.online_page_dimension_DBD_1_rep_os_base_node0001,online_sales.online_page_dimension_DBD_1_rep_os_base_node0002,online_sales.online_page_dimension_DBD_1_rep_os_base_node0003,online_sales.online_sales_fact_DBD_3_seg_os_base_b0,online_sales.online_sales_fact_DBD_3_seg_os_base_b1
Which finally produces a plan using a merge join:
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [Cost: 46K, Rows: 5M (1K RLE)] (PATH ID: 1) Inner (FILTER)
| Join Cond: (osf.online_page_key = opd.online_page_key)
| Materialize at Output: osf.sale_date_key, osf.ship_date_key, osf.product_key, osf.product_version, osf.customer_key, osf.call_center_key, osf.shipping_key, osf.warehouse_key, osf.promotion_key, osf.pos_transaction_number, osf.sales_quantity, osf.sales_dollar_amount, osf.ship_dollar_amount, osf.net_dollar_amount, osf.cost_dollar_amount, osf.gross_profit_dollar_amount, osf.transaction_type
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for osf [Cost: 17, Rows: 5M (1K RLE)] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_1_seg_os_merge_b0
| | Materialize: osf.online_page_key
| | Execute on: All Nodes
| | Runtime Filter: (SIP1(MergeJoin): osf.online_page_key)
| +-- Inner -> STORAGE ACCESS for opd [Cost: 110, Rows: 1K] (PATH ID: 3)
| | Projection: online_sales.online_page_dimension_DBD_2_rep_os_merge_node0002
| | Materialize: opd.online_page_key, opd.start_date, opd.end_date, opd.page_number, opd.page_description, opd.page_type
| | Execute on: All Nodes
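As an alternative to projection-level directives, a query-level hint can steer the optimizer away from named projections. Assuming the SKIP_PROJS hint is available in your Vertica version, the request might look like:

```sql
-- Ask the optimizer to skip the named base projections so that the
-- merge-sorted projections are chosen (hint availability is an assumption)
SELECT /*+SKIP_PROJS('online_sales.online_sales_fact_DBD_3_seg_os_base_b0',
                     'online_sales.online_sales_fact_DBD_3_seg_os_base_b1')*/ *
FROM online_sales.online_sales_fact osf
JOIN online_sales.online_page_dimension opd
ON osf.online_page_key = opd.online_page_key;
```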
Results

[Results table: timings for the base projections, the sorted subqueries, and the specific projection]

In these tests, the specific projection improved performance by less than 3%. The improvement would be more substantial if the entire data set were not materialized. The base projections with sorted subqueries performed worst due to the additional steps required to sort the data sets.

As mentioned in the background, Vertica can typically process hash joins quickly and efficiently. Optimizing for a merge join becomes worthwhile when a join spills to disk. If a query spills often enough, a specific projection might be beneficial. However, other options, such as passing an ENABLE_JOIN_SPILL hint to the execution engine, may temporarily remedy the situation.
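A commonly cited form of that hint is a session-level execution-engine option; treat the function and option names below as version-dependent assumptions rather than documented guarantees:

```sql
-- Allow the join to spill for this session instead of failing on memory
-- (undocumented / version-dependent; verify before relying on it)
SELECT add_vertica_options('EE', 'ENABLE_JOIN_SPILL');
-- ... run the offending query ...
```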

Documentation

[1] Optimizing for Merge Joins