Identical Segmentation
To improve query performance when you join multiple tables, create projections that are identically segmented on the join keys. Identically-segmented projections allow the joins to occur locally on each node, thereby helping to reduce data movement across the network during query processing.
To determine if projections are identically-segmented on the query join keys, create a query plan with EXPLAIN
. If the query plan contains RESEGMENT
or BROADCAST
, the projections are not identically segmented.
The Vertica optimizer chooses a projection to supply rows for each table in a query. If the projections to be joined are segmented, the optimizer evaluates their segmentation against the query join expressions. It thereby determines whether the rows are placed on each node so it can join them without fetching data from another node.
Join Conditions for Identically Segmented Projections
A projection p
is segmented on join columns if all column references in p
’s segmentation expression are a subset of the columns in the join expression.
The following conditions must be true for two segmented projections p1
of table t1
and p2
of table t2
to participate in a join of t1
to t2
:
-
The join condition must have the following form:
t1.j1 = t2.j1 AND t1.j2 = t2.j2 AND ... t1.jN = t2.jN
-
The join columns must share the same base data type. For example:
- If
t1.j1
is an INTEGER,t2.j1
can be an INTEGER but it cannot be a FLOAT. - If
t1.j1
is a CHAR(10),t2.j1
can be any CHAR or VARCHAR (for example, CHAR(10), VARCHAR(10), VARCHAR(20)), butt2.j1
cannot be an INTEGER.
- If
- If
p1
is segmented by an expression on columns {t1.s1, t1.s2, ... t1.sN
}, each segmentation columnt1.sX
must be in the join column set {t1.jX
}. - If
p2
is segmented by an expression on columns {t2.s1, t2.s2, ... t2.sN
}, each segmentation columnt2.sX
must be in the join column set {t2.jX
}. -
The segmentation expressions of
p1
andp2
must be structurally equivalent. For example:- If
p1
isSEGMENTED BY hash(t1.x)
andp2
isSEGMENTED BY hash(t2.x)
,p1
andp2
are identically segmented. - If
p1
isSEGMENTED BY hash(t1.x)
andp2
isSEGMENTED BY hash(t2.x + 1)
,p1
andp2
are not identically segmented.
- If
p1
andp2
must have the same segment count.- The assignment of segments to nodes must match. For example, if
p1
andp2
use anOFFSET
clause, their offsets must match. -
If Vertica finds projections for
t1
andt2
that are not identically segmented, the data is redistributed across the network during query run time, as necessary.If you create custom designs, try to use segmented projections for joins whenever possible.
Examples
The following statements create two tables and specify to create identical segments:
=> CREATE TABLE t1 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES; => CREATE TABLE t2 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES;
Given this design, the join conditions in the following queries can leverage identical segmentation:
=> SELECT * FROM t1 JOIN t2 ON t1.id = t2.id; => SELECT * FROM t1 JOIN t2 ON t1.id = t2.id AND t1.x1 = t2.x1;
Conversely, the join conditions in the following queries require resegmentation:
=> SELECT * FROM t1 JOIN t2 ON t1.x1 = t2.x1; => SELECT * FROM t1 JOIN t2 ON t1.id = t2.x1;