DELETE and UPDATE Optimization
The process of optimizing DELETE
and UPDATE
queries is the same for both operations. Some simple steps can increase the query performance by tens to hundreds of times. The following sections describe several ways to improve projection design and improve DELETE
and UPDATE
queries to significantly increase DELETE
and UPDATE
performance.
Note: For large bulk deletion, Vertica recommends using Partitioned Tables where possible because they provide the best DELETE
performance and improve query performance.
Projection Column Requirements for Optimized Deletes
When all columns required by the DELETE
or UPDATE
predicate are present in a projection, the projection is optimized for DELETE
s and UPDATE
s. DELETE
and UPDATE
operations on such projections are significantly faster than on non-optimized projections.
For example, consider the following table and projections:
CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER); CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a; CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;
In the following query, both p1
and p2
are eligible for DELETE
and UPDATE
optimization because column a
is available:
DELETE from t WHERE a = 1;
In the following example, only projection p1
is eligible for DELETE
and UPDATE
optimization because the b column is not available in p2
:
DELETE from t WHERE b = 1;
Optimized Deletes in Subqueries
To be eligible for DELETE
optimization, all target table columns referenced in a DELETE
or UPDATE
statement's WHERE clause must be in the projection definition.
For example, the following simple schema has two tables and three projections:
CREATE TABLE tb1 (a INT, b INT, c INT, d INT); CREATE TABLE tb2 (g INT, h INT, i INT, j INT);
The first projection references all columns in tb1
and sorts on column a
:
CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;
The buddy projection references and sorts on column a
in tb1
:
CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;
This projection references all columns in tb2
and sorts on column i
:
CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;
Consider the following DML statement, which references tb1.a
in its WHERE
clause. Since both projections on tb1
contain column a
, both are eligible for the optimized DELETE
:
DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);
Restrictions
Optimized DELETE
s are not supported under the following conditions:
-
With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:
DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
-
With subqueries that do not return multiple rows. For example, the following syntax is not supported:
DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
Projection Sort Order for Optimizing Deletes
Design your projections so that frequently-used DELETE
or UPDATE
predicate columns appear in the sort order of all projections for large DELETE
s and UPDATE
s.
For example, suppose most of the DELETE queries you perform on a projection look like the following:
DELETE from t where time_key < '1-1-2007'
To optimize the DELETEs, make time_key appear in the ORDER BY clause of all your projections. This schema design results in better performance of the DELETE operation.
In addition, add additional sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing Sort Order: Best Practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.