Partition Pruning

If a query predicate specifies a partitioning expression, the query optimizer evaluates the predicate against the ROS containers of the partitioned data. Each ROS container maintains the minimum and maximum values of its partition key data. The query optimizer uses this metadata to determine which ROS containers it needs to execute the query, and omits, or prunes, the remaining containers from the query plan. By minimizing the number of ROS containers that it must scan, the query optimizer enables faster execution of the query.

For example, a table might be partitioned by year as follows:

=> CREATE TABLE ... PARTITION BY EXTRACT(year FROM date);

Given this table definition, its projection data is partitioned into ROS containers according to year, one for each year—in this case, 2007, 2008, 2009.

The following query specifies the partition expression date:

=> SELECT ... WHERE date = '12-2-2009'; 

Given this query, the ROS containers that contain data for 2007 and 2008 fall outside the boundaries of the requested year (2009). The query optimizer prunes these containers from the query plan before the query executes:

Partition pruning example

Examples

Assume a table that is partitioned by time and will use queries that restrict data on time.

=> CREATE TABLE time ( tdate DATE NOT NULL, tnum INTEGER)
     PARTITION BY EXTRACT(year FROM tdate);
=> CREATE PROJECTION time_p (tdate, tnum) AS
=> SELECT * FROM time ORDER BY tdate, tnum UNSEGMENTED ALL NODES;

Projection sort order has no effect on partition pruning.

=> INSERT INTO time VALUES ('03/15/04' , 1);
=> INSERT INTO time VALUES ('03/15/05' , 2);
=> INSERT INTO time VALUES ('03/15/06' , 3);
=> INSERT INTO time VALUES ('03/15/06' , 4);

The data inserted in the previous series of commands are loaded into three ROS containers, one per year, as that is how the data is partitioned:

=> SELECT * FROM time ORDER BY tnum;
   tdate    | tnum
------------+------
 2004-03-15 |    1  --ROS1 (min 03/01/04, max 03/15/04)
 2005-03-15 |    2  --ROS2 (min 03/15/05, max 03/15/05)
 2006-03-15 |    3  --ROS3 (min 03/15/06, max 03/15/06)
 2006-03-15 |    4  --ROS3 (min 03/15/06, max 03/15/06)
(4 rows)

Here's what happens when you query the time table:

  • In this query, Vertica can omit container ROS2 because it is only looking for year 2004:

    => SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004';
  • In the next query, Vertica can omit two containers, ROS1 and ROS3:

    => SELECT COUNT(*) FROM time WHERE tdate = '10/07/2005';
  • The following query has an additional predicate on the tnum column for which no minimum/maximum values are maintained. In addition, the use of logical operator OR is not supported, so no ROS elimination occurs:

    => SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004' OR tnum = 7;