Collecting Partition Statistics

ANALYZE_STATISTICS_PARTITION collects and aggregates data samples and storage information for a range of partitions in the specified table. Vertica writes the collected statistics to the database catalog.

Requirements and Restrictions

The following requirements and restrictions apply to running ANALYZE_STATISTICS_PARTITION on a table:

  • If you specify multiple partitions, they must be continuous. Different collections can overlap—for example, given partition keys of 1 through 99, successive calls to ANALYZE_STATISTICS_PARTITION can collect statistics for partitions 1-60 and 50-99. Similarly, one collection of statistics can be a subset of an earlier collection. In all cases, Vertica stores only the most recent statistics for each partition range. Thus, given the previous example, Vertica uses only statistics from the second collection for partition keys 50 through 60.
  • The table must be partitioned and cannot contain unpartitioned data.
  • The table partition expression must specify a single column. The following expressions are supported:
    • Expressions that specify only the column—that is, partition on all column values. For example:
      PARTITION BY ship_date GROUP BY CALENDAR_HIERARCHY_DAY(ship_date, 2, 2)
    • If the column is a DATE or TIMESTAMP/TIMESTAMPTZ, the partition expression can specify a supported date/time function that returns that column or any portion of it, such as month or year. For example, the following partition expression specifies to partition on the year portion of column order_date:
      PARTITION BY YEAR(order_date)
    • Expressions that perform addition or subtraction on the column. For example:
      PARTITION BY YEAR(order_date) -1
  • The table partition expression cannot coerce the specified column to another data type.
  • Vertica collects no statistics from the following projections:
    • Live aggregate and Top-K projections
    • Projections that are defined to include an SQL function within an expression

Supported Date/Time Functions

ANALYZE_STATISTICS_PARTITION can collect partition-level statistics on tables where the partition expression specifies one of the following date/time functions:

Narrowing the Collection Scope

Like ANALYZE_STATISTICS, ANALYZE_STATISTICS_PARTITION lets you narrow scope of analysis to a subset of a table's columns. You can also control the size of the data sample that it collects. For details on these options, see Collecting Table Statistics.

Example

The following table stores sales data and is partitioned by order dates:

CREATE TABLE public.store_orders_fact
(
    product_key int,
    product_version int,
    store_key int,
    vendor_key int,
    employee_key int,
    order_number int,
    date_ordered date NOT NULL,
    date_shipped date NOT NULL,
    quantity_ordered int,
    shipper_name varchar(32)
);

ALTER TABLE public.store_orders_fact PARTITION BY date_ordered::DATE GROUP BY CALENDAR_HIERARCHY_DAY(date_ordered::DATE, 2, 2) REORGANIZE;
ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

At the end of each business day you might call ANALYZE_STATISTICS_PARTITION and collect statistics on all data of the latest (today's) partition:

=> SELECT ANALYZE_STATISTICS_PARTITION('public.store_orders_fact', CURRENT_DATE::VARCHAR(10), CURRENT_DATE::VARCHAR(10));
 ANALYZE_STATISTICS_PARTITION
------------------------------
                            0
(1 row)

The function produces a set of fresh statistics for the most recent partition in store.store_sales_fact. If you query this table each morning on yesterday's sales, the optimizer uses these statistics to generate an optimized query plan:

=> EXPLAIN SELECT COUNT(*) FROM public.store_orders_fact WHERE date_ordered = CURRENT_DATE-1;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT COUNT(*) FROM public.store_orders_fact WHERE date_ordered = CURRENT_DATE-1;

 Access Path:
 +-GROUPBY NOTHING [Cost: 2, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for store_orders_fact [Cost: 1, Rows: 222(PARTITION-LEVEL STATISTICS)] (PATH ID: 2)
 | |      Projection: public.store_orders_fact_v1_b1
 | |      Filter: (store_orders_fact.date_ordered = '2019-04-01'::date)
 | |      Execute on: All Nodes