Collecting Partition Statistics

ANALYZE_STATISTICS_PARTITION collects and aggregates data samples and storage information for a range of partitions in the specified table. Vertica writes the collected statistics to the database catalog.

If you specify multiple partitions, they must be continuous. Different collections can overlap—for example, given partition keys of 1 through 99, successive calls to ANALYZE_STATISTICS_PARTITION can collect statistics for partitions 1-60 and 50-99. Similarly, one collection of statistics can be a subset of an earlier collection. In all cases, Vertica stores only the most recent statistics for each partition range. Thus, given the previous example, Vertica uses only statistics from the second collection for partition keys 50 through 60.

Narrowing the Collection Scope

Like ANALYZE_STATISTICS, ANALYZE_STATISTICS_PARTITION lets you narrow scope of analysis to a subset of a table's columns. You can also control the size of the data sample that it collects. For details on these options, see Collecting Table Statistics.

Example

The following table stores sales data and is partitioned by order dates:

CREATE TABLE public.store_orders_fact
(
    product_key int,
    product_version int,
    store_key int,
    vendor_key int,
    employee_key int,
    order_number int,
    date_ordered date NOT NULL,
    date_shipped date NOT NULL,
    quantity_ordered int,
    shipper_name varchar(32)
);

ALTER TABLE public.store_orders_fact PARTITION BY date_ordered::DATE GROUP BY CALENDAR_HIERARCHY_DAY(date_ordered::DATE, 2, 2) REORGANIZE;

ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE public.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

At the end of each business day you might call ANALYZE_STATISTICS_PARTITION and collect statistics on all data of the latest (today's) partition:

=> SELECT ANALYZE_STATISTICS_PARTITION('public.store_orders_fact', CURRENT_DATE::VARCHAR(10), CURRENT_DATE::VARCHAR(10));
 ANALYZE_STATISTICS_PARTITION
------------------------------
                            0
(1 row)

The function produces a set of fresh statistics for the most recent partition in store.store_sales_fact. If you query this table each morning on yesterday's sales, the optimizer uses these statistics to generate an optimized query plan:

=> EXPLAIN SELECT COUNT(*) FROM public.store_orders_fact WHERE date_ordered = CURRENT_DATE-1;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT COUNT(*) FROM public.store_orders_fact WHERE date_ordered = CURRENT_DATE-1;

 Access Path:
 +-GROUPBY NOTHING [Cost: 2, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for store_orders_fact [Cost: 1, Rows: 222(PARTITION-LEVEL STATISTICS)] (PATH ID: 2)
 | |      Projection: public.store_orders_fact_v1_b1
 | |      Filter: (store_orders_fact.date_ordered = '2019-04-01'::date)
 | |      Execute on: All Nodes