Getting Data on Table Statistics
Vertica provides information about statistics for a given table and its columns and partitions in two ways:
 The query optimizer notifies you about the availability of statistics to process a given query.
 System table
PROJECTION_COLUMNS
shows what types of statistics are available for the table columns, and when they were last updated.
Query Evaluation
During predicate selectivity estimation, the query optimizer can identify when histograms are not available or are out of date. If the value in the predicate is outside the histogram's maximum range, the statistics are stale. If no histograms are available, then no statistics are available to the plan.
When the optimizer detects stale or no statistics, such as when it encounters a column predicate for which it has no histogram, the optimizer performs the following actions:
 Displays and logs a message that you should run
ANALYZE_STATISTICS
.  Annotates
EXPLAIN
generated query plans with a statistics entry.  Ignores stale statistics when it generates a query plan. The optimizer uses other considerations to create a query plan, such as FKPK constraints.
For example, the following query plan fragment shows no statistics (histograms unavailable):
  + Outer > STORAGE ACCESS for fact [Cost: 604, Rows: 10K (NO STATISTICS)]
The following query plan fragment shows that the predicate falls outside the histogram range:
  + Outer > STORAGE ACCESS for fact [Cost: 35, Rows: 1 (PREDICATE VALUE OUTOFRANGE)]
Statistics Data in PROJECTION_COLUMNS
Two columns in system table PROJECTION_COLUMNS
show the status of each table column's statistics, as follows:
STATISTICS_TYPE
returns the type of statistics that are available for this column, one of the following:NONE
,ROWCOUNT
, orFULL
.STATISTICS_UPDATED_TIMESTAMP
returns the last time statistics were collected for this column.
For example, the following sample schema defines a table named trades, which groups the highlycorrelated columns bid
and ask
and stores the stock
column separately:
=> CREATE TABLE trades (stock CHAR(5), bid INT, ask INT); => CREATE PROJECTION trades_p ( stock ENCODING RLE, GROUPED(bid ENCODING DELTAVAL, ask)) AS (SELECT * FROM trades) ORDER BY stock, bid; => INSERT INTO trades VALUES('acme', 10, 20); => COMMIT;
Query the PROJECTION_COLUMNS
table for table trades
:
=> SELECT table_name AS table, projection_name AS projection, table_column_name AS column, statistics_type, statistics_updated_timestamp AS last_updated FROM projection_columns WHERE table_name = 'trades'; table  projection  column  statistics_type  last_updated ++++ trades  trades_p_b0  stock  NONE  trades  trades_p_b0  bid  NONE  trades  trades_p_b0  ask  NONE  trades  trades_p_b1  stock  NONE  trades  trades_p_b1  bid  NONE  trades  trades_p_b1  ask  NONE  (6 rows)
The statistics_type
column returns NONE
for all columns in the trades
table, while statistics_updated_timestamp
is empty because statistics have not yet been collected on this table.
Now, run ANALYZE_STATISTICS
on the stock
column:
=> SELECT ANALYZE_STATISTICS ('public.trades', 'stock'); ANALYZE_STATISTICS  0 (1 row)
Now, when you query PROJECTION_COLUMNS
, it returns the following results:
=> SELECT table_name AS table, projection_name AS projection, table_column_name AS column, statistics_type, statistics_updated_timestamp AS last_updated FROM projection_columns WHERE table_name = 'trades'; table  projection  column  statistics_type  last_updated ++++ trades  trades_p_b0  stock  FULL  20190403 12:00:12.23156404 trades  trades_p_b0  bid  ROWCOUNT  20190403 12:00:12.23156404 trades  trades_p_b0  ask  ROWCOUNT  20190403 12:00:12.23156404 trades  trades_p_b1  stock  FULL  20190403 12:00:12.23156404 trades  trades_p_b1  bid  ROWCOUNT  20190403 12:00:12.23156404 trades  trades_p_b1  ask  ROWCOUNT  20190403 12:00:12.23156404 (6 rows)
This time, the query results contain several changes:
statistics_type


statistics_updated_timestamp

