Determining When Statistics Were Last Updated
System table PROJECTION_COLUMNS
returns statistics about projection columns, including the type of statistics and when they were last updated.
For example, the following sample schema defines a table named trades, which groups the highly-correlated columns bid
and ask
and stores the stock
column separately:
=> CREATE TABLE trades (stock CHAR(5), bid INT, ask INT); => CREATE PROJECTION trades_p ( stock ENCODING RLE, GROUPED(bid ENCODING DELTAVAL, ask)) AS (SELECT * FROM trades) ORDER BY stock, bid; => INSERT INTO trades VALUES('acme', 10, 20); => COMMIT;
Query the PROJECTION_COLUMNS
table for table trades
:
=> \x Expanded display is on. => SELECT * FROM PROJECTION_COLUMNS WHERE table_name = 'trades';
The statistics_type
column returns NONE
for all three columns in the trades
table. The statistics_updated_timestamp
field is empty because statistics have not yet been run on this table.
-[ RECORD 1 ]----------------+--------------------
projection_id | 45035996273718838
projection_name | trades_p
projection_column_name | stock
column_position | 0
sort_position | 0
column_id | 45035996273718840
data_type | char(5)
encoding_type | RLE
access_rank | 0
group_id | 0
table_schema | public
table_id | 45035996273718836
table_name
| trades
table_column_id | 45035996273718836-1
table_column_name | stock
statistics_type | NONE
statistics_updated_timestamp |
is_expression | f
is_aggregate | f
partition_by_position |
order_by_type |
column_expression |
-[ RECORD 2 ]----------------+--------------------
projection_id | 45035996273718838
projection_name | trades_p
projection_column_name | bid
column_position | 1
sort_position | 1
column_id | 45035996273718842
data_type | int
encoding_type | DELTAVAL
access_rank | 0
group_id | 45035996273718844
table_schema | public
table_id | 45035996273718836
table_name | trades
table_column_id | 45035996273718836-2
table_column_name | bid
statistics_type | NONE
statistics_updated_timestamp |
is_expression | f
is_aggregate | f
partition_by_position |
order_by_type |
column_expression |
-[ RECORD 3 ]----------------+--------------------
projection_id | 45035996273718838
projection_name | trades_p
projection_column_name | ask
column_position | 2
sort_position |
column_id | 45035996273718846
data_type | int
encoding_type | AUTO
access_rank | 0
group_id | 45035996273718844
table_schema | public
table_id | 45035996273718836
table_name | trades
table_column_id | 45035996273718836-3
table_column_name | ask
statistics_type | NONE
statistics_updated_timestamp |
Now, run statistics on the stock
column:
=> SELECT ANALYZE_STATISTICS('trades.stock');
The system returns 0 for success:
-[ RECORD 1 ]------+-- ANALYZE_STATISTICS | 0
Now query PROJECTION_COLUMNS again:
=> SELECT * FROM PROJECTION_COLUMNS where table_name = 'trades';
This time, statistics_type
changes to FULL for the trades.stock
column (representing full statistics were run). The statistics_updated_timestamp
column returns the time the stock
columns statistics were updated. The timestamp for the bid
and ask
columns have not changed because statistics were not run on those columns. Also, the bid
and ask
columns changed from NONE
to ROWCOUNT
because Vertica automatically updates ROWCOUNT
statistics from time to time. The statistics are created by looking at existing catalog metadata.
-[ RECORD 1 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | stock column_position | 0 sort_position | 0 column_id | 45035996273718840 data_type | char(5) encoding_type | RLE access_rank | 0 group_id | 0 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-1 table_column_name | stock statistics_type | FULL statistics_updated_timestamp | 2012-12-08 13:52:04.178294-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression | -[ RECORD 2 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | bid column_position | 1 sort_position | 1 column_id | 45035996273718842 data_type | int encoding_type | DELTAVAL access_rank | 0 group_id | 45035996273718844 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-2 table_column_name | bid statistics_type | ROWCOUNT statistics_updated_timestamp | 2012-12-08 13:51:20.016465-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression | -[ RECORD 3 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | ask column_position | 2 sort_position | column_id | 45035996273718846 data_type | int encoding_type | AUTO access_rank | 0 group_id | 45035996273718844 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-3 table_column_name | ask statistics_type | ROWCOUNT statistics_updated_timestamp | 2012-12-08 13:51:20.016475-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression |
If you run statistics on the bid
column and then query this system table again, only RECORD 2 is updated:
=> SELECT ANALYZE_STATISTICS('trades.bid'); -[ RECORD 1 ]------+-- ANALYZE_STATISTICS | 0 => SELECT * FROM PROJECTION_COLUMNS where table_name = 'trades'; -[ RECORD 1 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | stock column_position | 0 sort_position | 0 column_id | 45035996273718840 data_type | char(5) encoding_type | RLE access_rank | 0 group_id | 0 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-1 table_column_name | stock statistics_type | FULL statistics_updated_timestamp | 2012-12-08 13:52:04.178294-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression | -[ RECORD 2 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | bid column_position | 1 sort_position | 1 column_id | 45035996273718842 data_type | int encoding_type | DELTAVAL access_rank | 0 group_id | 45035996273718844 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-2 table_column_name | bid statistics_type | FULL statistics_updated_timestamp | 2012-12-08 13:53:23.438447-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression | is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression | -[ RECORD 3 ]----------------+------------------------------ projection_id | 45035996273718838 projection_name | trades_p projection_column_name | ask column_position | 2 sort_position | column_id | 45035996273718846 data_type | int encoding_type | AUTO access_rank | 0 group_id | 45035996273718844 table_schema | public table_id | 45035996273718836 table_name | trades table_column_id | 45035996273718836-3 table_column_name | ask statistics_type | ROWCOUNT statistics_updated_timestamp | 2012-12-08 13:51:20.016475-05 is_expression | f is_aggregate | f partition_by_position | order_by_type | column_expression |
You can query the statistics_updated_timestamp
column to see when columns were updated:
=> \x Expanded display is off. => SELECT ANALYZE_STATISTICS('trades'); ANALYZE_STATISTICS -------------------- 0 (1 row) => SELECT projection_column_name, statistics_type, statistics_updated_timestamp FROM PROJECTION_COLUMNS where table_name = 'trades'; projection_column_name | statistics_type | statistics_updated_timestamp ------------------------+-----------------+------------------------------- stock | FULL | 2012-12-08 13:54:27.428622-05 bid | FULL | 2012-12-08 13:54:27.428632-05 ask | FULL | 2012-12-08 13:54:27.428639-05 (3 rows)