Vertica Quick Tip: Analyzing Table Statistics by Column

Posted February 13, 2018 by Soniya Shah, Information Developer

This blog post was authored by Jim Knicely.

The ANALYZE_STATISTICS function collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. On a very large wide table it will take a significant amount of time to gather those statistics.

In many situations only a few columns in the table are modified as part of an ETL/ELT process. In these cases instead of re-analyzing all of the table’s columns, you can analyze just the updated columns, reducing the ANALYZE_STATISTICS function’s run-time!

Example: dbadmin=> SELECT COUNT(*) FROM big_varchar_table; COUNT ------------ 1000000000 (1 row) dbadmin=> \timing Timing is on. dbadmin=> select analyze_statistics('big_varchar_table', 100); -- Analyze entire table analyze_statistics -------------------- 0 (1 row) Time: First fetch (1 row): 196707.123 ms. All rows formatted: 196707.185 ms dbadmin=> UPDATE big_varchar_table SET the_varchar = 'DATA UPDATED...' WHERE pk BETWEEN 2098174 AND 4098174; OUTPUT --------- 2000001 (1 row) Time: First fetch (1 row): 1909.409 ms. All rows formatted: 1909.474 ms dbadmin=> select analyze_statistics('big_varchar_table', 'the_varchar', 100); -- Analyze only updated column analyze_statistics -------------------- 0 (1 row) Time: First fetch (1 row): 48348.982 ms. All rows formatted: 48349.028 ms By analyzing only the column that was updated I reduced the run-time of the ANALYZE_STATISTICS function from 3.3 minutes to 48 seconds!

Have Fun!