
The ANALYZE_STATISTICS function collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. On a very large wide table it will take a significant amount of time to gather those statistics.
In many situations only a few columns in the table are modified as part of an ETL/ELT process. In these cases instead of re-analyzing all of the table’s columns, you can analyze just the updated columns, reducing the ANALYZE_STATISTICS function’s run-time!
Example:
dbadmin=> SELECT COUNT(*) FROM big_varchar_table;
COUNT
------------
1000000000
(1 row)
dbadmin=> \timing
Timing is on.
dbadmin=> select analyze_statistics('big_varchar_table', 100); -- Analyze entire table
analyze_statistics
--------------------
0
(1 row)
Time: First fetch (1 row): 196707.123 ms. All rows formatted: 196707.185 ms
dbadmin=> UPDATE big_varchar_table SET the_varchar = 'DATA UPDATED...' WHERE pk BETWEEN 2098174 AND 4098174;
OUTPUT
---------
2000001
(1 row)
Time: First fetch (1 row): 1909.409 ms. All rows formatted: 1909.474 ms
dbadmin=> select analyze_statistics('big_varchar_table', 'the_varchar', 100); -- Analyze only updated column
analyze_statistics
--------------------
0
(1 row)
Time: First fetch (1 row): 48348.982 ms. All rows formatted: 48349.028 ms
By analyzing only the column that was updated I reduced the run-time of the ANALYZE_STATISTICS function from 3.3 minutes to 48 seconds!Have Fun!