Vertica Analytics Platform Version 9.2.x Documentation

Best Practices for Statistics Collection

You should call ANALYZE_STATISTICS or ANALYZE_STATISTICS_PARTITION when one or more of following conditions are true:

  • Data is bulk loaded for the first time.
  • A new projection is refreshed.
  • The number of rows changes significantly.
  • A new column is added to the table.
  • Column minimum/maximum values change significantly.
  • New primary key values with referential integrity constraints are added . The primary key and foreign key tables should be re-analyzed.
  • Table size notably changes relative to other tables it is joined to—for example, a table that was 50 times larger than another table is now only five times larger.
  • A notable deviation in data distribution necessitates recalculating histograms—for example, an event causes abnormally high levels of trading for a particular stock.
  • The database is inactive for an extended period of time.

Overhead Considerations

Running ANALYZE_STATISTICS is an efficient but potentially long-running operation. You can run it concurrently with queries and loads in a production environment. However, the function can incur considerable overhead on system resources (CPU and memory), at the expense of queries and load operations. To minimize overhead, consider calling ANALYZE_STATISTICS_PARTITIONS on those partitions that are subject to significant activity—typically, the most recently loaded partitions, including the table's active partition. You can further narrow the scope of both functions by specifying a subset of the table columns—generally, those that are queried most often.

Related Tools

You can diagnose and resolve many statistics-related issues by calling ANALYZE_WORKLOAD, which returns tuning recommendations. If you update statistics and find that a query still performs poorly, run it through the Database Designer and choose incremental as the design type.