Writing Bundle Metadata to the Catalog
Vertica internally stores physical table data in bundles together with metadata on the bundle contents. The query optimizer uses bundle metadata to look up and fetch the data it needs for a given query.
Vertica stores bundle metadata in the database catalog. This is especially beneficial in Eon mode: instead of fetching this metadata from remote (S3) storage, the optimizer can find it in the local catalog. This minimizes S3 reads, and facilitates faster query planning and overall execution.
Vertica writes bundle metadata to the catalog on two events:
- Any DML operation that changes table content, such as
INSERT
,UPDATE
, orCOPY
. Vertica writes bundle metadata to the catalog on the new or changed table data. DML operations have no effect on bundle metadata for existing table data. - Invocations of function
UPDATE_STORAGE_CATALOG
, as an argument to Vertica meta-functionDO_TM_TASK
, on existing data. You can narrow the scope of the catalog update operation to a specific projection or table. If no scope is specified, the operation is applied to the entire database.After upgrading to any Vertica version ≥ 9.2.1, you only need to call
UPDATE_STORAGE_CATALOG
once on existing data. Bundle metadata on all new or updated data is always written automatically to the catalog.
For example, the following DO_TM_TASK
call writes bundle metadata on all projections in table store.store_sales_fact
:
=> SELECT DO_TM_TASK ('update_storage_catalog', 'store.store_sales_fact'); do_tm_task ------------------------------------------------------------------------------- Task: update_storage_catalog (Table: store.store_sales_fact) (Projection: store.store_sales_fact_b0) (Table: store.store_sales_fact) (Projection: store.store_sales_fact_b1) (1 row)
Validating Bundle Metadata
You can query system table STORAGE_BUNDLE_INFO_STATISTICS
to determine which projections have invalid bundle metadata in the database catalog. For example, results from the following query show that the database catalog has invalid metadata for projections inventory_fact_b0
and inventory_fact_b1
:
=> SELECT node_name, projection_name, total_ros_count, ros_without_bundle_info_count FROM v_monitor.storage_bundle_info_statistics where ros_without_bundle_info_count > 0 ORDER BY projection_name, node_name; node_name | projection_name | total_ros_count | ros_without_bundle_info_count ------------------+-------------------+-----------------+------------------------------- v_vmart_node0001 | inventory_fact_b0 | 1 | 1 v_vmart_node0002 | inventory_fact_b0 | 1 | 1 v_vmart_node0003 | inventory_fact_b0 | 1 | 1 v_vmart_node0001 | inventory_fact_b1 | 1 | 1 v_vmart_node0002 | inventory_fact_b1 | 1 | 1 v_vmart_node0003 | inventory_fact_b1 | 1 | 1 (6 rows)
Best Practices
Updating the database catalog with UPDATE_STORAGE_CATALOG
is recommended only for Eon users. Enterprise users are unlikely to see measurable performance improvements from this update.
Calls to UPDATE_STORAGE_CATALOG
can incur considerable overhead, as the update process typically requires numerous and expensive S3 reads. Vertica advises against running this operation on the entire database. Instead, consider an incremental approach:
- Call
UPDATE_STORAGE_CATALOG
on a single large fact table. You can use performance metrics to estimate how much time updating other files will require. - Identify which tables are subject to frequent queries and prioritize catalog updates accordingly.