Watch those Delete Vectors!

Posted July 30, 2019 by Bryan Herger, Vertica Big Data Solution Architect at Micro Focus

Digital image of a trash can made of data points on blue background

Vertica is very good at ingesting data, compressing it, and querying at high speed. The trade-off here is that the data is stored in large block files called ROS containers. These containers can grow to large sizes, sometimes over 10 GB, and this makes it impractical to decompress and edit the files during updates and deletes.

As a trade-off to make changes work in a timely fashion, Vertica records the deletes in a view called a delete vector. When you query a table with deleted or changed data, Vertica reads the ROS containers for the table, replays the delete vector on the result set, and returns the filtered set.

This gets rather inefficient as delete vectors grow. Because rewriting large ROS containers is an intensive operation, Vertica doesn’t usually do it automatically; instead, you should monitor the V_MONITOR.DELETE_VECTORS table for delete activity, especially if delete and update are regular operations.

Some workarounds can help:

  • Lower the threshold where Vertica auto-purges tables. This is 20% by default. The down side is that purge will be triggered at arbitrary times and could affect system performance.
  • Schedule a manual purge or mergeout to merge to update the ROS containers during idle time.
  • Use partitioning to delete older data. Partitions map to ROS containers, so dropping a partition is much more efficient, as it simply deletes the ROS containers for the partition. This can be much more efficient for managing data by date, for example.
  • Use a staging table to preserve or merge records, then swap tables with ALTER TABLE oldTable, newTable RENAME TO newTable, oldTable; then drop the old table. (Be sure to preserve projections in the new staging table! For example, CREATE TABLE newTable LIKE oldTable INCLUDING PROJECTIONS;)

Purging deleted data or dropping partitions will also reduce the size of the ROS containers on disk, so you could reclaim significant amounts of disk space too.

Here’s an example showing how delete is handled:

dbadmin=> CREATE TABLE delete_test (c INT);
CREATE TABLE
dbadmin=> INSERT INTO delete_test SELECT 1;
OUTPUT
--------
1
(1 row)
dbadmin=> INSERT INTO delete_test SELECT 2;
OUTPUT
--------
1
(1 row)
dbadmin=> INSERT INTO delete_test SELECT 3;
OUTPUT
--------
1
(1 row)

If I run “SELECT * FROM delete_test”, I would get 3 rows. This is how many rows are stored:

dbadmin=> select anchor_table_name, ros_row_count, ros_used_bytes from projection_storage where anchor_table_name = 'delete_test';
anchor_table_name | ros_row_count | ros_used_bytes
-------------------+---------------+----------------
delete_test | 3 | 53
(1 row)
dbadmin=> delete from delete_test where c = 2;
OUTPUT
--------
1
(1 row)

If I run “SELECT * FROM delete_test” now, I would get 2 rows. However, this is how many rows are stored:

dbadmin=> select anchor_table_name, ros_row_count, ros_used_bytes from projection_storage where anchor_table_name = 'delete_test';
anchor_table_name | ros_row_count | ros_used_bytes
-------------------+---------------+----------------
delete_test | 3 | 53
(1 row)

Wait! All three rows are still there! This is because Vertica stored the delete as a delete vector:

dbadmin=> SELECT projection_name, deleted_row_count FROM delete_vectors where projection_name like 'delete_test%';
projection_name | deleted_row_count
-------------------+-------------------
delete_test_super | 1
(1 row)

This delete vector is used to mask the deleted record. To get rid of the deleted record from ROS permanently, we purge the table:

dbadmin=> select PURGE_TABLE('delete_test');
PURGE_TABLE
-------------------------------------------------------------------------------------------
Task: purge operation
(Table: public.delete_test) (Projection: public.delete_test_super)
(1 row)

Now the delete vector and row are gone:

dbadmin=> SELECT projection_name, deleted_row_count FROM delete_vectors where projection_name like 'delete_test%';
projection_name | deleted_row_count
-----------------+-------------------
(0 rows)
dbadmin=> select anchor_table_name, ros_row_count, ros_used_bytes from projection_storage where anchor_table_name = 'delete_test';
anchor_table_name | ros_row_count | ros_used_bytes
-------------------+---------------+----------------
delete_test | 2 | 52
(1 row)

Helpful Links:

https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/DELETE_VECTORS.htm
https://www.vertica.com/docs/latest/HTML/Content/Authoring/AdministratorsGuide/BulkDeletingPurging/PurgingDeletedData.htm

Have fun!