Mergeout

Mergeout is the Tuple Mover process that consolidates ROS containers and purges deleted records. Over time, the number of ROS containers increases enough to affect performance. It is then necessary to merge some of the ROS containers to avoid performance degradation. At that point, the Tuple Mover performs an automatic mergeout, combining two or more ROS containers into a single container. You can think of this process as a way of defragmenting the ROS.

Vertica keeps data from different partitions separate on disk. The Tuple Mover adheres to this separation policy when it consolidates ROS containers: it does not merge ROS containers from different partitions. When a partition is first created, it typically has frequent data loads and requires regular activity from the Tuple Mover. As a partition ages, it commonly transitions to a mostly read-only workload and requires much less activity.

The Tuple Mover has two different policies for managing these different partition workloads:

Partitions are not explicitly marked by the user as active or inactive; instead, the Tuple Mover uses the following algorithm to order the partitions from oldest to newest:

If you perform a manual mergeout using the DO_TM_TASK function, all partitions are consolidated into the smallest possible number of containers, regardless of the value of the ActivePartitionCount parameter.

Mergeout Strata Algorithm

The mergeout operation uses a strata-based algorithm to ensure that each tuple is subjected to a mergeout operation a small, constant number of times, regardless of the process used to load the data. The mergeout operation uses this algorithm to choose which ROS containers to merge for non-partitioned tables and for active partitions in partitioned tables.

Vertica builds strata for each active partition and for projections anchored to non-partitioned tables. The number of strata, the size of each stratum, and the maximum number of ROS containers in a stratum are computed based on disk size, memory, and the number of columns in a projection.

Merging small ROS containers before merging larger ones provides the maximum benefit during the mergeout process. The algorithm begins at stratum 0 and moves upward. It checks to see if the number of ROS containers in a stratum has reached a value equal to or greater than the maximum ROS containers allowed per stratum. The default value is 32. If the algorithm finds that a stratum is full, it marks the projections and the stratum as eligible for mergeout.
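
The bottom-up check described above can be illustrated with a minimal Python sketch. It is not Vertica's implementation: the stratum boundaries, the function names, and the rule for assigning a container to a stratum by size are assumptions made for illustration. Only the default limit of 32 containers per stratum comes from the text.

```python
from typing import Dict, List, Optional

# Default maximum number of ROS containers per stratum, as stated in the text.
MAX_ROS_PER_STRATUM = 32


def stratum_of(size_bytes: int, boundaries: List[int]) -> int:
    """Return the index of the first stratum whose upper bound exceeds the size.

    `boundaries` is a hypothetical ascending list of stratum upper bounds;
    Vertica derives the real values from disk size, memory, and column count.
    """
    for index, upper_bound in enumerate(boundaries):
        if size_bytes < upper_bound:
            return index
    return len(boundaries)  # larger than every bound: top stratum


def first_full_stratum(
    container_sizes: List[int],
    boundaries: List[int],
    max_per_stratum: int = MAX_ROS_PER_STRATUM,
) -> Optional[int]:
    """Scan strata from 0 upward and return the first one that is full."""
    counts: Dict[int, int] = {}
    for size in container_sizes:
        stratum = stratum_of(size, boundaries)
        counts[stratum] = counts.get(stratum, 0) + 1

    # Small containers are considered first, so iterate in ascending order.
    for stratum in sorted(counts):
        if counts[stratum] >= max_per_stratum:
            return stratum
    return None  # no stratum is eligible for mergeout
```

With hypothetical boundaries of `[1000, 1000000]` bytes, 32 containers of 10 bytes each fill stratum 0, so that stratum is reported before any larger one is considered.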

Mergeout of Deletion Markers

When you delete data from the database, Vertica does not remove it. Instead, it marks the data as deleted. Using many DELETE statements, each marking a small number of rows relative to the size of the table, can create many small containers, called delete vectors, to hold the data marked for deletion. Each delete vector container consumes resources, so a large number of such containers can degrade performance, especially during recovery.

After the Tuple Mover performs a mergeout, it looks for deletion marker containers that hold few entries. If such containers exist, the Tuple Mover merges them together into a single, larger container. This process helps lower the overhead of tracking deleted data by freeing resources used by multiple, individual containers. The Tuple Mover does not purge or otherwise affect the deleted data, but consolidates delete vectors for greater efficiency.
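
As a rough illustration of this consolidation, the Python sketch below merges all small deletion marker containers into one and leaves the others untouched. The `small_threshold` parameter and the function name are hypothetical; the text does not state how Vertica decides that a container holds "few entries".

```python
from typing import List


def consolidate_delete_vectors(
    entry_counts: List[int], small_threshold: int
) -> List[int]:
    """Merge containers with fewer than `small_threshold` entries into one.

    Each element of `entry_counts` is the number of deletion marks held by
    one container. No marks are purged: the total is always preserved.
    """
    small = [count for count in entry_counts if count < small_threshold]
    large = [count for count in entry_counts if count >= small_threshold]

    # Merging needs at least two small containers to reduce the count.
    if len(small) < 2:
        return list(entry_counts)
    return large + [sum(small)]
```

For example, containers holding 3, 500, 2, and 7 marks with a hypothetical threshold of 10 become two containers holding 500 and 12 marks, so the same 512 deletion marks are tracked with half as many containers.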

Note: You can see the number and size of the containers holding the deletion marks by viewing the V_MONITOR.DELETE_VECTORS system table.

Automatic and Manual Mergeout

Vertica periodically checks ROS storage containers to determine whether delete vectors are eligible for purge, as follows:

  1. Counts the number of 'aged-out' delete vectors in each container, that is, delete vectors that are as old as or older than the ancient history mark (AHM) epoch.
  2. Calculates the percentage of aged-out delete vectors relative to the total number of records in the same ROS container.
  3. If this percentage exceeds the threshold set by configuration parameter PurgeMergeoutPercent (by default, 20 percent), Vertica automatically performs a mergeout on the ROS container, and permanently removes all aged-out delete vectors from the ROS container.
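
The three steps above amount to a simple percentage test, sketched below in Python. The function and argument names are illustrative assumptions; only the PurgeMergeoutPercent default of 20 and the "exceeds" comparison come from the text.

```python
def eligible_for_purge(
    aged_out_deletes: int,
    total_rows: int,
    purge_mergeout_percent: float = 20.0,
) -> bool:
    """Return True if a ROS container qualifies for automatic purge mergeout.

    aged_out_deletes: delete vectors at or before the AHM epoch (step 1).
    total_rows: total number of records in the same ROS container.
    purge_mergeout_percent: stands in for PurgeMergeoutPercent.
    """
    if total_rows <= 0:
        return False  # an empty container has nothing to purge

    # Step 2: percentage of aged-out delete vectors in the container.
    percentage = 100.0 * aged_out_deletes / total_rows

    # Step 3: the percentage must exceed the threshold, not merely equal it.
    return percentage > purge_mergeout_percent
```

Under these assumptions, a container with 1,000 records and 250 aged-out delete vectors (25 percent) qualifies, while one with exactly 200 (20 percent) does not.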

You can also manually remove all aged-out delete vectors from ROS containers with two Vertica meta-functions:

Both functions remove all aged-out delete vectors from ROS containers, regardless of how many are in a given container.