Purging Deleted Data
In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.
The cost of retaining historical data is twofold:
- Disk space is allocated to deleted rows and delete markers.
- Typical (non-historical) queries must read and skip over deleted data, which can impact performance.
A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.
You can purge data in two ways:
In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.
Large delete and purge operations can take a long time to complete, so use them sparingly. If your application requires deleting data on a regular basis, such as by month or year, consider designing tables that take advantage of table partitioning. If partitioning is not suitable, consider rebuilding the table.