This blog post was authored by Soniya Shah.
In Vertica 8.1.1, we introduce a performance improvement that reduces catalog memory usage for users with a large number of NULL values in tables. The improvement affects all string data types, including BINARY, VARBINARY, LONG VARBINARY, CHAR, VARCHAR and LONG VARCHAR. The improvement scales with the data type length and only affects columns with a length that is greater than 8 bytes.
Vertica stores min and max values on MiniRos objects. If the min or max is longer than 8 bytes, it is stored in a separate object, MinMaxObj, which is linked to the MiniRos. Previously, for MiniRos objects that represent exclusively NULL values, Vertica stored the min and max for the data type. This created performance issues if the columns were wide because there is a limit of 8 bytes on the MiniRos object. To work around this limitation, a MinMaxObj was created for wider columns. This could significantly bloat the catalog because an additional object was created. The catalog can also be bloated because those objects are large.
In Vertica 8.1.1, the storage container stores an empty string for min and max, with no additional objects or memory usage in the catalog, no matter what the length of the data type. A separate object is not created for MiniRos objects with exclusively NULL values. How is this possible?
For containers that have only NULL values, Vertica does not read the min and max values. It only tracks the number of NULLs and the number of rows. When you upgrade to Vertica 8.1.1, all MinMaxObj for strings that contain only NULL values are dropped. Then, the min and max on the MiniRos objects are updated to empty strings.
After you upgrade, restart the database to see memory improvement. You can also search vertica.log for this task, which is called CleanupMinirosStringMax.