Calculating the Database Size

You can use your Vertica software until your columnar data reaches the maximum raw data size that the license agreement provides. This section describes when data is monitored, what data is included in the estimate, and the general methodology used to produce an estimate. For more information about monitoring for data size, see Monitoring Database Size for License Compliance.

How Vertica Estimates Raw Data Size

Vertica uses statistical sampling to calculate an accurate estimate of the raw data size of the database. In this context, raw data means the uncompressed data stored in a single Vertica database. For the purpose of license size audit and enforcement, Vertica evaluates the raw data size as if the data had been exported from the database in text format, rather than as compressed data.

Vertica conducts your database size audit using statistical sampling. This method allows Vertica to estimate the size of the database without significantly affecting database performance. The trade-off between accuracy and impact on performance is a small margin of error, inherent in statistical sampling. Reports on your database size include the margin of error, so you can assess the accuracy of the estimate. To learn more about simple random sampling, see Simple Random Sampling.

Excluding Data From Raw Data Size Estimate

Not all data in the Vertica database is evaluated as part of the raw data size. Specifically, Vertica excludes the following data:

Evaluating Data Type Footprint Size

Vertica treats the data sampled for the estimate as if it had been exported from the database in text format (such as printed from vsql). This means that Vertica evaluates the data type footprint sizes as follows:

Using AUDIT to Estimate Database Size

To supply a more accurate database size estimate than statistical sampling can provide, use the AUDIT function to perform a full audit. This function has parameters to set both the error_tolerance and confidence_level. Using one or both of these parameters increases or decreases the function's performance impact.

For instance, lowering the error_tolerance to zero (0) and raising the confidence_level to 100, provides the most accurate size estimate, and increases the performance impact of calling the AUDIT function. During a detailed, low error-tolerant audit, all of the data in the database is dumped to a raw format to calculate its size. Since performing a stringent audit can significantly affect database performance, never perform a full audit of a production database. See AUDIT for details.

Note: Unlike estimating raw data size using statistical sampling, a full audit performs SQL queries on the full database contents, including the contents of the WOS.