verticapy.vDataColumn.nunique

vDataColumn.nunique(approx: bool = True) → int

Returns the cardinality (number of unique values) of the vDataColumn. VerticaPy can compute the exact value with COUNT DISTINCT, or estimate it with APPROXIMATE_COUNT_DISTINCT, a more efficient approximation method. The approx parameter selects between the two and defaults to True, so the approximate method is used unless you request otherwise.

Hint

Choose the method that fits your requirements: APPROXIMATE_COUNT_DISTINCT can significantly improve performance when an estimate of the cardinality is sufficient for your analysis.
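To give a feel for why an estimate can be cheaper than an exact count, here is a minimal pure-Python sketch of one well-known estimator, K-minimum-values (KMV). It is only an illustration of the general trade-off: it is not a description of how Vertica implements APPROXIMATE_COUNT_DISTINCT, and the names `approx_cardinality`, `_unit_hash`, and `k` are hypothetical.

```python
import hashlib
import heapq


def _unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1] using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_cardinality(values, k=1024):
    """Estimate the distinct count while retaining at most k hashes.

    Illustrative KMV sketch (an assumption for teaching purposes,
    not Vertica's algorithm).
    """
    heap = []        # negated hashes: heap[0] is minus the largest retained hash
    members = set()  # retained hashes, used to skip duplicates
    for value in values:
        h = _unit_hash(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with the smaller new one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct values were seen, so the count is exact.
        return len(heap)
    # The k-th smallest hash u suggests roughly (k - 1) / u distinct values.
    return round((k - 1) / -heap[0])


# 60,000 rows containing 20,000 distinct values.
rows = [i % 20000 for i in range(60000)]
exact = len(set(rows))              # keeps every distinct value in memory
estimate = approx_cardinality(rows)  # keeps at most k hashes
```

The exact count must remember every distinct value, while the sketch keeps a fixed number of hashes and accepts a small relative error in return. That is the kind of trade-off the approx parameter exposes.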

Important

To calculate the exact cardinality of a column, you should set the parameter approx to False. This will ensure that the cardinality is computed accurately rather than using the approximate method.
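The two modes correspond to two different SQL aggregates. The sketch below builds the kind of query each mode implies. The helper `cardinality_sql` and the relation name `my_table` are hypothetical, and the SQL that VerticaPy actually generates may differ in its details.

```python
def cardinality_sql(column: str, relation: str, approx: bool = True) -> str:
    """Return an illustrative SQL query for a cardinality aggregate.

    Hypothetical helper for illustration only; not part of VerticaPy.
    """
    # Double-quote the identifier, escaping any embedded double quotes.
    quoted = '"' + column.replace('"', '""') + '"'
    if approx:
        expr = f"APPROXIMATE_COUNT_DISTINCT({quoted})"
    else:
        expr = f"COUNT(DISTINCT {quoted})"
    return f"SELECT {expr} FROM {relation}"


approx_query = cardinality_sql("y", "my_table")
exact_query = cardinality_sql("y", "my_table", approx=False)
print(approx_query)
print(exact_query)
```

In VerticaPy itself you only pass approx=False to nunique; the helper above just makes the difference between the two aggregates visible.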

Parameters

approx: bool, optional: If set to True (the default), the approximate cardinality is returned. Setting this parameter to False returns the exact cardinality, but the function's performance can drastically decrease.

Returns

int: vDataColumn cardinality (or approximate cardinality).

Examples

For this example, let’s generate a dataset and calculate the cardinality of a column:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data["y"].nunique()
Out[3]: 2.0
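As a quick sanity check that needs no database connection, the same cardinality can be reproduced in plain Python from the values used for column "y". This is only a local cross-check of the expected answer, not how VerticaPy computes it.

```python
# Values of column "y" from the example dataset above.
y = [1, 2, 1, 2, 1, 1, 2, 1]

distinct_values = set(y)            # the unique values of the column
cardinality = len(distinct_values)  # exact number of unique values
print(cardinality)
```

The column contains only the values 1 and 2, which agrees with the cardinality of 2 reported above.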

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.