
verticapy.vDataFrame.nunique

vDataFrame.nunique(columns: str | list[str] | None = None, approx: bool = True, **agg_kwargs) -> TableSample

Aggregates the vDataFrame using nunique (cardinality). VerticaPy can use the COUNT DISTINCT function to determine the exact number of unique values in a column, or APPROXIMATE_COUNT_DISTINCT, a more efficient approximation method for calculating cardinality. As the signature shows, approx defaults to True, so the approximate method is used unless you request otherwise.

Hint

This flexibility lets you tune the computation to your requirements: using APPROXIMATE_COUNT_DISTINCT can significantly improve performance when an estimate of the cardinality is sufficient for your analysis.

Important

To calculate the exact cardinality of a column, you should set the parameter approx to False. This will ensure that the cardinality is computed accurately rather than using the approximate method.
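As a minimal sketch of requesting the exact count, the helper below simply forwards to nunique with approx=False. The name exact_nunique is hypothetical and not part of VerticaPy; it assumes vdf is a vDataFrame (or any object exposing a compatible nunique method).

```python
def exact_nunique(vdf, columns=None):
    """Return exact cardinalities for the given columns.

    `exact_nunique` is a hypothetical helper, not a VerticaPy function.
    `vdf` is assumed to be a verticapy.vDataFrame, or any object with a
    compatible `nunique(columns=..., approx=...)` method.
    """
    # approx=False asks for the exact count rather than the approximation.
    return vdf.nunique(columns=columns, approx=False)
```

With the vDataFrame from the Examples section, the call would be exact_nunique(data, ["x", "y", "z"]). Expect it to be slower than the default on large tables.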

Parameters

columns: SQLColumns, optional

List of vDataColumn names. If empty, all vDataColumns are used.

approx: bool, optional

If set to True, the approximate cardinality is returned. Setting this parameter to False computes the exact cardinality, which can be drastically slower.

**agg_kwargs

Any optional parameter to pass to the Aggregate function.

Returns

TableSample

The cardinality of each selected column.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the cardinality for specific columns.

data.nunique(
    columns = ["x", "y", "z"],
)
       approx_unique
"x"    8.0
"y"    2.0
"z"    7.0
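As a sanity check that does not need a database, the cardinalities above can be reproduced in plain Python. This is only an illustrative sketch on a local copy of the example values (local_data is a name introduced here), not how VerticaPy computes the result.

```python
# Local copy of the example values, held in a plain dict (no database needed).
local_data = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

# A set keeps one copy of each value, so its length is the exact cardinality.
cardinality = {name: len(set(values)) for name, values in local_data.items()}

print(cardinality)  # {'x': 8, 'y': 2, 'z': 7}
```

On a dataset this small the approximate and exact counts agree; on large columns the approximate result may deviate slightly from the exact one.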

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.

See also

vDataFrame.duplicated() : Duplicate values for particular columns.
vDataColumn.nunique() : Cardinality for a specific column.