verticapy.vDataFrame.nunique
- vDataFrame.nunique(columns: str | list[str] | None = None, approx: bool = True, **agg_kwargs) -> TableSample
When aggregating the vDataFrame using nunique (cardinality), VerticaPy can determine the number of unique values in each column in two ways: exactly, with COUNT DISTINCT, or approximately, with APPROXIMATE_COUNT_DISTINCT, a more efficient estimation method. The approximate method is the default (approx = True).
Hint
This flexibility lets you trade accuracy for speed: using APPROXIMATE_COUNT_DISTINCT can significantly improve performance when an estimate of the cardinality is sufficient for your analysis.
Important
To calculate the exact cardinality of a column, you should set the parameter approx to False. This will ensure that the cardinality is computed accurately rather than using the approximate method.
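To make the difference between the two modes concrete, here is a minimal sketch of the kind of SQL each one corresponds to. The helper `cardinality_sql` and the table name are hypothetical and only for illustration; the query text VerticaPy actually generates may differ.

```python
def cardinality_sql(columns, table, approx=True):
    """Build an illustrative SELECT that computes per-column cardinality.

    Hypothetical helper, not part of VerticaPy: it only shows which SQL
    aggregate each value of `approx` maps to.
    """
    if not columns:
        raise ValueError("at least one column is required")
    exprs = []
    for col in columns:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + col.replace('"', '""') + '"'
        if approx:
            # Estimated cardinality: faster on large tables.
            exprs.append(f"APPROXIMATE_COUNT_DISTINCT({quoted}) AS {quoted}")
        else:
            # Exact cardinality: potentially much slower.
            exprs.append(f"COUNT(DISTINCT {quoted}) AS {quoted}")
    return f"SELECT {', '.join(exprs)} FROM {table}"


# "my_table" is a placeholder table name.
print(cardinality_sql(["x", "y"], "my_table"))
print(cardinality_sql(["x", "y"], "my_table", approx=False))
```

The first call prints a query built on APPROXIMATE_COUNT_DISTINCT, the second one built on COUNT(DISTINCT ...).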
Parameters
- columns: SQLColumns, optional
List of vDataColumn names. If empty, all vDataColumns are used.
- approx: bool, optional
If set to True, the approximate cardinality is returned. Setting this parameter to False returns the exact cardinality, but it can drastically reduce the function's performance.
- **agg_kwargs
Any optional parameter to pass to the Aggregate function.
Returns
- TableSample
The result of the aggregation.
Examples
For this example, we will use the following dataset:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)
Now, let’s calculate the cardinality for specific columns.
data.nunique(
    columns = ["x", "y", "z"],
)
       approx_unique
"x"    8.0
"y"    2.0
"z"    7.0
Note
All the calculations are pushed to the database.
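As a sanity check on the example dataset, the exact cardinalities can be reproduced locally in plain Python. This is only a sketch for comparison: it runs in memory on the literal example values and is not how VerticaPy computes the result.

```python
# The same values used to build the example vDataFrame.
columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

# Exact cardinality: the number of distinct values in each column.
exact = {name: len(set(values)) for name, values in columns.items()}

print(exact)  # {'x': 8, 'y': 2, 'z': 7}
```

On a dataset this small the approximate values agree with the exact counts. On large, high-cardinality columns the approximation may deviate slightly from the exact count.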
Hint
For more precise control, please refer to the aggregate method.
See also
vDataFrame.duplicated(): Duplicate values for particular columns.
vDataColumn.nunique(): Cardinality for a specific column.