verticapy.vDataColumn.nunique
- vDataColumn.nunique(approx: bool = True) -> int
Returns the cardinality (number of unique values) of the vDataColumn. VerticaPy can compute it exactly with COUNT(DISTINCT ...) or estimate it with APPROXIMATE_COUNT_DISTINCT, a more efficient approximation method. The approx parameter selects between the two, and the approximate method is the default (approx = True).
Hint
Choose the method based on your requirements: APPROXIMATE_COUNT_DISTINCT can significantly improve performance when an estimate of the cardinality is sufficient for your analysis.
Important
To calculate the exact cardinality of a column, you should set the parameter approx to False. This will ensure that the cardinality is computed accurately rather than using the approximate method.
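The two modes can be pictured as two different SQL aggregates. The following is a minimal sketch, not VerticaPy's actual query builder: the helper name cardinality_query and the relation name my_table are hypothetical and exist only to show which SQL function each value of approx corresponds to.

```python
def cardinality_query(column: str, relation: str, approx: bool = True) -> str:
    """Build the SQL that a cardinality request roughly corresponds to.

    Hypothetical helper for illustration only; it is not part of VerticaPy.
    """
    # Quote the identifier and escape embedded double quotes.
    quoted = '"' + column.replace('"', '""') + '"'
    if approx:
        # Faster, returns an estimate of the cardinality.
        expr = f"APPROXIMATE_COUNT_DISTINCT({quoted})"
    else:
        # Exact, but potentially much slower on large columns.
        expr = f"COUNT(DISTINCT {quoted})"
    return f"SELECT {expr} FROM {relation}"


print(cardinality_query("y", "my_table"))
print(cardinality_query("y", "my_table", approx=False))
```

The first call prints the approximate form and the second the exact form, mirroring approx = True and approx = False.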
Parameters
- approx: bool, optional
If set to True, the approximate cardinality is returned. Setting this parameter to False returns the exact cardinality, but it can drastically decrease the function’s performance.
Returns
- int
vDataColumn cardinality (or approximate cardinality).
Examples
For this example, let’s generate a dataset and calculate the cardinality of a column:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

data["y"].nunique()
Out[3]: 2.0
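To judge whether the approximation is good enough for a given column, it can help to compute both values once and compare them. The sketch below assumes only the nunique(approx=...) signature documented on this page; the helper name compare_cardinality is hypothetical.

```python
def compare_cardinality(column):
    """Return (exact, estimate, relative_error) for a column-like object.

    Hypothetical helper: `column` is any object exposing
    nunique(approx=...), such as a vDataColumn.
    """
    exact = column.nunique(approx=False)    # exact cardinality
    estimate = column.nunique(approx=True)  # approximate cardinality
    # Guard against division by zero on an empty column.
    rel_error = abs(estimate - exact) / exact if exact else 0.0
    return exact, estimate, rel_error


# Usage with the dataset above (requires a database connection):
# exact, estimate, rel_error = compare_cardinality(data["y"])
```

Note that the exact call is the expensive one, so a check like this is best run once on a representative column rather than on every query.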
Note
All the calculations are pushed to the database.
Hint
For more precise control, please refer to the aggregate method.
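As a rough sketch of that route, both cardinalities can be requested in a single aggregate call. This assumes that your VerticaPy version accepts the aggregation names "unique" and "approx_unique"; check the aggregate documentation before relying on them. The helper name exact_and_approx_unique is hypothetical.

```python
def exact_and_approx_unique(column):
    """Request exact and approximate cardinality in one aggregate call.

    Hypothetical helper: `column` is any object exposing
    aggregate(func=[...]), such as a vDataColumn. The names "unique"
    and "approx_unique" are assumed to be supported aggregations.
    """
    return column.aggregate(func=["unique", "approx_unique"])


# Usage (requires a database connection):
# exact_and_approx_unique(data["y"])
```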
See also
- vDataColumn.aggregate(): Aggregations for a specific column.
- vDataFrame.aggregate(): Aggregates for particular columns.