verticapy.vDataColumn.topk
- vDataColumn.topk(k: int = -1, dropna: bool = True) -> TableSample
Returns the k most frequent elements in a column, along with each element's count and its frequency expressed as a percentage. It is useful for understanding the composition of your data and identifying its most prominent values.
Parameters
- k: int, optional
Number of most frequent elements to return. With the default value of -1, all distinct elements are returned, as in the example below.
- dropna: bool, optional
If set to True, NULL values are not considered during the computation.
Returns
- TableSample
The k most frequent elements, each with its count and percentage.
Examples
For this example, we will use the following dataset:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)
Now, let’s calculate the top k values for a specific column.
data["x"].topk()
      ...   count   percent
22    ...   1       12.5
2     ...   1       12.5
9     ...   1       12.5
15    ...   1       12.5
4     ...   1       12.5
1     ...   1       12.5
20    ...   1       12.5
10    ...   1       12.5
Note
All the calculations are pushed to the database.
Hint
For more precise control, please refer to the aggregate method.
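As a rough mental model of what topk returns, the same kind of frequency table can be sketched in plain Python. This is only an illustration and not VerticaPy's implementation, which pushes the computation to the database. The helper name topk_sketch is hypothetical, and the choice of denominator for the percentage (all rows, including NULLs) is an assumption that this page does not confirm.

```python
from collections import Counter


def topk_sketch(values, k=-1, dropna=True):
    """Pure-Python illustration of a top-k frequency table.

    Hypothetical helper, not part of VerticaPy. Returns a list of
    (value, count, percent) tuples, most frequent first.
    """
    # Assumption: percentages are taken over all rows, NULLs included.
    total = len(values)
    # None stands in for NULL; drop it when dropna is True.
    kept = [v for v in values if v is not None] if dropna else list(values)
    counts = Counter(kept)
    # most_common(None) returns every element, mirroring k < 1.
    ranked = counts.most_common(k if k >= 1 else None)
    return [(value, count, 100 * count / total) for value, count in ranked]


# Column "y" from the dataset above: 1 occurs five times, 2 occurs three times.
y = [1, 2, 1, 2, 1, 1, 2, 1]
print(topk_sketch(y, k=1))  # [(1, 5, 62.5)]
```

With k left at -1 the sketch returns one row per distinct value, which matches the eight rows shown for column "x" above.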