verticapy.vDataColumn.topk#

vDataColumn.topk(k: int = -1, dropna: bool = True) → TableSample#

Returns the k most frequently occurring elements in a column, along with each element’s count and its share of the rows expressed as a percentage. It is useful for understanding the composition of your data and spotting the values that dominate a column.

Parameters#

k: int, optional: Number of most frequent elements to return. With the default value of -1, all distinct elements are returned, as in the example below.
dropna: bool, optional: If set to True, NULL values are not considered during the computation.

Returns#

TableSample: the most frequent elements, each with its count and percent.

Examples#

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the top k values for a specific column.

data["x"].topk()

	...	count	percent
22	...	1	12.5
2	...	1	12.5
9	...	1	12.5
15	...	1	12.5
4	...	1	12.5
1	...	1	12.5
20	...	1	12.5
10	...	1	12.5
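
Because every value of x is distinct, each row above has a count of 1 and a percent of 12.5. To make the count and percent columns easier to interpret, here is a minimal plain-Python sketch of the same computation applied to the values of column y. It does not use VerticaPy or a database: `topk_sketch` is a hypothetical helper written only for illustration, and it assumes that percentages are taken over the rows that remain once NULL values (represented here as `None`) are dropped. The order of tied values may differ from what the database returns.

```python
from collections import Counter


def topk_sketch(values, k=-1, dropna=True):
    """Hypothetical helper (not part of VerticaPy): count/percent per value."""
    if dropna:
        # Ignore missing values, mirroring dropna=True.
        values = [v for v in values if v is not None]
    total = len(values)
    # Assumption: k < 1 means "no limit", so every distinct value is kept.
    limit = k if k >= 1 else None
    most_common = Counter(values).most_common(limit)
    return [(value, count, 100.0 * count / total) for value, count in most_common]


# Same values as column "y" in the dataset above.
y = [1, 2, 1, 2, 1, 1, 2, 1]

for value, count, percent in topk_sketch(y, k=2):
    print(value, count, percent)
# prints:
# 1 5 62.5
# 2 3 37.5
```

In VerticaPy itself the equivalent work is done in the database; the sketch only mirrors the arithmetic behind the two output columns.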

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.