
verticapy.vDataColumn.topk

vDataColumn.topk(k: int = -1, dropna: bool = True) → TableSample

This function returns the k most frequently occurring values in a column, with each value's number of occurrences given both as a count and as a percentage. It helps you understand the composition of your data and identify its most prominent values.

Parameters

k: int, optional

Number of most frequent values to return. With the default value of -1, all distinct values are returned, as the example below shows.

dropna: bool, optional

If set to True, NULL values are not considered during the computation.
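To make the parameters concrete, the following pure-Python sketch mimics what topk computes on an in-memory list, with None playing the role of NULL. It is not VerticaPy code. The function name topk_local is hypothetical, equally frequent values are kept in first-seen order (the database may order ties differently), and we assume that percent is computed over all rows, NULLs included, even when dropna=True.

```python
from collections import Counter
from typing import Any, List, Tuple


def topk_local(
    values: List[Any], k: int = -1, dropna: bool = True
) -> List[Tuple[Any, int, float]]:
    """Hypothetical local analogue of vDataColumn.topk (not VerticaPy code).

    Returns (value, count, percent) tuples sorted by descending count.
    """
    total = len(values)  # assumption: percent uses all rows, NULLs included
    if total == 0:
        return []
    # dropna=True: None stands in for SQL NULL and is skipped
    kept = [v for v in values if not (dropna and v is None)]
    # most_common(None) returns every value; equal counts keep first-seen order
    counts = Counter(kept).most_common(k if k >= 1 else None)
    return [(v, c, 100 * c / total) for v, c in counts]


data = ["a", "b", "a", None, "c", "a", None, "b"]
print(topk_local(data, k=2))
# [('a', 3, 37.5), ('b', 2, 25.0)]
print(topk_local(data, dropna=False))
# [('a', 3, 37.5), ('b', 2, 25.0), (None, 2, 25.0), ('c', 1, 12.5)]
```

With dropna=False, the None rows form their own group; with k >= 1, only the first k groups are kept.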

Returns

TableSample

A table listing each returned value with its count and percent.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s compute the most frequent values of column "x" using the default parameters.

data["x"].topk()
        count    percent
9       1        12.5
15      1        12.5
4       1        12.5
20      1        12.5
10      1        12.5
22      1        12.5
2       1        12.5
1       1        12.5

Note

All the calculations are pushed to the database.
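Because the work happens in the database, topk essentially amounts to a grouped count. The sketch below builds a query of that general shape and runs it with Python's built-in sqlite3 module as a stand-in for Vertica. The helper build_topk_query and its SQL are hypothetical illustrations, not the query VerticaPy actually generates; as in the sketch above, the percentage is assumed to be relative to all rows of the table. Identifiers are interpolated without quoting, so pass only trusted names.

```python
import sqlite3


def build_topk_query(table: str, column: str, k: int = -1, dropna: bool = True) -> str:
    """Hypothetical helper: a GROUP BY query of the kind topk pushes down.

    Illustration only; not the SQL VerticaPy generates. Identifiers are
    interpolated as-is (no quoting), so pass trusted names only.
    """
    where = f"WHERE {column} IS NOT NULL" if dropna else ""
    limit = f"LIMIT {k}" if k >= 1 else ""  # k < 1: no LIMIT, all values
    # assumption: percent is relative to every row of the table
    total = f"(SELECT COUNT(*) FROM {table})"
    parts = [
        f"SELECT {column}, COUNT(*) AS cnt,",
        f"100.0 * COUNT(*) / {total} AS percent",
        f"FROM {table}",
        where,
        f"GROUP BY {column}",
        "ORDER BY cnt DESC",
        limit,
    ]
    return " ".join(p for p in parts if p)


# sqlite3 stands in for Vertica; the data is column "x" from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in [1, 2, 4, 9, 10, 15, 20, 22]])
rows = conn.execute(build_topk_query("t", "x", k=3)).fetchall()
print(rows)
```

Since every value in "x" occurs once, the three rows returned are ties, and which three the engine picks is not specified. This is consistent with the unordered listing of equally frequent values in the example output.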

Hint

For more precise control, please refer to the aggregate method.

See also

vDataColumn.nunique() : Cardinality for a specific column.
vDataFrame.nunique() : Cardinality for particular columns.