verticapy.vDataColumn.topk#

vDataColumn.topk(k: int = -1, dropna: bool = True) → TableSample#

Returns the k most frequent elements in a column, along with each element's count and its share of the column expressed as a percentage. It is useful for understanding the composition of your data and identifying its most prominent values.

Parameters#

k: int, optional

Number of most frequent elements to return. The default, -1, applies no limit: the example below uses it and returns every distinct value.

dropna: bool, optional

If set to True, NULL values are not considered during the computation.

Returns#

TableSample

The most frequent elements, each with its count and percentage.

Examples#

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the top k values for a specific column.

data["x"].topk()
      count  percent
22        1     12.5
2         1     12.5
9         1     12.5
15        1     12.5
4         1     12.5
1         1     12.5
20        1     12.5
10        1     12.5
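
Every value in "x" is unique, so each row shows a count of 1 and 12.5 percent. For intuition about the two columns when values repeat, the same computation can be sketched locally in plain Python on the "y" column. This is only an illustration, not VerticaPy's implementation: `topk_sketch` is a hypothetical helper, it ignores NULL handling, and treating `k < 1` as "no limit" is an assumption based on the default call above.

```python
from collections import Counter


def topk_sketch(values, k=-1):
    """Hypothetical helper: (value, count, percent) rows, most frequent first."""
    total = len(values)
    counts = Counter(values)
    # Assumption: k < 1 means "return every distinct value".
    limit = k if k >= 1 else None
    return [
        (value, count, 100 * count / total)
        for value, count in counts.most_common(limit)
    ]


y = [1, 2, 1, 2, 1, 1, 2, 1]  # the "y" column from the dataset above
all_rows = topk_sketch(y)
top_row = topk_sketch(y, k=1)
print(all_rows)  # [(1, 5, 62.5), (2, 3, 37.5)]
print(top_row)   # [(1, 5, 62.5)]
```

The value 1 appears in 5 of the 8 rows (62.5 percent) and the value 2 in the remaining 3 (37.5 percent).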

Note

All the calculations are pushed to the database.
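
To illustrate what pushing the calculation to the database can look like, the sketch below runs a grouped count as a single SQL query. SQLite stands in for Vertica here, and the table name, the query text, and the choice of denominator are all assumptions for illustration; this is not the SQL that VerticaPy actually generates.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (y INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [(v,) for v in [1, 2, 1, 2, 1, 1, 2, 1]],
)

# Count each value, express it as a share of all rows, keep the top k.
# The WHERE clause plays the role of dropna=True in this sketch.
query = """
    SELECT y,
           COUNT(*) AS cnt,
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM data) AS percent
    FROM data
    WHERE y IS NOT NULL
    GROUP BY y
    ORDER BY cnt DESC
    LIMIT ?
"""
rows = conn.execute(query, (1,)).fetchall()
conn.close()
print(rows)  # [(1, 5, 62.5)]
```

Only the aggregated rows travel back to the client, which is the practical benefit of running the computation where the data lives.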

Hint

For more precise control, please refer to the aggregate method.

See also

vDataColumn.nunique() : Cardinality for a specific column.
vDataFrame.nunique() : Cardinality for particular columns.