verticapy.vDataFrame.count_percent#

vDataFrame.count_percent(columns: str | list[str] | None = None, sort_result: bool = True, desc: bool = True, **agg_kwargs) → TableSample#

Aggregates the vDataFrame using the count and percent aggregate functions. The count function computes the number of non-missing (non-null) values in each selected column, which indicates how complete the data is.

The percent function computes the percentage of non-missing values relative to the total number of rows, expressing the same completeness as a proportion.
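The two quantities can be illustrated without a database. The following is a minimal plain-Python sketch, not VerticaPy's implementation: the helper name `count_and_percent` and the sample column are hypothetical, and `None` stands in for a missing value.

```python
from typing import Optional, Sequence


def count_and_percent(values: Sequence[Optional[float]]) -> tuple[float, float]:
    """Return (count, percent) of non-missing entries; None marks a missing value."""
    total = len(values)
    count = sum(1 for v in values if v is not None)
    # Guard against an empty column to avoid dividing by zero.
    percent = 100.0 * count / total if total else 0.0
    return float(count), percent


# Hypothetical column: 3 of 4 values are present.
print(count_and_percent([10, None, 30, 40]))  # (3.0, 75.0)
```

The percent is always the count divided by the full row count, so a column with no missing values reports 100.0.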

Parameters#

columns: SQLColumns, optional

List of vDataColumn names. If empty, all vDataColumns are used.

sort_result: bool, optional

If set to True, the result is sorted.

desc: bool, optional

If set to True and sort_result is set to True, the result is sorted in descending order.

**agg_kwargs

Any optional parameter to pass to the Aggregate function.

Returns#

TableSample

The count and percent of non-missing values for each selected vDataColumn.

Examples#

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s compute the count and percentage of non-missing values for specific columns.

data.count_percent(
    columns = ["x", "y", "z"],
)
        count   percent
"y"     7.0     87.5
"z"     6.0     75.0
"x"     4.0     50.0

Note

All the calculations are pushed to the database.
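Counts below the row total appear only when some values are missing. The plain-Python sketch below does not use VerticaPy; it shows how figures like those in the example output arise and how `sort_result` and `desc` could order them. The `sample` dictionary, the positions of its `None` entries, and the `count_percent_sketch` helper are all hypothetical, and sorting on the count column is an assumption, since the text only states that the result is sorted.

```python
from typing import Optional

# Hypothetical data: 8 rows, with None standing in for a missing value.
sample: dict[str, list[Optional[int]]] = {
    "x": [1, None, 4, None, 10, None, 20, None],
    "y": [1, 2, 1, 2, None, 1, 2, 1],
    "z": [10, 12, None, 1, 9, 8, None, 3],
}


def count_percent_sketch(data, sort_result=True, desc=True):
    """Return [(column, count, percent), ...] for non-missing values."""
    rows = []
    for name, values in data.items():
        count = sum(v is not None for v in values)
        rows.append((name, float(count), 100.0 * count / len(values)))
    if sort_result:
        # Assumption: rows are ordered by the count column.
        rows.sort(key=lambda row: row[1], reverse=desc)
    return rows


for name, count, percent in count_percent_sketch(sample):
    print(name, count, percent)
# y 7.0 87.5
# z 6.0 75.0
# x 4.0 50.0
```

Passing `desc=False` reverses the order, and `sort_result=False` keeps the columns in the order they were given.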

Hint

For more precise control, please refer to the aggregate method.

See also

vDataColumn.count() : Count for a specific column.
vDataFrame.count() : Count for particular columns.