verticapy.vDataFrame.median

vDataFrame.median(columns: str | list[str] | None = None, approx: bool = True, **agg_kwargs) -> TableSample

Aggregates the vDataFrame using the MEDIAN or APPROX_MEDIAN aggregation, which calculates the median value for the specified columns. The median is a robust measure of central tendency and helps in understanding the distribution of data, especially in the presence of outliers.

Warning

With approx set to True (the default), the approximate median is computed, which is significantly faster than the exact calculation. Setting approx to False computes the exact median and can slow the query down considerably.
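To see how large the gap is on your own data, you can time both modes. The sketch below is a minimal illustration and not part of VerticaPy: `time_median` is a hypothetical helper, and `vdf` is assumed to be a vDataFrame backed by an open database connection. It uses only the `columns` and `approx` parameters from the signature above.

```python
import time


def time_median(vdf, columns=None, approx=True):
    """Hypothetical helper: call vdf.median and return (result, elapsed seconds)."""
    start = time.perf_counter()
    # Only the documented parameters are passed through.
    result = vdf.median(columns=columns, approx=approx)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Calling it once with `approx=True` and once with `approx=False` on the same vDataFrame gives a rough comparison; a single run is noisy, so repeat the measurement before drawing conclusions.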

Parameters

columns: SQLColumns, optional

List of vDataColumn names. If empty, all numerical vDataColumns are used.

approx: bool, optional

If set to True, the approximate median is returned. Setting this parameter to False can drastically degrade the function's performance.

**agg_kwargs

Any optional parameter to pass to the aggregate method.

Returns

TableSample

Result of the aggregation.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the median for specific columns.

data.median(
    columns = ["x", "y", "z"],
    approx = True,
)

       approx_50%
"x"    9.5
"y"    1.0
"z"    5.5
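
As a quick sanity check on the output above, the same medians can be reproduced locally with Python's standard library. This sketch does not use VerticaPy or the database; it only recomputes the exact median of the eight values in each column.

```python
from statistics import median

# Same values as the example vDataFrame, held in plain Python lists.
columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

# With an even number of values, the median is the mean of the two middle ones.
local_medians = {name: float(median(values)) for name, values in columns.items()}
print(local_medians)  # {'x': 9.5, 'y': 1.0, 'z': 5.5}
```

On this small dataset the exact values match the approx_50% column; on larger datasets the approximate median may differ slightly from the exact one.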

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.
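
As a rough sketch of what that might look like, the helper below requests the approximate median together with the minimum and maximum in a single call. It assumes that `aggregate` accepts a `func` list and a `columns` list, and that `"approx_50%"` (the label shown in the output above), `"min"`, and `"max"` are accepted function names; check the `aggregate` documentation before relying on this. `summarize` is a hypothetical name, and `vdf` is assumed to be a vDataFrame with a live database connection.

```python
def summarize(vdf, columns):
    """Hypothetical helper: min, approximate median, and max in one aggregate call."""
    # Assumption: these function names are accepted by vDataFrame.aggregate.
    funcs = ["min", "approx_50%", "max"]
    return vdf.aggregate(func=funcs, columns=columns)
```

Grouping several aggregations into one call keeps the work in a single database query rather than one query per statistic.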

See also

vDataFrame.max() : Maximum for particular columns.
vDataFrame.min() : Minimum for particular columns.
vDataColumn.mean() : Mean for a specific column.