verticapy.vDataFrame.median

vDataFrame.median(columns: str | list[str] | None = None, approx: bool = True, **agg_kwargs) → TableSample

Aggregates the vDataFrame using the MEDIAN aggregation, or APPROX_MEDIAN when approx is True, to calculate the median value of each specified column. The median is a robust measure of central tendency: because it depends only on the middle of the sorted values, it is much less affected by outliers than the mean.
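The following plain-Python sketch illustrates the outlier robustness described above. It does not use VerticaPy or a database, and the sample values are illustrative only: replacing the largest value with an extreme outlier moves the mean a long way but leaves the median unchanged.

```python
from statistics import mean, median

# Illustrative values; the second list replaces the largest value (22) with an outlier.
values = [1, 2, 4, 9, 10, 15, 20, 22]
with_outlier = [1, 2, 4, 9, 10, 15, 20, 2200]

# The mean is pulled strongly toward the outlier...
print(mean(values), mean(with_outlier))

# ...while the median depends only on the two middle values (9 and 10) in both lists.
print(median(values), median(with_outlier))
```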

Warning

When approx is True (the default), the approximate median is computed, which is significantly faster than the exact calculation. Setting approx to False computes the exact median instead, which can drastically slow down performance on large datasets.
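To make the distinction concrete, here is a minimal plain-Python sketch of what an exact median computes for a finite set of values: sort every value, then take the middle one (odd count) or the average of the two middle ones (even count). The helper name exact_median is hypothetical and is not part of VerticaPy, and this is not how the database evaluates MEDIAN internally; it only shows why an exact result requires ordering all of the data, which approximate methods avoid.

```python
def exact_median(values):
    """Return the exact median of a non-empty sequence of numbers (hypothetical helper)."""
    if not values:
        raise ValueError("exact_median() requires at least one value")
    ordered = sorted(values)  # every value must be ordered: the expensive step
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


print(exact_median([1, 2, 4, 9, 10, 15, 20, 22]))  # prints 9.5, matching the "x" row of the example output
```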

Parameters

columns: SQLColumns, optional: List of the vDataColumn names. If empty, all numerical vDataColumns are used.
approx: bool, optional: If set to True, the approximate median is returned. If set to False, the exact median is computed, which can drastically decrease the function's performance.
**agg_kwargs: Any optional parameter to pass to the aggregate method.

Returns

TableSample: the median of each selected column.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the median for specific columns.

data.median(
    columns = ["x", "y", "z"],
    approx = True,
)

	approx_50%
"x"	9.5
"y"	1.0
"z"	5.5
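Because the dataset is small, the output above can be cross-checked outside the database. The sketch below recomputes the exact median of each column with Python's standard statistics.median, using the same column data that was passed to vp.vDataFrame. Here the exact values agree with the approximate ones; on large datasets, approximate and exact medians may differ slightly.

```python
from statistics import median

# Same column data as the vDataFrame in the example.
raw = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

# Exact median per column, computed locally in Python rather than in the database.
local = {name: median(values) for name, values in raw.items()}
print(local)  # {'x': 9.5, 'y': 1.0, 'z': 5.5}
```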

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.