verticapy.vDataColumn.median

vDataColumn.median(approx: bool = True) → bool | float | str | timedelta | datetime

Aggregates the vDataColumn using the MEDIAN or APPROX_MEDIAN aggregation, which calculates the median value of the column. The median is a robust measure of central tendency: unlike the mean, it is only slightly affected by outliers, which makes it useful for understanding the distribution of skewed data.

Warning

Setting approx to True computes the approximate median, which is significantly faster than the exact calculation. Setting approx to False computes the exact median and can significantly slow down performance.

Parameters

approx: bool, optional: If set to True (the default), the approximate median is returned. Setting this parameter to False can drastically decrease the function’s performance.

Returns

PythonScalar: the median of the column.

Examples

For this example, let’s generate a dataset and calculate the median of a column:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data["x"].median(approx = True)
Out[3]: 9.5
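
As a sanity check that does not need a database connection, the result above can be reproduced locally with Python's standard library. This is only a sketch: it computes the exact median of the same values in plain Python and says nothing about how MEDIAN or APPROX_MEDIAN are evaluated in the database. The variable names are illustrative.

```python
import statistics

# Same values as the "x" column in the example above.
x = [1, 2, 4, 9, 10, 15, 20, 22]

# With an even number of values, the exact median is the mean
# of the two middle values of the sorted list (here 9 and 10).
local_median = statistics.median(x)
print(local_median)
```

On larger or more skewed data, an approximate median computed in the database may differ slightly from a locally computed exact value.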

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.