verticapy.vDataFrame.median
- vDataFrame.median(columns: str | list[str] | None = None, approx: bool = True, **agg_kwargs) -> TableSample
Aggregates the vDataFrame using the MEDIAN or APPROX_MEDIAN aggregation, which calculates the median value of the specified columns. The median is a robust measure of central tendency and helps in understanding the distribution of the data, especially in the presence of outliers.
Warning
When approx is set to True, the approximate median is computed, which is significantly faster than the exact calculation. Be cautious when setting approx to False, as the exact computation can significantly slow down performance.
Parameters
- columns: SQLColumns, optional
List of the vDataColumns names. If empty, all numerical vDataColumns are used.
- approx: bool, optional
If set to True, the approximate median is returned. Setting this parameter to False can drastically decrease the function's performance.
- **agg_kwargs
Any optional parameter to pass to the Aggregate function.
Returns
- TableSample
result.
Examples
For this example, we will use the following dataset:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)
Now, let’s calculate the median for specific columns.
data.median(
    columns = ["x", "y", "z"],
    approx = True,
)

       approx_50%
"x"    9.5
"y"    1.0
"z"    5.5
Note
All the calculations are pushed to the database.
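As a cross-check that needs no database connection, the medians shown above can be reproduced locally with Python's standard statistics module. This is only an illustrative sketch, not how VerticaPy computes the result: the call above runs in the database and, with approx set to True, relies on an approximation that is not guaranteed to equal the exact value on every dataset. The names local_medians and x_with_outlier are hypothetical and do not come from the library.

```python
from statistics import mean, median

# Same values as the example dataset, held in plain Python lists.
data_local = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

# Exact median of each column, computed locally rather than in the database.
local_medians = {name: median(values) for name, values in data_local.items()}
print(local_medians)  # {'x': 9.5, 'y': 1.0, 'z': 5.5}

# Robustness to outliers: one extreme value moves the mean far more
# than it moves the median.
x_with_outlier = data_local["x"] + [1000]
print("median:", median(data_local["x"]), "->", median(x_with_outlier))
print("mean:  ", mean(data_local["x"]), "->", mean(x_with_outlier))
```

Each column has an even number of values, so the exact median is the average of the two middle values after sorting, for example (9 + 10) / 2 = 9.5 for "x".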
Hint
For more precise control, please refer to the aggregate method.
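A minimal sketch of what the hint suggests, under two assumptions that should be checked against the aggregate documentation: that aggregate takes func and columns arguments, and that "median" and "approx_median" are accepted aggregation names. The helper median_comparison is hypothetical and only wraps the call so that the exact and approximate medians can be requested side by side.

```python
def median_comparison(vdf, columns):
    """Request the exact and approximate medians in one aggregate call.

    `vdf` is expected to be a verticapy vDataFrame. The `func` and
    `columns` keywords and the aggregation names "median" and
    "approx_median" are assumptions, not confirmed by this page.
    """
    return vdf.aggregate(func=["median", "approx_median"], columns=columns)
```

With the dataset from the example, this would be called as median_comparison(data, ["x", "y", "z"]). Keep in mind that the exact median is the slow path described in the warning.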