verticapy.vDataFrame.mad

vDataFrame.mad(columns: str | list[str] | None = None, **agg_kwargs) → TableSample

Aggregates the vDataFrame using mad (median absolute deviation). MAD is the median of the absolute deviations of the data points from their median. Because it is built on medians rather than means, it measures dispersion in a way that stays stable in the presence of outliers.

This is especially useful for datasets where we want to understand the spread of values without letting extreme outliers dominate the result, which makes mad a practical tool for outlier-robust analysis and data quality evaluation.
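As a formula, the median absolute deviation of a column with values x_1, ..., x_n is commonly written as shown here. This assumes the unscaled form, with no consistency constant applied, which agrees with the output in the Examples section.

```latex
\operatorname{MAD}(x) = \operatorname{median}_{i}\left( \left| x_i - \operatorname{median}_{j}(x_j) \right| \right)
```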

Warning

To compute mad, VerticaPy may need to execute multiple queries. At a minimum, it requires a query that includes a subquery, because the median must be known before the deviations from it can be aggregated. This is why mad is typically slower than some other types of aggregations.
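The nested computation can be illustrated outside the database. The following is a minimal pure-Python sketch and not VerticaPy's implementation: the helper name `mad` and the `columns` dictionary are hypothetical, and the median is taken from Python's standard `statistics` module. The inner median plays the role of the subquery, and the outer median plays the role of the enclosing aggregation.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    # Inner step: the median of the raw values (the "subquery" part).
    center = median(values)
    # Outer step: the median of the absolute deviations from that center.
    deviations = [abs(v - center) for v in values]
    return median(deviations)


# Same values as the dataset used in the Examples section.
columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

for name, values in columns.items():
    print(name, mad(values))
```

For these values the sketch gives 6.5, 0.0, and 4.0, which match the result shown in the Examples section. Unlike vDataFrame.mad, it runs in local memory rather than in the database.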

Parameters

columns: SQLColumns, optional: List of vDataColumn names. If empty, all vDataColumns are used.
**agg_kwargs: Any optional parameter to pass to the Aggregate function.

Returns

TableSample: the result, with one mad value per selected column.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the median absolute deviation for specific columns.

data.mad(
    columns = ["x", "y", "z"],
)

	mad
"x"	6.5
"y"	0.0
"z"	4.0

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.