verticapy.vDataColumn.mad

vDataColumn.mad() → bool | float | str | timedelta | datetime

Aggregates the vDataColumn using mad (median absolute deviation). MAD measures the dispersion of data points around the median, and it is particularly valuable when the data contain outliers, because it is far less sensitive to extreme values than measures such as the standard deviation. Aggregating the vDataColumn with mad shows how variable the data are and how far data points typically deviate from the median.
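As a sketch of what is being computed, assuming the standard unscaled definition of the median absolute deviation (the notation x_1, ..., x_n for the column's values is introduced here for illustration and is not part of the VerticaPy API):

```latex
\mathrm{MAD}(x) = \operatorname{median}_{i}\left( \left| x_i - \operatorname{median}_{j}(x_j) \right| \right)
```

No consistency constant (such as the 1.4826 factor sometimes used to estimate a normal standard deviation) is assumed here; the value 6.5 in the example on this page matches the unscaled form.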

This is especially useful for datasets where we want to understand the spread of values while remaining resistant to the influence of extreme outliers, which makes mad a useful tool for outlier-robust analysis and data quality evaluation.
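To illustrate the robustness claim, here is a minimal pure-Python sketch that does not use VerticaPy. The helper name `mad` and the two sample lists are hypothetical, chosen only for this illustration. It compares the median absolute deviation with the population standard deviation before and after one value is replaced by an extreme outlier.

```python
from statistics import median, pstdev


def mad(values):
    """Unscaled median absolute deviation about the median (illustrative helper)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical sample data: the second list swaps one value for an outlier.
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]

# MAD is the same for both lists, while the standard deviation grows sharply.
print("MAD:   ", mad(clean), mad(with_outlier))
print("stddev:", pstdev(clean), pstdev(with_outlier))
```

The median of each list stays at 12, so the single extreme value changes only one absolute deviation and leaves their median untouched, whereas the standard deviation is dominated by it.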

Warning

Computing mad is more involved than a simple single-pass aggregation: VerticaPy needs to execute multiple queries or, at a minimum, one query that includes a subquery. This added complexity is why calculating mad is typically slower than some other types of aggregations.

Returns

PythonScalar: mad

Examples

For this example, let’s generate a dataset and calculate the median absolute deviation of a column:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data["x"].mad()
Out[3]: 6.5
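
The result above can be cross-checked locally without a database. The following is a sketch in plain Python, assuming the unscaled definition of the median absolute deviation; the variable names are illustrative and not part of VerticaPy.

```python
from statistics import median

# Same values as column "x" in the example above.
x = [1, 2, 4, 9, 10, 15, 20, 22]

# Step 1: the median of the column (mean of the two middle values, 9 and 10).
center = median(x)

# Step 2: absolute deviation of every value from that median.
deviations = sorted(abs(v - center) for v in x)

# Step 3: the median of those deviations is the MAD.
result = median(deviations)

print(center)      # 9.5
print(deviations)  # [0.5, 0.5, 5.5, 5.5, 7.5, 8.5, 10.5, 12.5]
print(result)      # 6.5
```

The two steps (a median, then a median of deviations from it) also hint at why the database needs a subquery or a second query for this aggregation.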

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.