verticapy.vDataFrame.mad
- vDataFrame.mad(columns: str | list[str] | None = None, **agg_kwargs) -> TableSample
Aggregates the vDataFrame using the mad (Median Absolute Deviation) method.
MAD measures the dispersion of data points around the median. Unlike the standard deviation, it is only weakly affected by extreme values, so it is particularly valuable when the data contains outliers. Aggregating the vDataFrame using mad shows how far values typically deviate from the median, which makes it a useful tool for outlier-robust analysis and data quality evaluation.
Warning
To compute mad, VerticaPy needs to execute multiple queries or, at a minimum, a query that includes a subquery. This added complexity is why mad is typically slower than most other aggregations.
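As a rough illustration of what the aggregation measures, the sketch below computes the median absolute deviation in plain Python. It is not VerticaPy's implementation (the real computation runs as SQL in the database), and the helper name `median_absolute_deviation` is hypothetical. The two passes over the data (first the median, then the median of the deviations from it) also suggest why a nested query is needed.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Return median(|x - median(values)|) for a non-empty sequence."""
    center = median(values)                         # first pass: the median
    deviations = [abs(v - center) for v in values]  # distance of each point to it
    return median(deviations)                       # second pass: median of distances


clean = [1, 2, 4, 9, 10, 15, 20, 22]
with_outlier = clean + [1000]  # one extreme value appended

# MAD barely moves (6.5 -> 8), while the standard deviation grows sharply.
print(median_absolute_deviation(clean), pstdev(clean))
print(median_absolute_deviation(with_outlier), pstdev(with_outlier))
```

The comparison with `pstdev` is only there to show the robustness property described above; it is not part of the mad computation itself.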
Parameters
- columns: SQLColumns, optional
List of vDataColumn names. If empty, all vDataColumns are used.
- **agg_kwargs
Any optional parameter to pass to the Aggregate function.
Returns
- TableSample
result.
Examples
For this example, we will use the following dataset:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)
Now, let’s calculate the median absolute deviation for specific columns.
data.mad(
    columns=["x", "y", "z"],
)
       mad
"x"    6.5
"y"    0.0
"z"    4.0
Note
All the calculations are pushed to the database.
Hint
For more precise control, please refer to the aggregate method.
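The values in the example above can be cross-checked locally without a database connection. The following is a minimal sketch using only the standard library. The names `local_columns` and `mad_of` are hypothetical and not part of VerticaPy.

```python
from statistics import median

# Same values as the example vDataFrame, held as plain Python lists.
local_columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}


def mad_of(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


local_mad = {name: mad_of(values) for name, values in local_columns.items()}
print(local_mad)  # {'x': 6.5, 'y': 0.0, 'z': 4.0}
```

A local check like this is only practical for small samples. For real tables the point of `vDataFrame.mad` is that the computation stays in the database.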