verticapy.vDataColumn.quantile

vDataColumn.quantile(q: int | float | Decimal, approx: bool = True) → bool | float | str | timedelta | datetime

Aggregates the vDataColumn using the specified quantile. Given a quantile q, this aggregation returns the value below which a fraction q of the data falls. It is useful for analyzing data distributions, assessing skewness, and computing common percentiles such as the median or the quartiles.

Warning

The quantile aggregation operates in one of two modes. Depending on the approx parameter, it uses either the APPROXIMATE_QUANTILE or the QUANTILE method to derive the final aggregation. APPROXIMATE_QUANTILE returns results faster by estimating the quantile, while QUANTILE computes the exact quantile at a higher computational cost. Choose the mode that gives the balance of speed and precision your analysis requires.
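The trade-off between the two modes can be illustrated in plain Python. This is only a sketch: random sampling is used here as a stand-in approximation technique and is not meant to describe what APPROXIMATE_QUANTILE does inside the database. The helpers `exact_quantile` and `sampled_quantile` are hypothetical names introduced for this illustration.

```python
import random


def exact_quantile(values, q):
    """Exact quantile: sort everything, then interpolate linearly."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional 0-based position
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def sampled_quantile(values, q, sample_size=1_000, seed=0):
    """Approximate quantile: estimate from a fixed-size random sample.

    Illustrative stand-in only, not the database's algorithm.
    """
    rng = random.Random(seed)
    if len(values) <= sample_size:
        sample = values
    else:
        sample = rng.sample(values, sample_size)
    return exact_quantile(sample, q)


population = list(range(100_000))
exact = exact_quantile(population, 0.5)     # sorts all 100,000 values
approx = sampled_quantile(population, 0.5)  # sorts only 1,000 values
print(exact, approx)
```

The approximate result is cheaper to obtain because far fewer values are sorted, at the price of a small estimation error.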

Parameters

q: PythonNumber

A number between 0 and 1 that represents the quantile. For example, 0.25 represents Q1 (the first quartile).

approx: bool, optional

If set to True, the approximate quantile is returned. Setting this parameter to False computes the exact quantile, which can be drastically slower.

Returns

PythonScalar

quantile (or approximate quantile).

Examples

For this example, let’s generate a dataset and calculate the approximate median of a column:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)


data["x"].quantile(q = 0.5, approx = True)
Out[3]: 9.5

Next, let’s compute the approximate last decile of the same column:

data["x"].quantile(q = 0.9, approx = True)
Out[4]: 20.6
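The two results above (9.5 and 20.6) are consistent with linear interpolation between adjacent sorted values. The following plain-Python sketch reproduces them for this small dataset. It is an illustration under that assumption, not the computation that runs in the database, and `quantile_linear` is a hypothetical helper rather than part of VerticaPy.

```python
import math


def quantile_linear(values, q):
    """Quantile of `values` using linear interpolation between order statistics."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional 0-based position of the quantile in the sorted data.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Interpolate between the two neighbouring sorted values.
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


x = [1, 2, 4, 9, 10, 15, 20, 22]
median = quantile_linear(x, 0.5)       # position 3.5, halfway between 9 and 10
last_decile = quantile_linear(x, 0.9)  # position 6.3, between 20 and 22
print(median, round(last_decile, 6))
```

For q = 0.5 the position falls halfway between the fourth and fifth sorted values (9 and 10), and for q = 0.9 it falls three tenths of the way from 20 to 22.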

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.

See also

vDataColumn.aggregate() : Aggregations for a specific column.
vDataFrame.aggregate() : Aggregates for particular columns.