verticapy.vDataFrame.quantile

Aggregates the vDataFrame using the specified quantiles. A quantile is the value below which a given fraction of the data falls; for example, the 0.5 quantile is the median. Quantiles are useful for analyzing data distributions, assessing skewness, and computing percentiles such as the median or quartiles.

Warning

The quantile aggregation operates in two modes. Depending on the approx parameter, it uses either the APPROXIMATE_QUANTILE or the QUANTILE method. APPROXIMATE_QUANTILE returns results faster by estimating the quantile values, while QUANTILE computes exact quantiles at a higher computational cost. Choose between them based on the balance of speed and precision your analysis requires.

Parameters

q: PythonNumber / ArrayLike: A quantile or a list of quantiles to compute. Each must be a number between 0 and 1. For example, [0.25, 0.75] returns Q1 and Q3.
columns: SQLColumns, optional: List of the vDataColumn names. If empty, all numerical vDataColumns are used.
approx: bool, optional: If set to True, the approximate quantile is returned. Setting this parameter to False computes the exact quantile, which can be drastically slower.
**agg_kwargs: Any optional parameter to pass to the Aggregate function.

Returns

TableSample: result.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate some approximate quantiles for specific columns.

data.quantile(
    q = [0.1, 0.2, 0.5, 0.9],
    columns = ["x", "y", "z"],
    approx = True,
)

	...	approx_50%	approx_90%
"x"	...	9.5	20.6
"y"	...	1.0	2.0
"z"	...	5.5	10.6
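
As a local sanity check, the 50% and 90% values shown above can be reproduced without a database. The sketch below is plain Python and is not how Vertica computes quantiles: linear_quantile is a hypothetical helper written for this illustration, and it assumes linear interpolation between order statistics, which happens to agree with the displayed output on this small dataset.

```python
import math


def linear_quantile(values, q):
    """Return the q-th quantile of `values` using linear interpolation.

    Hypothetical helper for illustration only; it is not part of VerticaPy.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the quantile among the sorted values.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    # Interpolate between the two neighbouring order statistics.
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Same values as the example dataset above.
columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

for name, values in columns.items():
    # Round to hide floating-point noise in the interpolation.
    results = [round(linear_quantile(values, q), 6) for q in (0.5, 0.9)]
    print(name, results)
```

For column x, this gives 9.5 and 20.6, matching the approx_50% and approx_90% entries in the table. On larger datasets an approximate method may differ from an exact computation, so treat this only as a check on small inputs.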

Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.