verticapy.vDataFrame.sem

vDataFrame.sem(columns: str | list[str] | None = None, **agg_kwargs) -> TableSample

Aggregates the vDataFrame using sem (Standard Error of the Mean). The standard error of the mean estimates how precisely the sample mean approximates the population mean: it is the sample standard deviation divided by the square root of the number of observations.

A small sem indicates that the sample mean is a reliable estimate of the true population mean, while a large sem signals more variability or uncertainty around it.

This measure is particularly useful when evaluating the precision of sample statistics or when making inferences about a larger dataset based on a sample.

Warning

To compute sem, VerticaPy needs, at a minimum, a query that includes a subquery, and it may have to execute multiple queries. This added complexity is why calculating sem is typically slower than simpler aggregations.

Parameters

columns: SQLColumns, optional

List of vDataColumn names. If empty, all numerical vDataColumns are used.

**agg_kwargs

Any optional parameters to pass to the aggregate method.

Returns

TableSample

The standard error of the mean of each selected column.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
    }
)

Now, let’s calculate the standard error of the mean for specific columns.

data.sem(
    columns = ["x", "y", "z"],
)
       sem
"x"    2.83433980723151
"y"    0.18298126367785
"z"    1.57831284242745

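As a sanity check, the figures above can be reproduced outside the database. The sketch below is a plain-Python illustration, not VerticaPy's implementation: it assumes sem is the sample standard deviation (n - 1 denominator) divided by the square root of the number of values, and the helper name local_sem is hypothetical.

```python
import math
import statistics


def local_sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n).

    Hypothetical helper for illustration only; it runs in local memory
    and does not touch the database.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(values) / math.sqrt(n)


# Same values as the example dataset.
columns = {
    "x": [1, 2, 4, 9, 10, 15, 20, 22],
    "y": [1, 2, 1, 2, 1, 1, 2, 1],
    "z": [10, 12, 2, 1, 9, 8, 1, 3],
}

results = {name: local_sem(values) for name, values in columns.items()}

for name, value in results.items():
    print(f"{name}: {value:.6f}")
```

Under that assumption, the three local results should agree with the values returned by data.sem for this dataset. This approach is only practical for small data that fits in memory; for real tables, the in-database aggregation is the intended route.
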
Note

All the calculations are pushed to the database.

Hint

For more precise control, please refer to the aggregate method.

See also

vDataColumn.sem() : Standard Error of the Mean for a specific column.
vDataFrame.mad() : Median Absolute Deviation for particular columns.