
verticapy.vDataColumn.describe

vDataColumn.describe(method: Literal['auto', 'numerical', 'categorical', 'cat_stats'] = 'auto', max_cardinality: int = 6, numcol: str | None = None) → TableSample

This function aggregates the vDataColumn using multiple statistical aggregations, such as minimum (min), maximum (max), median, and cardinality (unique). The aggregations applied depend on the data type of the vDataColumn: numerical columns are aggregated with numerical aggregations (min, median, max, ...), while categorical columns are aggregated with categorical ones (cardinality, mode, ...). The method parameter lets you override this default and choose the set of aggregations explicitly.

Parameters

method: str, optional

The describe method.

  • auto:

    Sets the method to numerical if the vDataColumn is numerical, categorical otherwise.

  • categorical:

    Uses only categorical aggregations during the computation.

  • cat_stats:

    Computes statistics of a numerical column for each vDataColumn category. In this case, the parameter numcol must be defined.

  • numerical:

    Uses common numerical aggregations (count, mean, standard deviation, min, approximate quartiles, max) during the computation.

max_cardinality: int, optional

Cardinality threshold used to determine whether the vDataColumn is considered categorical.

numcol: str, optional

Numerical vDataColumn to use when the parameter method is set to cat_stats.
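The choice made by method = "auto" can be illustrated locally. The sketch below is a hypothetical pure-Python helper (resolve_method is not part of VerticaPy) that works on a plain list instead of a vDataColumn. It assumes one reading of max_cardinality: a numerical column whose number of distinct values does not exceed the threshold is also described as categorical. Treat it as an illustration of the documented rules, not as VerticaPy's implementation.

```python
from numbers import Number


def resolve_method(values, method="auto", max_cardinality=6, numcol=None):
    """Hypothetical helper mirroring the documented choice of describe method."""
    if method == "cat_stats" and numcol is None:
        # Documented rule: cat_stats requires numcol.
        raise ValueError("numcol must be defined when method='cat_stats'")
    if method != "auto":
        return method
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in non_null
    )
    # Assumption: low-cardinality numerical columns are treated as categorical.
    if is_numeric and len(set(non_null)) > max_cardinality:
        return "numerical"
    return "categorical"


print(resolve_method([1, 2, 4, 9, 10, 15, 20, 22]))  # numerical
print(resolve_method([1, 2, 1, 2, 1, 1, 2, 1]))  # categorical
print(resolve_method(["A", "A", "B", "C"]))  # categorical
```

The first two calls reuse the "x" and "y" columns from the example dataset below: "x" has 8 distinct values (more than 6), while "y" has only 2.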

Returns

TableSample

The computed aggregations.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
        "c": ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    }
)

The describe function offers several ways to summarize a vDataColumn, selected through the method parameter.

Setting method to "numerical" computes numerical aggregations.

data["x"].describe(method = "numerical")
              value
name          "x"
dtype         integer
unique        8.0
count         8
mean          10.375
std           8.01672359152148
min           1.0
approx_25%    3.5
approx_50%    9.5
approx_75%    16.25
max           22.0
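To sanity-check the numbers above without a database, the same statistics can be computed locally with Python's statistics module. This is only a cross-check under an assumption: Vertica computes the approx_* rows with approximate percentiles, and for this small column they coincide with linear interpolation between order statistics (statistics.quantiles with method="inclusive"). On larger data the two need not agree exactly.

```python
import statistics

x = [1, 2, 4, 9, 10, 15, 20, 22]

summary = {
    "unique": len(set(x)),
    "count": len(x),
    "mean": statistics.mean(x),
    # Sample standard deviation (n - 1 in the denominator).
    "std": statistics.stdev(x),
    "min": min(x),
    "max": max(x),
}

# Quartiles by linear interpolation between sorted values.
q25, q50, q75 = statistics.quantiles(x, n=4, method="inclusive")
summary.update({"approx_25%": q25, "approx_50%": q50, "approx_75%": q75})

for name, value in summary.items():
    print(f"{name:<12}{value}")
```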

Setting method to "categorical" computes categorical aggregations, such as the cardinality and the frequency of each category.

data["x"].describe(method = "categorical")
          value
name      "x"
dtype     integer
unique    8.0
count     8.0
9         1
10        1
1         1
4         1
2         1
15        1
Others    1
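The categorical summary amounts to a cardinality, a non-null count, and a frequency table. The hypothetical helper below (categorical_summary, not part of VerticaPy) computes these locally with collections.Counter and keeps at most max_cardinality categories, by analogy with the output above, which lists six values. It does not try to reproduce VerticaPy's Others row, and ties may be ordered differently than the database orders them.

```python
from collections import Counter


def categorical_summary(values, max_cardinality=6):
    """Hypothetical local helper: cardinality, non-null count, top categories."""
    non_null = [v for v in values if v is not None]
    freq = Counter(non_null)
    return {
        "unique": len(freq),
        "count": len(non_null),
        # Ties keep first-occurrence order here; a database may order them differently.
        "top": freq.most_common(max_cardinality),
    }


x = [1, 2, 4, 9, 10, 15, 20, 22]
summary = categorical_summary(x)
print(summary["unique"], summary["count"])  # 8 8
print(summary["top"])

# The "c" column has repeated categories, so its frequencies are more telling.
c = ["A", "A", "A", "A", "B", "B", "C", "D"]
print(categorical_summary(c)["top"])  # [('A', 4), ('B', 2), ('C', 1), ('D', 1)]
```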

Setting method to "cat_stats" groups the rows by the categories of the vDataColumn and computes aggregations of the numerical column given by numcol within each group.

data["c"].describe(
    method = "cat_stats",
    numcol = "x"
)
     ...    approx_90%    max
A    ...    7.5           9
B    ...    14.5          15
C    ...    20.0          20
D    ...    22.0          22
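The grouping behind cat_stats can be reproduced locally for the two columns visible above. This sketch groups "x" by "c" and computes approx_90% and max per category; the elided columns are not reproduced. linear_percentile is a hypothetical helper, and using linear interpolation is an assumption that happens to match Vertica's approximate percentiles on this tiny dataset but is not guaranteed in general.

```python
from collections import defaultdict


def linear_percentile(values, q):
    """Percentile by linear interpolation between sorted values (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


x = [1, 2, 4, 9, 10, 15, 20, 22]
c = ["A", "A", "A", "A", "B", "B", "C", "D"]

# Group the numerical column by category, as method="cat_stats" does with numcol.
groups = defaultdict(list)
for category, value in zip(c, x):
    groups[category].append(value)

cat_stats = {
    category: {
        "approx_90%": linear_percentile(values, 0.9),
        "max": max(values),
    }
    for category, values in sorted(groups.items())
}

for category, stats in cat_stats.items():
    print(category, stats)
```

Single-row categories such as C and D simply return their only value for every percentile.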

Note

All the calculations are pushed to the database.

See also

vDataColumn.aggregate() : Aggregations for a specific column.
vDataFrame.aggregate() : Aggregations for specific columns.
vDataFrame.describe() : Summarizes information within the columns.