verticapy.vDataColumn.describe

vDataColumn.describe(method: Literal['auto', 'numerical', 'categorical', 'cat_stats'] = 'auto', max_cardinality: int = 6, numcol: str | None = None) → TableSample

Aggregates the vDataColumn using multiple statistical aggregations, such as minimum (min), maximum (max), median, and cardinality (unique). The aggregations applied depend on the data type of the vDataColumn: numerical columns are aggregated with numerical aggregations (min, median, max…), while categorical columns are aggregated with categorical ones (cardinality, mode…).

Parameters

method: str, optional

The describe method. One of the following:

auto:
Sets the method to numerical if the vDataColumn is numerical, categorical otherwise.

categorical:
Uses only categorical aggregations during the computation.

cat_stats:
Computes statistics of a numerical column for each vDataColumn category. In this case, the parameter numcol must be defined.

numerical:
Uses popular numerical aggregations during the computation.

max_cardinality: int, optional

Cardinality threshold used to determine whether the vDataColumn is treated as categorical.

numcol: str, optional

Numerical vDataColumn to use when the parameter method is set to cat_stats.

Returns

TableSample: the result of the aggregations.

Examples

For this example, we will use the following dataset:

import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
        "c": ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    }
)

The describe method supports several kinds of aggregations, selected with the method parameter.

Setting method to numerical computes numerical aggregations.

data["x"].describe(method = "numerical")

	value
name	"x"
dtype	integer
unique	8.0
count	8
mean	10.375
std	8.01672359152148
min	1.0
approx_25%	3.5
approx_50%	9.5
approx_75%	16.25
max	22.0
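To make the rows of this output easier to interpret, the following is a minimal local sketch that recomputes the same kinds of statistics with Python's standard library. It is not how VerticaPy computes them (the real aggregations run in the database), and the dictionary keys are illustrative names only. It assumes a sample standard deviation and linearly interpolated quartiles, which agree with the values shown above for this small dataset; the approx_ prefix in the output suggests the database may use approximate percentiles on larger data.

```python
import statistics

# Same values as column "x" in the example dataset.
x = [1, 2, 4, 9, 10, 15, 20, 22]

# Quartiles with linear interpolation between neighbouring ranks.
q25, q50, q75 = statistics.quantiles(x, n=4, method="inclusive")

summary = {
    "unique": len(set(x)),
    "count": len(x),
    "mean": statistics.mean(x),
    "std": statistics.stdev(x),  # sample standard deviation (divides by n - 1)
    "min": min(x),
    "25%": q25,
    "50%": q50,
    "75%": q75,
    "max": max(x),
}

for name, value in summary.items():
    print(f"{name}\t{value}")
```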

Setting method to categorical computes categorical aggregations.

data["x"].describe(method = "categorical")

	value
name	"x"
dtype	integer
unique	8.0
count	8.0
9	1
10	1
1	1
4	1
2	1
15	1
Others	1
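Every value of "x" is distinct, so the frequencies above are all 1. As a rough illustration of what a frequency summary looks like on a column with repeated values, here is a plain-Python sketch over the values of column "c". It is a local approximation only: it does not call VerticaPy, and it does not model the max_cardinality threshold or the Others row.

```python
from collections import Counter

# Same values as column "c" in the example dataset.
c = ["A", "A", "A", "A", "B", "B", "C", "D"]

counts = Counter(c)

print("unique", len(counts))  # number of distinct values
print("count", len(c))        # number of elements
for value, frequency in counts.most_common():
    # Values are listed from most to least frequent.
    print(value, frequency)
```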

Setting method to cat_stats groups the data by the categorical vDataColumn and computes various aggregations on the numerical column given by numcol.

data["c"].describe(
    method = "cat_stats",
    numcol = "x"
)

	...	approx_90%	max
A	...	7.5	9
B	...	14.5	15
C	...	20.0	20
D	...	22.0	22
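The two visible columns of this output can be checked by hand. The sketch below groups the values of "x" by "c" in plain Python and computes the maximum and a linearly interpolated 90th percentile for each category. The percentile helper is a hypothetical function written for this illustration, not part of VerticaPy, and the interpolation rule is an assumption that agrees with the values shown above for this dataset.

```python
from collections import defaultdict

# Same values as columns "x" and "c" in the example dataset.
x = [1, 2, 4, 9, 10, 15, 20, 22]
c = ["A", "A", "A", "A", "B", "B", "C", "D"]


def percentile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


# Collect the numerical values belonging to each category.
groups = defaultdict(list)
for category, value in zip(c, x):
    groups[category].append(value)

stats = {
    category: {"90%": percentile(values, 0.9), "max": max(values)}
    for category, values in groups.items()
}

for category, row in stats.items():
    print(category, round(row["90%"], 2), row["max"])
```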

Note

All the calculations are pushed to the database.