verticapy.vDataColumn.describe#
- vDataColumn.describe(method: Literal['auto', 'numerical', 'categorical', 'cat_stats'] = 'auto', max_cardinality: int = 6, numcol: str | None = None) → TableSample
Aggregates the vDataColumn using multiple statistical aggregations, such as minimum (min), maximum (max), median, and cardinality (unique). The aggregations applied depend on the data type of the vDataColumn: numerical columns are summarized with numerical aggregations (min, median, max…), while categorical columns are summarized with categorical ones (cardinality, mode…).
Parameters#
- method: str, optional
The describe method.
- auto:
Sets the method to numerical if the vDataColumn is numerical, categorical otherwise.
- categorical:
Uses only categorical aggregations during the computation.
- cat_stats:
Computes statistics of a numerical column for each vDataColumn category. In this case, the parameter numcol must be defined.
- numerical:
Uses popular numerical aggregations during the computation.
- max_cardinality: int, optional
Cardinality threshold used to determine whether the vDataColumn is considered categorical.
- numcol: str, optional
Numerical vDataColumn to use when the parameter method is set to cat_stats.
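The rules above for picking a method can be sketched in plain Python. This is only an illustration of the documented behavior, not VerticaPy's implementation: the helper name resolve_describe_method and its is_numerical argument are hypothetical, and the real function inspects the column type in the database.

```python
from typing import Optional

VALID_METHODS = {"auto", "numerical", "categorical", "cat_stats"}


def resolve_describe_method(
    method: str, is_numerical: bool, numcol: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented rules for `method`."""
    if method not in VALID_METHODS:
        raise ValueError(f"Unknown method: {method!r}")
    if method == "auto":
        # 'auto' falls back to the column type.
        return "numerical" if is_numerical else "categorical"
    if method == "cat_stats" and numcol is None:
        # The documentation states that numcol must be defined here.
        raise ValueError("numcol must be defined when method='cat_stats'")
    return method
```

Under these assumptions, resolve_describe_method("auto", is_numerical=False) resolves to "categorical", and calling it with "cat_stats" but no numcol raises an error.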
Returns#
- TableSample
result.
Examples#
For this example, we will use the following dataset:
import verticapy as vp

data = vp.vDataFrame(
    {
        "x": [1, 2, 4, 9, 10, 15, 20, 22],
        "y": [1, 2, 1, 2, 1, 1, 2, 1],
        "z": [10, 12, 2, 1, 9, 8, 1, 3],
        "c": ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    }
)
The describe method provides a variety of statistical summaries.
Setting method to numerical computes numerical aggregations.

data["x"].describe(method = "numerical")
                         value
name                       "x"
dtype                  integer
unique                     8.0
count                        8
mean                    10.375
std           8.01672359152148
min                        1.0
approx_25%                 3.5
approx_50%                 9.5
approx_75%               16.25
max                       22.0

Setting method to categorical computes categorical aggregations.

data["x"].describe(method = "categorical")
            value
name          "x"
dtype     integer
unique        8.0
count         8.0
9               1
10              1
1               1
4               1
2               1
15              1
Others          1

Setting method to cat_stats groups by a categorical column and computes various aggregations on a numerical one.

data["c"].describe(
    method = "cat_stats",
    numcol = "x"
)
     ...   approx_90%   max
A    ...          7.5     9
B    ...         14.5    15
C    ...         20.0    20
D    ...         22.0    22

Note
All the calculations are pushed to the database.
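As a sanity check on the example outputs, the same figures can be reproduced locally without a database. The sketch below uses only the Python standard library and is not how VerticaPy computes them, since VerticaPy generates SQL. The helper names quantile, numerical_summary, and cat_stats are hypothetical, and linear interpolation between ranks is an assumption that happens to match the values shown for this small dataset; approximate percentiles computed in the database may differ on larger data.

```python
import math
import statistics
from collections import defaultdict

# Same values as columns "x" and "c" in the example dataset.
x = [1, 2, 4, 9, 10, 15, 20, 22]
c = ["A", "A", "A", "A", "B", "B", "C", "D"]


def quantile(values, q):
    """Quantile with linear interpolation between closest ranks (assumed)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def numerical_summary(values):
    """Local counterpart of the numerical aggregations listed above."""
    return {
        "unique": len(set(values)),
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "approx_25%": quantile(values, 0.25),
        "approx_50%": quantile(values, 0.50),
        "approx_75%": quantile(values, 0.75),
        "max": max(values),
    }


def cat_stats(categories, values):
    """Group the numerical values by category, then aggregate each group."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    return {
        category: {"approx_90%": quantile(vals, 0.90), "max": max(vals)}
        for category, vals in groups.items()
    }


summary = numerical_summary(x)
per_category = cat_stats(c, x)
```

Here summary corresponds to the numerical output for "x", and per_category to the last two columns of the cat_stats output for "c".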
See also
vDataColumn.aggregate(): Aggregations for a specific column.
vDataFrame.aggregate(): Aggregations for specific columns.
vDataFrame.describe(): Summarizes information within the columns.