vDataFrame.describe¶
In [ ]:
vDataFrame.describe(method: str = "auto",
                    columns: list = [],
                    unique: bool = True,
                    ncols_block: int = 20,
                    processes: int = 1)
Aggregates the vDataFrame using multiple statistical aggregations (min, max, median, unique, etc.) depending on the types of the vColumns.
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
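method | str | ✓ | The describe method. The default is "auto"; the examples below also use "all", "categorical", "length", "numerical", "range", and "statistics". |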
columns | list | ✓ | List of the vColumn names. If empty, the vColumns are selected depending on the 'method' parameter. |
unique | bool | ✓ | If set to True, the cardinality of each vColumn is computed. |
ncols_block | int | ✓ | The number of columns used per query. Setting this parameter divides what would otherwise be one large query into many smaller queries called "blocks," the size of which is determined by the ncols_block parameter. |
processes | int | ✓ | Number of child processes to create. Setting this with the ncols_block parameter lets you parallelize a single query into many smaller queries, where each child process creates its own connection to the database and sends one query. This can improve query performance, but consumes more resources. If processes is set to 1, the queries are sent iteratively from a single process. |
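The interaction between ncols_block and processes described above can be pictured with a small pure-Python sketch. This is not VerticaPy's implementation: `split_into_blocks`, `run_block`, and `describe_in_blocks` are hypothetical helpers, the column names are illustrative, and a thread pool stands in for the child processes (each with its own database connection) that the real method creates.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(columns, ncols_block):
    """Split column names into consecutive blocks of at most ncols_block names."""
    if ncols_block < 1:
        raise ValueError("ncols_block must be at least 1")
    return [columns[i:i + ncols_block] for i in range(0, len(columns), ncols_block)]


def run_block(block):
    """Hypothetical stand-in for sending one aggregation query for one block."""
    return {name: f"stats({name})" for name in block}


def describe_in_blocks(columns, ncols_block=20, processes=1):
    """Run one 'query' per block, either one after another or through a pool."""
    blocks = split_into_blocks(columns, ncols_block)
    if processes <= 1:
        # Single worker: the blocks are processed iteratively.
        partial_results = [run_block(block) for block in blocks]
    else:
        # Assumption: threads replace the child processes for simplicity.
        with ThreadPoolExecutor(max_workers=processes) as pool:
            partial_results = list(pool.map(run_block, blocks))
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


# Illustrative column names, not taken from the reference above.
columns = ["age", "fare", "pclass", "survived", "sibsp"]
blocks = split_into_blocks(columns, ncols_block=2)
result = describe_in_blocks(columns, ncols_block=2, processes=2)
```

With five columns and a block size of 2, the sketch produces three blocks, so at most three "queries" can run at once. Raising the number of workers beyond the number of blocks would bring no further benefit in this model.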
Returns¶
tablesample : An object containing the result. For more information, see utilities.tablesample.
Example¶
In [1]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)
In [2]:
# All statistics
titanic.describe(method = "all")
Out[2]:
In [3]:
# Categorical
titanic.describe(method = "categorical")
Out[3]:
In [4]:
# Length
titanic.describe(method = "length")
Out[4]:
In [5]:
# Numerical
titanic.describe(method = "numerical")
Out[5]:
In [6]:
# Range
titanic.describe(method = "range")
Out[6]:
In [7]:
# Statistics
titanic.describe(method = "statistics")
Out[7]: