vDataFrame.agg / aggregate
In [ ]:
vDataFrame.aggregate(func: list,
                     columns: list = [],
                     ncols_block: int = 20,
                     processes: int = 1)
Aggregates the vDataFrame using the input functions.
Parameters
Name | Type | Optional | Description |
---|---|---|---|
func | list | ❌ | List of the aggregations to compute, for example "min", "max", "unique", "50%", or "approx_50%". |
columns | list | ✓ | List of vcolumn names. If empty, either all vcolumns or only the numerical vcolumns are used, depending on the aggregations. |
ncols_block | int | ✓ | Number of columns used per query. Setting this parameter divides what would otherwise be one large query into many smaller queries called "blocks." The size of each block is determined by the ncols_block parameter. |
processes | int | ✓ | Number of child processes to create. Setting this with the ncols_block parameter lets you parallelize a single query into many smaller queries, where each child process creates its own connection to the database and sends one query. This can improve query performance, but consumes more resources. If processes is set to 1, the queries are sent iteratively from a single process. |
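The blocking behavior described for ncols_block can be pictured with a small, database-free sketch. The helper name split_into_blocks is hypothetical and is not part of VerticaPy; it only illustrates the assumption that the column list is cut into consecutive groups of at most ncols_block names, with one query sent per group.

```python
from typing import List


def split_into_blocks(columns: List[str], ncols_block: int) -> List[List[str]]:
    """Hypothetical helper (not a VerticaPy function): cut the column list
    into consecutive blocks holding at most ncols_block names each."""
    if ncols_block < 1:
        raise ValueError("ncols_block must be a positive integer")
    return [
        columns[i:i + ncols_block]
        for i in range(0, len(columns), ncols_block)
    ]


columns = ["age", "fare", "pclass", "survived"]

# With ncols_block = 2, the four columns fall into two blocks,
# so two smaller queries would be sent instead of one large query.
blocks = split_into_blocks(columns, ncols_block=2)
print(blocks)  # [['age', 'fare'], ['pclass', 'survived']]
```

Under this assumption, processes = 1 would send the two queries one after the other, while processes = 2 would let each block be handled by its own child process and connection.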
Returns
tablesample : An object containing the result. For more information, see utilities.tablesample.
Example
In [1]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)
In [2]:
# Turning on the SQL to see the queries
from verticapy import set_option
set_option("cache", False)
set_option("sql_on", True)
In [8]:
# Sending one query
titanic.aggregate(func = ["min", "approx_10%", "approx_50%", "approx_90%", "max"],
                  columns = ["age", "fare", "pclass", "survived"])
In [3]:
# VerticaPy can deal with incompatible aggregations:
# it sends multiple queries, each containing only compatible aggregations
titanic.aggregate(func = ["unique", "min", "10%", "50%", "approx_90%", "max"],
                  columns = ["age", "fare", "pclass", "survived"])
In [11]:
# Sending multiple queries iteratively (2 columns at a time)
titanic.aggregate(func = ["min", "approx_10%", "approx_50%", "approx_90%", "max"],
                  columns = ["age", "fare", "pclass", "survived"],
                  ncols_block = 2)
In [4]:
# Sending multiple queries in parallel (2 columns at a time, 2 child processes)
# Note: the parameter is named "processes", as in the signature above
titanic.aggregate(func = ["min", "approx_10%", "approx_50%", "approx_90%", "max"],
                  columns = ["age", "fare", "pclass", "survived"],
                  ncols_block = 2,
                  processes = 2)
See Also
vDataFrame.analytic | Adds a new vcolumn to the vDataFrame by using an advanced analytical function on a specific vcolumn. |
vDataFrame.score | Computes the score using the input columns and the input method. |