vDataFrame.analytic¶
In [ ]:
vDataFrame.analytic(func: str,
                    columns: (str, list) = [],
                    by: list = [],
                    order_by: (dict, list) = [],
                    name: str = "",
                    offset: int = 1,
                    x_smoothing: float = 0.5,
                    add_count: bool = True)
Adds a new vcolumn to the vDataFrame by using an advanced analytical function on one or two specific vcolumns.
⚠ Warning: Some analytical functions can significantly increase vDataFrame memory usage. Always check the vDataFrame's current relation with the 'current_relation' method and, if needed, save it with the 'to_db' method using the parameters 'inplace = True' and 'relation_type = "table"'.
Parameters¶
| Name | Type | Optional | Description |
|---|---|---|---|
| func | str | ❌ | Function to use. |
| columns | str / list | ✓ | Input vcolumns. It can be a single vcolumn name or a list of one or two elements. |
| by | list | ✓ | vcolumns used in the partition. |
| order_by | dict / list | ✓ | Either a list of the vcolumns used to sort the data in ascending order, or a dictionary giving the sorting method of each vcolumn. For example, to sort by "column1" ASC and "column2" DESC, write {"column1": "asc", "column2": "desc"}. |
| name | str | ✓ | Name of the new vcolumn. If empty, a default name based on the other parameters is generated. |
| offset | int | ✓ | Lead/lag offset, used when 'func' is 'lead' or 'lag'. |
| x_smoothing | float | ✓ | Smoothing parameter, used when 'func' is 'ema'. It must be in [0, 1]. |
| add_count | bool | ✓ | If 'func' is 'mode' and this parameter is True, another column containing the number of occurrences of the mode is added to the vDataFrame. |
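To give a feel for what 'x_smoothing' controls, here is a minimal pure-Python sketch of an exponential moving average. It assumes the common recurrence in which the first row seeds the average and each later row moves the previous average toward the new value by a fraction 'x_smoothing'. The helper name `ema` and the sample delays are made up for illustration, and the exact formula applied by the database may differ.

```python
def ema(values, x_smoothing=0.5):
    """Exponential moving average of a sequence (illustrative helper, not part of VerticaPy)."""
    if not 0.0 <= x_smoothing <= 1.0:
        raise ValueError("x_smoothing must be in [0, 1]")
    result = []
    for value in values:
        if not result:
            # Assumption: the first row seeds the average with its own value.
            result.append(float(value))
        else:
            previous = result[-1]
            # Move the previous average toward the new value by x_smoothing.
            result.append(previous + x_smoothing * (value - previous))
    return result


delays = [10, 20, 0, 40]  # made-up sample values
smoothed = ema(delays, x_smoothing=0.5)
print(smoothed)
```

Under this recurrence, a value of 0 keeps the first value forever and a value of 1 reproduces the input unchanged, so values in between trade smoothness against responsiveness.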
Returns¶
vDataFrame : self
Example¶
In [1]:
from verticapy import vDataFrame
flights = vDataFrame("public.usa_flights")
flights.eval(name = "week", expr = "WEEK(scheduled_departure)")
display(flights)
In [2]:
# LAG of departure_delay for the same flight
# (same airline and same origin / destination airports)
flights.analytic(func = "lag",
                 columns = "departure_delay",
                 by = ["origin_airport", "destination_airport", "airline"],
                 order_by = {"scheduled_departure": "asc"})
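The partitioned lag above can be pictured with a small pure-Python sketch: rows are grouped by the 'by' columns, sorted within each group, and each row receives the value found 'offset' rows earlier in its group. The helper `lag_by_partition`, the output key name, and the sample rows are hypothetical and only illustrate the semantics; the real computation happens in the database.

```python
from collections import defaultdict


def lag_by_partition(rows, column, by, order_by, offset=1):
    """Attach the value found `offset` rows earlier in the same partition (illustrative helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[key] for key in by)].append(row)

    out = []
    for group in partitions.values():
        ordered = sorted(group, key=lambda r: r[order_by])  # ascending order
        for i, row in enumerate(ordered):
            # No earlier row in the partition: the lag is missing (None).
            previous = ordered[i - offset][column] if i >= offset else None
            out.append({**row, f"lag_{column}": previous})
    return out


# Made-up sample rows; scheduled_departure is simplified to an integer.
rows = [
    {"airline": "AA", "origin_airport": "JFK", "destination_airport": "LAX",
     "scheduled_departure": 2, "departure_delay": 5},
    {"airline": "AA", "origin_airport": "JFK", "destination_airport": "LAX",
     "scheduled_departure": 1, "departure_delay": 12},
    {"airline": "DL", "origin_airport": "JFK", "destination_airport": "LAX",
     "scheduled_departure": 1, "departure_delay": 0},
]
result = lag_by_partition(rows, "departure_delay",
                          by=["origin_airport", "destination_airport", "airline"],
                          order_by="scheduled_departure")
print([r["lag_departure_delay"] for r in result])
```

The first row of each partition has no predecessor, which is why its lag is missing.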
In [3]:
# Airline with the most flights for each origin airport and week
flights.analytic(func = "mode",
                 columns = "airline",
                 by = ["origin_airport", "week"],
                 add_count = True)
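As a rough illustration of 'mode' combined with 'add_count', the sketch below finds the most frequent value and its number of occurrences within each partition. The helper `mode_by_partition` and the sample rows are hypothetical. Ties are resolved here by first occurrence, which is a property of `collections.Counter` and not necessarily what the database does.

```python
from collections import Counter, defaultdict


def mode_by_partition(rows, column, by):
    """Map each partition key to (most frequent value, its count) (illustrative helper)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[key] for key in by)][row[column]] += 1
    return {key: counter.most_common(1)[0] for key, counter in counts.items()}


# Made-up sample rows.
rows = [
    {"origin_airport": "JFK", "week": 1, "airline": "AA"},
    {"origin_airport": "JFK", "week": 1, "airline": "DL"},
    {"origin_airport": "JFK", "week": 1, "airline": "AA"},
    {"origin_airport": "LAX", "week": 1, "airline": "DL"},
]
modes = mode_by_partition(rows, "airline", by=["origin_airport", "week"])
print(modes)
```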
In [4]:
# Correlation between the arrival delay and departure delay for the
# same origin and destination airports
flights.analytic(func = "corr",
                 columns = ["departure_delay", "arrival_delay"],
                 by = ["origin_airport", "destination_airport"])
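When 'columns' contains two vcolumns, as with 'corr', the function combines both within each partition. The following pure-Python sketch computes a Pearson correlation per partition to show the idea. It assumes 'corr' denotes the Pearson coefficient; the helpers `pearson` and `corr_by_partition` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences, or None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def corr_by_partition(rows, x, y, by):
    """Map each partition key to the correlation between columns x and y (illustrative helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in by)].append((row[x], row[y]))
    return {key: pearson(*zip(*pairs)) for key, pairs in groups.items()}


# Made-up sample rows.
rows = [
    {"origin_airport": "JFK", "destination_airport": "LAX", "departure_delay": 0, "arrival_delay": 5},
    {"origin_airport": "JFK", "destination_airport": "LAX", "departure_delay": 10, "arrival_delay": 15},
    {"origin_airport": "JFK", "destination_airport": "LAX", "departure_delay": 20, "arrival_delay": 25},
    {"origin_airport": "JFK", "destination_airport": "SFO", "departure_delay": 0, "arrival_delay": 10},
    {"origin_airport": "JFK", "destination_airport": "SFO", "departure_delay": 10, "arrival_delay": 0},
]
correlations = corr_by_partition(rows, "departure_delay", "arrival_delay",
                                 by=["origin_airport", "destination_airport"])
print(correlations)
```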
See Also¶
| vDataFrame.eval | Evaluates a customized expression. |
| vDataFrame.rolling | Computes a customized moving window. |
