verticapy.vDataFrame.rolling#
- vDataFrame.rolling(func: str, window: list | tuple, columns: str | list[str], by: str | list[str] | None = None, order_by: None | dict | list = None, name: str | None = None) vDataFrame #
Adds a new
vDataColumn
to thevDataFrame
by using an advanced analytical window function on one or two specificvDataColumn
.Warning
Some window functions can make the vDataFrame structure heavier. It is recommended to always check the current structure with the
current_relation
method and to save it with theto_db
method, using the parametersinplace = True
andrelation_type = table
.Warning
Make use of the
order_by
parameter to sort your data. Otherwise, you might encounter unexpected results, as Vertica does not work with indexes, and the data may be randomly shuffled.Parameters#
- func: str
Function to use.
- aad:
average absolute deviation
- beta:
Beta Coefficient between 2 vDataColumns
- count:
number of non-missing elements
- corr:
Pearson correlation between 2 vDataColumns
- cov:
covariance between 2 vDataColumns
- kurtosis:
kurtosis
- jb:
Jarque-Bera index
- max:
maximum
- mean:
average
- min:
minimum
- prod:
product
- range:
difference between the max and the min
- sem:
standard error of the mean
- skewness:
skewness
- sum:
sum
- std:
standard deviation
- var:
variance
Other window functions could work if it is part of the DB version you are using.
- window: list | tuple
Window Frame Range. If set to two integers, computes a Row Window, otherwise it computes a Time Window. For example, if set to
(-5, 1)
, the moving windows will take 5 rows preceding and one following. If set to('- 5 minutes', '0 minutes')
, the moving window will take all elements of the last 5 minutes.- columns: SQLColumns
Input
vDataColumn
. Must be a list of one or two elements.- by: SQLColumns, optional
vDataColumns used in the partition.
- order_by: dict | list, optional
List of the vDataColumns used to sort the data using ascending/descending order or a dictionary of all the sorting methods. For example, to sort by “column1” ASC and “column2” DESC, use:
{"column1": "asc", "column2": "desc"}
.- name: str, optional
Name of the new
vDataColumn
. If empty, a default name is generated.
Returns#
- vDataFrame
self
Examples#
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to
verticapy
, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions fromverticapy
are used as intended without interfering with functions from other libraries.For this example, let’s generate the following dataset:
vdf = vp.vDataFrame( { "date": [ "2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04", "2014-01-05", "2014-01-06", "2014-01-07", ], "expenses": [40, 10, 12, 54, 98, 132, 50], "sale": [100, 120, 120, 110, 100, 90, 80], } )
AbcdateVarchar(10)100%... 123expensesInteger100%123saleInteger100%1 2014-01-01 ... 40 100 2 2014-01-02 ... 10 120 3 2014-01-03 ... 12 120 4 2014-01-04 ... 54 110 5 2014-01-05 ... 98 100 6 2014-01-06 ... 132 90 7 2014-01-07 ... 50 80 Let us make sure the correct data type is assigned:
vdf["date"].astype("datetime")
We can now employ the
rolling
function, specifying a custom window size, to visualize the data.vdf.rolling( func = "sum", window = (-1,1), columns = ["sale"], )
📅dateDatetime100%... 123expensesInteger100%123Integer100%1 2014-01-07 00:00:00 ... 50 2 2014-01-06 00:00:00 ... 132 3 2014-01-05 00:00:00 ... 98 4 2014-01-01 00:00:00 ... 40 5 2014-01-04 00:00:00 ... 54 6 2014-01-03 00:00:00 ... 12 7 2014-01-02 00:00:00 ... 10 Note
Rolling windows are valuable in time-series data for creating features because they allow us to analyze a specified number of past data points at each step. This approach is useful for capturing trends over time, adapting to different time scales, and smoothing out noise in the data. By applying aggregation functions within these windows, such as calculating averages or sums, we can generate new features that provide insights into the historical patterns of the dataset. These features, based on past observations, contribute to building more informed and predictive models, enhancing our understanding of the underlying trends in the data.
See also
vDataFrame.
analytic()
: Advanced Analytical functions.