verticapy.vDataFrame.rolling#

Adds a new vDataColumn to the vDataFrame by using an advanced analytical window function on one or two specific vDataColumn.

Warning

Some window functions can make the vDataFrame structure heavier. It is recommended to always check the current structure with the current_relation method and to save it with the to_db method, using the parameters inplace = True and relation_type = table.

Warning

Make use of the order_by parameter to sort your data. Otherwise, you might encounter unexpected results, as Vertica does not work with indexes, and the data may be randomly shuffled.

Parameters#

func: str

Function to use.

aad:
average absolute deviation
beta:
Beta Coefficient between 2 vDataColumns
count:
number of non-missing elements
corr:
Pearson correlation between 2 vDataColumns
cov:
covariance between 2 vDataColumns
kurtosis:
kurtosis
jb:
Jarque-Bera index
max:
maximum
mean:
average
min:
minimum
prod:
product
range:
difference between the max and the min
sem:
standard error of the mean
skewness:
skewness
sum:
sum
std:
standard deviation
var:
variance

Other window functions could work if it is part of the DB version you are using.

window: list | tuple

Window Frame Range. If set to two integers, computes a Row Window, otherwise it computes a Time Window. For example, if set to (-5, 1), the moving windows will take 5 rows preceding and one following. If set to ('- 5 minutes', '0 minutes'), the moving window will take all elements of the last 5 minutes.

columns: SQLColumns

Input vDataColumn. Must be a list of one or two elements.

by: SQLColumns, optional

vDataColumns used in the partition.

order_by: dict | list, optional

List of the vDataColumns used to sort the data using ascending/descending order or a dictionary of all the sorting methods. For example, to sort by “column1” ASC and “column2” DESC, use: {"column1": "asc", "column2": "desc"}.

name: str, optional

Name of the new vDataColumn. If empty, a default name is generated.

Returns#

vDataFrame: self

Examples#

Let’s begin by importing VerticaPy.

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, let’s generate the following dataset:

vdf = vp.vDataFrame(
    {
        "date": [
            "2014-01-01",
            "2014-01-02",
            "2014-01-03",
            "2014-01-04",
            "2014-01-05",
            "2014-01-06",
            "2014-01-07",
        ],
        "expenses": [40, 10, 12, 54, 98, 132, 50],
        "sale": [100, 120, 120, 110, 100, 90, 80],
    }
)

	Abc date Varchar(10) 100%	...	123 expenses Integer 100%	123 sale Integer 100%
1	2014-01-01	...	40	100
2	2014-01-02	...	10	120
3	2014-01-03	...	12	120
4	2014-01-04	...	54	110
5	2014-01-05	...	98	100
6	2014-01-06	...	132	90
7	2014-01-07	...	50	80

Let us make sure the correct data type is assigned:

vdf["date"].astype("datetime")

We can now employ the rolling function, specifying a custom window size, to visualize the data.

vdf.rolling(
    func = "sum",
    window = (-1,1),
    columns = ["sale"],
)

	📅 date Datetime 100%	...	123 expenses Integer 100%	123 Integer 100%
1	2014-01-07 00:00:00	...	50
2	2014-01-06 00:00:00	...	132
3	2014-01-05 00:00:00	...	98
4	2014-01-01 00:00:00	...	40
5	2014-01-04 00:00:00	...	54
6	2014-01-03 00:00:00	...	12
7	2014-01-02 00:00:00	...	10

Note

Rolling windows are valuable in time-series data for creating features because they allow us to analyze a specified number of past data points at each step. This approach is useful for capturing trends over time, adapting to different time scales, and smoothing out noise in the data. By applying aggregation functions within these windows, such as calculating averages or sums, we can generate new features that provide insights into the historical patterns of the dataset. These features, based on past observations, contribute to building more informed and predictive models, enhancing our understanding of the underlying trends in the data.