verticapy.vDataFrame.cumsum
- vDataFrame.cumsum(column: str, by: str | list[str] | None = None, order_by: None | dict | list = None, name: str | None = None) → vDataFrame
Adds a new vDataColumn to the vDataFrame by computing the cumulative sum of the input vDataColumn.
Warning
Use the order_by parameter to sort your data. Otherwise, you might encounter unexpected results: Vertica does not work with indexes, so the rows may be returned in an arbitrary order.
Parameters
- column: str
Input vDataColumn.
- by: str | list, optional
vDataColumns used in the partition.
- order_by: dict | list, optional
List of the vDataColumns used to sort the data, using ascending/descending order, or a dictionary of all the sorting methods. For example, to sort by “column1” ASC and “column2” DESC, use: {"column1": "asc", "column2": "desc"}.
- name: str, optional
Name of the new vDataColumn. If empty, a default name is generated.
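The way by and order_by interact can be sketched in plain Python, without a database connection. The helper cumsum_rows below is hypothetical and is not part of VerticaPy; it only emulates the behavior described above (partition on by, sort on order_by, then accumulate), and it assumes ascending order and a small made-up dataset with a "store" column.

```python
from collections import defaultdict
from itertools import accumulate
from operator import itemgetter


def cumsum_rows(rows, column, by=None, order_by=None, name="cumsum"):
    """Hypothetical helper: partitioned, ordered cumulative sum over a list of dicts."""
    by = [by] if isinstance(by, str) else list(by or [])
    order_by = [order_by] if isinstance(order_by, str) else list(order_by or [])

    # Group the rows by the values of the partition columns.
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in by)].append(row)

    result = []
    for part in partitions.values():
        # Sort inside each partition (ascending only in this sketch).
        # Without order_by, the rows keep whatever order they arrived in.
        if order_by:
            part = sorted(part, key=itemgetter(*order_by))
        totals = accumulate(r[column] for r in part)
        result.extend({**r, name: t} for r, t in zip(part, totals))
    return result


# Illustrative data (an assumption, not taken from the documentation).
rows = [
    {"store": "A", "day": 2, "sale": 120},
    {"store": "B", "day": 1, "sale": 50},
    {"store": "A", "day": 1, "sale": 100},
    {"store": "B", "day": 2, "sale": 90},
]

ordered = cumsum_rows(rows, "sale", by="store", order_by="day", name="cumsum_sales")
unordered = cumsum_rows(rows, "sale", by="store", name="cumsum_sales")

for row in ordered:
    print(row)
```

In the ordered result, store A accumulates 100 on day 1 and 220 on day 2. In the unordered result, the same store accumulates 120 on day 2 first, so day 1 ends up with 220. This is the kind of unexpected result the warning describes when no sort order is given.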
Returns
- vDataFrame
self
Examples
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
For this example, let’s generate the following dataset:
vdf = vp.vDataFrame(
    {
        "id": [0, 1, 2, 3, 4, 5, 6],
        "sale": [100, 120, 120, 50, 100, 90, 80],
    }
)
     id (Integer)   sale (Integer)
1    0              100
2    1              120
3    2              120
4    3              50
5    4              100
6    5              90
7    6              80
Now the cumulative sum of the selected column can be easily calculated:
vdf.cumsum(
    "sale",
    name="cumsum_sales",
    order_by="id",
)
     id (Integer)   sale (Integer)   cumsum_sales (Integer)
1    0              100              100
2    1              120              220
3    2              120              340
4    3              50               390
5    4              100              490
6    5              90               580
7    6              80               660
Note
Rolling windows are valuable in time-series data for creating features because they allow us to analyze a specified number of past data points at each step. This approach is useful for capturing trends over time, adapting to different time scales, and smoothing out noise in the data. By applying aggregation functions within these windows, such as calculating averages or sums, we can generate new features that provide insights into the historical patterns of the dataset. These features, based on past observations, contribute to building more informed and predictive models, enhancing our understanding of the underlying trends in the data.
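To make the difference between a cumulative sum and a rolling window concrete, here is a minimal plain-Python sketch using the "sale" values from the example above. The rolling_sum helper and the window size of 3 are illustrative assumptions; this is not how vDataFrame.rolling() is implemented or called.

```python
from itertools import accumulate

# The "sale" column from the example above, already ordered by "id".
sales = [100, 120, 120, 50, 100, 90, 80]

# Cumulative sum: the window grows from the first row up to the current row.
cumulative = list(accumulate(sales))


def rolling_sum(values, window):
    """Sum of the current value and up to `window - 1` preceding values."""
    return [
        sum(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Rolling sum: the window has a fixed size and slides along the data.
rolling = rolling_sum(sales, 3)

print(cumulative)  # [100, 220, 340, 390, 490, 580, 660]
print(rolling)     # [100, 220, 340, 290, 270, 240, 270]
```

The two results match for the first three rows, because the rolling window has not yet filled. From the fourth row on, the rolling sum drops the oldest values while the cumulative sum keeps growing.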
See also
vDataFrame.rolling(): Advanced analytical window function.