verticapy.machine_learning.vertica.tsa.ARIMA.predict

ARIMA.predict(vdf: str | vDataFrame | None = None, ts: str | None = None, y: str | None = None, start: int | None = None, npredictions: int = 10, output_standard_errors: bool = False, output_index: bool = False, output_estimated_ts: bool = False, freq: Literal[None, 'm', 'months', 'y', 'year', 'infer'] = 'infer', filter_step: int | None = None, method: Literal['auto', 'forecast'] = 'auto') → vDataFrame

Forecasts the response column using the input relation and returns the predictions as a vDataFrame.

Parameters

vdf: SQLRelation

Object used to run the prediction. You can also specify a customized relation, but you must enclose it in parentheses and give it an alias. For example, (SELECT 1) x is valid, whereas (SELECT 1) and SELECT 1 are invalid.
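The alias rule can be illustrated with a rough check. `is_aliased_subquery` is a hypothetical helper, not part of VerticaPy: it is a simple regular-expression heuristic that mirrors the three examples above and does not parse SQL.

```python
import re

# Hypothetical helper (not part of VerticaPy): a rough regex heuristic that
# accepts "(<query>) alias" or "(<query>) AS alias" and rejects anything else.
_ALIASED = re.compile(r"\(.*\)\s+(?:AS\s+)?\w+", re.DOTALL | re.IGNORECASE)


def is_aliased_subquery(relation: str) -> bool:
    """Return True if the relation looks like a parenthesized query with an alias."""
    return _ALIASED.fullmatch(relation.strip()) is not None


for rel in ["(SELECT 1) x", "(SELECT 1)", "SELECT 1"]:
    print(f"{rel!r}: {is_aliased_subquery(rel)}")
```

A plain table name (for example a schema-qualified table) is not a customized relation and is outside what this heuristic checks.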

ts: str

TS (Time Series) vDataColumn used to order the data. The vDataColumn type must be date (date, datetime, timestamp…) or numerical.

y: str, optional

Response column.

start: int, optional

The behavior of the start parameter and its range of accepted values depend on whether you provide a timeseries-column (ts):

No provided timeseries-column:
start must be an integer greater than or equal to 0, where 0 starts prediction at the end of the in-sample data. If start is positive, the function predicts the values between the end of the in-sample data and the start index, and then uses those predicted values as time series inputs for the subsequent npredictions.
timeseries-column provided:
start must be an integer greater than or equal to 1 that identifies the index (row) of the timeseries-column at which to begin prediction. If start is greater than the number of rows, N, in the input data, the function predicts the values between N and start, and then uses those predicted values as time series inputs for the subsequent npredictions.

Default:

No provided timeseries-column:
prediction begins from the end of the in-sample data.
timeseries-column provided:
prediction begins from the end of the provided input data.
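The following sketch shows one reading of the start rules when a timeseries-column is provided. The helper `prediction_rows` is hypothetical and rests on assumptions: 1-based row positions, a default start at the row right after the last input row, and rows strictly between N and start being predicted internally and fed back as inputs. VerticaPy's exact indexing may differ.

```python
from typing import List, Optional, Tuple


def prediction_rows(
    n_rows: int, start: Optional[int] = None, npredictions: int = 10
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: 1-based (bridged, returned) row positions when ts is provided."""
    if start is None:
        start = n_rows + 1  # assumption: default begins right after the input data
    if start < 1:
        raise ValueError("start must be >= 1 when a timeseries-column is provided")
    if npredictions < 1:
        raise ValueError("npredictions must be >= 1")
    # Rows between N and start: predicted internally and reused as inputs.
    bridged = list(range(n_rows + 1, start))
    # Rows actually returned to the user.
    returned = list(range(start, start + npredictions))
    return bridged, returned


print(prediction_rows(100, start=105, npredictions=3))
# ([101, 102, 103, 104], [105, 106, 107])
```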

npredictions: int, optional

Integer greater than or equal to 1, the number of timesteps to predict.

output_standard_errors: bool, optional

Boolean, whether to return estimates of the standard error of each prediction.
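One common use of the standard errors returned with output_standard_errors=True is an approximate prediction interval. The sketch below assumes normally distributed forecast errors, which VerticaPy does not guarantee. `prediction_interval` is a hypothetical helper, and the input numbers are illustrative only.

```python
from statistics import NormalDist


def prediction_interval(prediction: float, std_err: float, level: float = 0.95):
    """Approximate interval, assuming normally distributed forecast errors."""
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return prediction - z * std_err, prediction + z * std_err


low, high = prediction_interval(100.0, 10.0)
print(f"[{low:.1f}, {high:.1f}]")  # [80.4, 119.6]
```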

output_index: bool, optional

Boolean, whether to return the index of each position.

output_estimated_ts: bool, optional

Boolean, whether to return the estimated abscissa (ts value) of each prediction. The exact value is difficult to obtain because of interval computations.

freq: str, optional

How to compute the time delta between consecutive ts values.

m/months:
The data is assumed to be organized on a monthly basis.
y/year:
The data is assumed to be organized on a yearly basis.
infer:
The best option is inferred automatically, which may require more computational resources.
None:
The delta is the average difference between ts and its lag.
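The difference between freq=None and freq="m" can be sketched in plain Python. This illustrates the two ideas described above (an average delta versus calendar-month steps); it is not VerticaPy's implementation, and `mean_delta` and `add_months` are hypothetical helpers.

```python
import calendar
from datetime import datetime, timedelta
from typing import List


def mean_delta(ts: List[datetime]) -> timedelta:
    """freq=None idea: average difference between ts and its lag."""
    diffs = [b - a for a, b in zip(ts, ts[1:])]
    return sum(diffs, timedelta(0)) / len(diffs)


def add_months(d: datetime, k: int) -> datetime:
    """freq="m" idea: step by k calendar months (clamping the day if needed)."""
    m = d.month - 1 + k
    year, month = d.year + m // 12, m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)


ts = [datetime(1949, 1, 1), datetime(1949, 2, 1), datetime(1949, 3, 1)]
delta = mean_delta(ts)        # 29 days 12 hours
print(ts[-1] + delta)         # 1949-03-30 12:00:00
print(add_months(ts[-1], 1))  # 1949-04-01 00:00:00
```

The average delta drifts away from the first day of each month, which shows why a calendar-aware option can be preferable for monthly data. A yearly step corresponds to add_months(d, 12) in this sketch.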

filter_step: int, optional

Integer that determines the frequency of the returned predictions. For example, set it to 3 to return a prediction every third step.

Note

It is only used when output_estimated_ts=True.
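A minimal sketch of the every-k-th-step idea, assuming that filtering keeps the first prediction and then every filter_step-th one after it. The exact offset VerticaPy uses is not stated here, and `keep_every` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_every(predictions: Sequence[T], filter_step: int) -> List[T]:
    """Keep positions 0, filter_step, 2 * filter_step, ... (assumed offset)."""
    if filter_step < 1:
        raise ValueError("filter_step must be >= 1")
    return list(predictions[::filter_step])


print(keep_every([10, 11, 12, 13, 14, 15, 16], 3))  # [10, 13, 16]
```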

method: str, optional

Forecasting method. One of the following:

auto:
The model uses the true values at each step for as long as they are available. Once it can no longer rely on true values, it switches to its own predictions for further forecasting. This method is often referred to as “one-step-ahead” forecasting.
forecast:
The model starts forecasting from an initial value and disregards all subsequent true values: after the start point, it forecasts solely from its own predictions, not from actual observations.
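The two methods can be contrasted on a toy AR(1) model, y_t = c + phi * y_{t-1}, with hand-picked coefficients. This is a simplified illustration of one-step-ahead versus recursive forecasting, not the ARIMA model VerticaPy fits. The function names are hypothetical, and start is a 0-based position here.

```python
from typing import List


def one_step_ahead(y: List[float], c: float, phi: float,
                   start: int, n: int) -> List[float]:
    """'auto'-style: use true values while available, then switch to predictions."""
    preds, prev = [], y[start - 1]
    for t in range(start, start + n):
        pred = c + phi * prev
        preds.append(pred)
        # Feed back the observation if one exists at t, otherwise the prediction.
        prev = y[t] if t < len(y) else pred
    return preds


def recursive_forecast(y: List[float], c: float, phi: float,
                       start: int, n: int) -> List[float]:
    """'forecast'-style: start from y[start - 1] and ignore later observations."""
    preds, prev = [], y[start - 1]
    for _ in range(n):
        pred = c + phi * prev
        preds.append(pred)
        prev = pred
    return preds


y = [1.0, 2.0, 10.0, 11.0]  # a level shift at position 2
print(one_step_ahead(y, 1.0, 1.0, start=2, n=4))      # [3.0, 11.0, 12.0, 13.0]
print(recursive_forecast(y, 1.0, 1.0, start=2, n=4))  # [3.0, 4.0, 5.0, 6.0]
```

The one-step-ahead variant picks up the level shift because it reads the observed values, while the recursive variant never sees them.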

Returns

vDataFrame: a new object containing the predictions.

Examples

We import verticapy:

import verticapy as vp

For this example, we will use the airline passengers dataset.

import verticapy.datasets as vpd

data = vpd.load_airline_passengers()

	date (Date)	passengers (Integer)
1	1949-01-01	112
2	1949-02-01	118
3	1949-03-01	132
4	1949-04-01	129
5	1949-05-01	121
6	1949-06-01	135
7	1949-07-01	148
8	1949-08-01	148
9	1949-09-01	136
10	1949-10-01	119
11	1949-11-01	104
12	1949-12-01	118
13	1950-01-01	115
14	1950-02-01	126
15	1950-03-01	141
16	1950-04-01	135
17	1950-05-01	125
18	1950-06-01	149
19	1950-07-01	170
20	1950-08-01	170
21	1950-09-01	158
22	1950-10-01	133
23	1950-11-01	114
24	1950-12-01	140
25	1951-01-01	145
26	1951-02-01	150
27	1951-03-01	178
28	1951-04-01	163
29	1951-05-01	172
30	1951-06-01	178
31	1951-07-01	199
32	1951-08-01	199
33	1951-09-01	184
34	1951-10-01	162
35	1951-11-01	146
36	1951-12-01	166
37	1952-01-01	171
38	1952-02-01	180
39	1952-03-01	193
40	1952-04-01	181
41	1952-05-01	183
42	1952-06-01	218
43	1952-07-01	230
44	1952-08-01	242
45	1952-09-01	209
46	1952-10-01	191
47	1952-11-01	172
48	1952-12-01	194
49	1953-01-01	196
50	1953-02-01	196
51	1953-03-01	236
52	1953-04-01	235
53	1953-05-01	229
54	1953-06-01	243
55	1953-07-01	264
56	1953-08-01	272
57	1953-09-01	237
58	1953-10-01	211
59	1953-11-01	180
60	1953-12-01	201
61	1954-01-01	204
62	1954-02-01	188
63	1954-03-01	235
64	1954-04-01	227
65	1954-05-01	234
66	1954-06-01	264
67	1954-07-01	302
68	1954-08-01	293
69	1954-09-01	259
70	1954-10-01	229
71	1954-11-01	203
72	1954-12-01	229
73	1955-01-01	242
74	1955-02-01	233
75	1955-03-01	267
76	1955-04-01	269
77	1955-05-01	270
78	1955-06-01	315
79	1955-07-01	364
80	1955-08-01	347
81	1955-09-01	312
82	1955-10-01	274
83	1955-11-01	237
84	1955-12-01	278
85	1956-01-01	284
86	1956-02-01	277
87	1956-03-01	317
88	1956-04-01	313
89	1956-05-01	318
90	1956-06-01	374
91	1956-07-01	413
92	1956-08-01	405
93	1956-09-01	355
94	1956-10-01	306
95	1956-11-01	271
96	1956-12-01	306
97	1957-01-01	315
98	1957-02-01	301
99	1957-03-01	356
100	1957-04-01	348

Rows: 1-100 | Columns: 2

First we import the model:

from verticapy.machine_learning.vertica.tsa import ARIMA

Then we can create the model:

model = ARIMA(order=(12, 1, 2))
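Following the standard ARIMA convention, order = (p, d, q): here p = 12 autoregressive lags, d = 1 difference, and q = 2 moving-average terms. The sketch below shows what d = 1 means for the data: the model works on first differences, and forecasts of the differences are integrated back into levels by a running sum that starts from the last observed value. The helpers are hypothetical, and the forecasted differences are made-up numbers.

```python
from itertools import accumulate
from typing import List


def difference(y: List[float]) -> List[float]:
    """First differences y[t] - y[t-1] (the d = 1 step)."""
    return [b - a for a, b in zip(y, y[1:])]


def undifference(last_value: float, diff_forecasts: List[float]) -> List[float]:
    """Turn forecasted differences back into levels via a running sum."""
    return list(accumulate(diff_forecasts, initial=last_value))[1:]


passengers = [112, 118, 132, 129]  # first four rows of the dataset
print(difference(passengers))      # [6, 14, -3]
print(undifference(129, [5, -3]))  # [134, 131] (illustrative differences)
```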

We can now fit the model:

model.fit(data, "date", "passengers")

Prediction is straightforward:

model.predict()

	prediction (Float(22))
1	436.808245506626
2	411.303769750774
3	456.591517112856
4	497.165582992911
5	523.414142302269
6	579.634194756896
7	670.753858449996
8	648.086244158784
9	558.685139438718
10	498.606577143251

Rows: 1-10 | Column: prediction | Type: Float(22)

Important

This example uses a specific model, which may not correspond exactly to the model you are working with. For a complete example specific to your class of interest, refer to the documentation of that class.

Examples: ARIMA, ARMA, AR, MA.