verticapy.machine_learning.vertica.tsa.ARIMA.predict
ARIMA.predict(vdf: str | vDataFrame | None = None, ts: str | None = None, y: str | None = None, start: int | None = None, npredictions: int = 10, output_standard_errors: bool = False, output_index: bool = False, output_estimated_ts: bool = False, freq: Literal[None, 'm', 'months', 'y', 'year', 'infer'] = 'infer', filter_step: int | None = None, method: Literal['auto', 'forecast'] = 'auto') -> vDataFrame
Predicts using the input relation.
Parameters
- vdf: SQLRelation
Object used to run the prediction. You can also specify a customized relation, but you must enclose it with an alias. For example, `(SELECT 1) x` is valid, whereas `(SELECT 1)` and `SELECT 1` are invalid.
- ts: str
TS (Time Series) `vDataColumn` used to order the data. The `vDataColumn` type must be a date type (date, datetime, timestamp, …) or numerical.
- y: str, optional
Response column.
- start: int, optional
The behavior of the `start` parameter and its range of accepted values depend on whether you provide a timeseries-column (`ts`):
- No provided timeseries-column: `start` must be an integer greater than or equal to 0, where zero indicates to start prediction at the end of the in-sample data. If `start` is a positive value, the function predicts the values between the end of the in-sample data and the start index, and then uses the predicted values as time series inputs for the subsequent `npredictions`.
- timeseries-column provided: `start` must be an integer greater than or equal to 1 and identifies the index (row) of the timeseries-column at which to begin prediction. If the `start` index is greater than the number of rows, `N`, in the input data, the function predicts the values between `N` and `start` and uses the predicted values as time series inputs for the subsequent `npredictions`.
Default:
- No provided timeseries-column: prediction begins from the end of the in-sample data.
- timeseries-column provided: prediction begins from the end of the provided input data.
- npredictions: int, optional
Integer greater than or equal to 1, the number of predicted timesteps.
- output_standard_errors: bool, optional
Boolean, whether to return estimates of the standard error of each prediction.
- output_index: bool, optional
Boolean, whether to return the index of each position.
- output_estimated_ts: bool, optional
Boolean, whether to return the estimated abscissa (time value) of each prediction. The exact value is hard to obtain because of interval computations.
- freq: str, optional
How to compute the delta, that is, the time step between consecutive points.
- m / months: We assume that the data is organized on a monthly basis.
- y / year: We assume that the data is organized on a yearly basis.
- infer: The system attempts to identify the best option, which may require more computational resources.
- None: The delta is inferred from the average of the difference between `ts` and its lag.
- filter_step: int, optional
Integer parameter that determines the frequency of predictions. You can adjust it according to your specific requirements, such as setting it to `3` for predictions every third step.
Note: It is only used when `output_estimated_ts=True`.
- method: str, optional
Forecasting method. One of the following:
- auto:
the model initially utilizes the true values at each step for forecasting. However, when it reaches a point where it can no longer rely on true values, it transitions to using its own predictions for further forecasting. This method is often referred to as “one step ahead” forecasting.
- forecast:
the model initiates forecasting from an initial value and entirely disregards any subsequent true values. This approach involves forecasting based solely on the model’s own predictions and does not consider actual observations after the start point.
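To make the difference between the two methods concrete, here is a minimal pure-Python sketch on a toy AR(1) model. It is not the Vertica implementation: the function `ar1_predict`, the coefficient `phi`, and the 0-based `start` index are assumptions made only for illustration.

```python
def ar1_predict(series, phi, start, npredictions, method="auto"):
    """Toy AR(1) predictor x[t] = phi * x[t-1] (hypothetical helper).

    `start` is the 0-based index of the first predicted point (start >= 1).
    """
    preds = []
    prev = series[start - 1]  # last value known before the first prediction
    for step in range(npredictions):
        pred = phi * prev
        preds.append(pred)
        t = start + step  # index of the point just predicted
        if method == "auto" and t < len(series):
            prev = series[t]  # a true value exists: use it for the next step
        else:
            prev = pred  # no true value (or method="forecast"): feed back the prediction
    return preds


series = [1.0, 2.0, 4.0, 8.0]
print(ar1_predict(series, 0.5, 1, 5, method="auto"))      # [0.5, 1.0, 2.0, 4.0, 2.0]
print(ar1_predict(series, 0.5, 1, 5, method="forecast"))  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

In the `auto` run the first four predictions each rely on the previous true value, and only the fifth falls back on a prediction. In the `forecast` run every step after the first relies on the model's own output.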
Returns
- vDataFrame
a new object.
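When `output_estimated_ts=True`, the returned object also carries an estimated time value for each prediction. For `freq=None`, the parameter description says the delta is the average difference between `ts` and its lag. The sketch below illustrates that idea in plain Python; the helpers `mean_step` and `estimate_future_ts` are hypothetical and do not reproduce verticapy's internal code.

```python
from datetime import date, timedelta


def mean_step(ts):
    """Average difference between each timestamp and its predecessor (its lag)."""
    diffs = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return sum(diffs, timedelta(0)) / len(diffs)


def estimate_future_ts(ts, npredictions):
    """Extend `ts` by repeatedly adding the average step to the last timestamp."""
    step = mean_step(ts)
    return [ts[-1] + step * (i + 1) for i in range(npredictions)]


ts = [date(1949, 1, 1), date(1949, 2, 1), date(1949, 3, 1), date(1949, 4, 1)]
print(mean_step(ts))              # 30 days, 0:00:00
print(estimate_future_ts(ts, 2))  # [datetime.date(1949, 5, 1), datetime.date(1949, 5, 31)]
```

The second estimate lands on 1949-05-31 rather than 1949-06-01, which shows why these time values are only estimates when months have different lengths.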
Examples
We import verticapy:
import verticapy as vp
For this example, we will use the airline passengers dataset.
import verticapy.datasets as vpd
data = vpd.load_airline_passengers()
row   date (Date)   passengers (Integer)
1     1949-01-01    112
2     1949-02-01    118
3     1949-03-01    132
4     1949-04-01    129
5     1949-05-01    121
6     1949-06-01    135
7     1949-07-01    148
8     1949-08-01    148
9     1949-09-01    136
10    1949-10-01    119
11    1949-11-01    104
12    1949-12-01    118
13    1950-01-01    115
14    1950-02-01    126
15    1950-03-01    141
16    1950-04-01    135
17    1950-05-01    125
18    1950-06-01    149
19    1950-07-01    170
20    1950-08-01    170
21    1950-09-01    158
22    1950-10-01    133
23    1950-11-01    114
24    1950-12-01    140
25    1951-01-01    145
26    1951-02-01    150
27    1951-03-01    178
28    1951-04-01    163
29    1951-05-01    172
30    1951-06-01    178
31    1951-07-01    199
32    1951-08-01    199
33    1951-09-01    184
34    1951-10-01    162
35    1951-11-01    146
36    1951-12-01    166
37    1952-01-01    171
38    1952-02-01    180
39    1952-03-01    193
40    1952-04-01    181
41    1952-05-01    183
42    1952-06-01    218
43    1952-07-01    230
44    1952-08-01    242
45    1952-09-01    209
46    1952-10-01    191
47    1952-11-01    172
48    1952-12-01    194
49    1953-01-01    196
50    1953-02-01    196
51    1953-03-01    236
52    1953-04-01    235
53    1953-05-01    229
54    1953-06-01    243
55    1953-07-01    264
56    1953-08-01    272
57    1953-09-01    237
58    1953-10-01    211
59    1953-11-01    180
60    1953-12-01    201
61    1954-01-01    204
62    1954-02-01    188
63    1954-03-01    235
64    1954-04-01    227
65    1954-05-01    234
66    1954-06-01    264
67    1954-07-01    302
68    1954-08-01    293
69    1954-09-01    259
70    1954-10-01    229
71    1954-11-01    203
72    1954-12-01    229
73    1955-01-01    242
74    1955-02-01    233
75    1955-03-01    267
76    1955-04-01    269
77    1955-05-01    270
78    1955-06-01    315
79    1955-07-01    364
80    1955-08-01    347
81    1955-09-01    312
82    1955-10-01    274
83    1955-11-01    237
84    1955-12-01    278
85    1956-01-01    284
86    1956-02-01    277
87    1956-03-01    317
88    1956-04-01    313
89    1956-05-01    318
90    1956-06-01    374
91    1956-07-01    413
92    1956-08-01    405
93    1956-09-01    355
94    1956-10-01    306
95    1956-11-01    271
96    1956-12-01    306
97    1957-01-01    315
98    1957-02-01    301
99    1957-03-01    356
100   1957-04-01    348
Rows: 1-100 | Columns: 2
First we import the model:
from verticapy.machine_learning.vertica.tsa import ARIMA
Then we can create the model:
model = ARIMA(order = (12, 1, 2))
We can now fit the model:
model.fit(data, "date", "passengers")
Prediction is straightforward:
model.predict()
row   prediction (Float(22))
1     436.808245506626
2     411.303769750774
3     456.591517112856
4     497.165582992911
5     523.414142302269
6     579.634194756896
7     670.753858449996
8     648.086244158784
9     558.685139438718
10    498.606577143251
Rows: 1-10 | Column: prediction | Type: Float(22)
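The call above relies entirely on the defaults. As a hedged sketch of how the optional parameters from the signature might be combined, the hypothetical wrapper below requests a longer horizon with standard errors and pure forecasting. It assumes `model` is an ARIMA instance that has already been fitted as shown earlier, and it uses only keyword arguments that appear in the documented signature.

```python
def forecast_two_years(model):
    """Forecast 24 steps ahead with standard errors (hypothetical wrapper).

    `model` is assumed to be an already fitted ARIMA instance.
    """
    return model.predict(
        npredictions=24,              # number of predicted timesteps
        output_standard_errors=True,  # also return standard error estimates
        method="forecast",            # rely only on the model's own predictions
    )
```

Calling `forecast_two_years(model)` on the fitted model returns a new vDataFrame, as described in the Returns section.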