
verticapy.machine_learning.vertica.tsa.VAR¶
- class verticapy.machine_learning.vertica.tsa.VAR(name: str = None, overwrite_model: bool = False, p: int = 3, method: Literal['yule-walker'] = 'yule-walker', penalty: Literal[None, 'none', 'l2'] = 'none', C: Annotated[int | float | Decimal, 'Python Numbers'] = 1.0, missing: Literal['drop', 'error', 'zero'] = 'error', subtract_mean: bool = False)¶
Creates an in-database Vector Autoregressor (VAR) model.
Added in version 24.2.0.
Parameters¶
- name: str, optional
Name of the model. The model is stored in the database.
- overwrite_model: bool, optional
If set to True, training a model with the same name as an existing model overwrites the existing model.
- p: int, optional
Integer in the range [1, 1999], the number of lags to consider in the computation. Larger values for p weaken the correlation.
- method: str, optional
One of the following algorithms for training the model:
- ols:
Ordinary Least Squares
- yule-walker:
Yule-Walker
- penalty: str, optional
Method of regularization.
- none:
No regularization.
- l2:
L2 regularization.
- C: PythonNumber, optional
The regularization parameter value. The value must be non-negative.
- missing: str, optional
Method for handling missing values, one of the following strings:
- ‘drop’:
Missing values are ignored.
- ‘error’:
Missing values raise an error.
- ‘zero’:
Missing values are set to zero.
- ‘linear_interpolation’:
Missing values are replaced by a linearly interpolated value based on the nearest valid entries before and after the missing value. In cases where the first or last values in a dataset are missing, the function errors.
- subtract_mean: bool, optional
For Yule-Walker, if subtract_mean is True, the mean of the column(s) is subtracted before calculating the coefficients. If False (the default), the calculations are performed directly on the data, which often gives a more accurate model. Note that if this parameter is set to False, the means saved in the model are all zeros. This parameter has no effect for OLS.
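To make the parameter descriptions above concrete, here is a minimal sketch of an instantiation that touches most of them; the chosen values are illustrative assumptions, not tuning advice:
from verticapy.machine_learning.vertica.tsa import VAR

model = VAR(
    p = 4,                  # number of lags to consider
    method = "yule-walker", # training algorithm
    penalty = "l2",         # L2 regularization
    C = 0.5,                # regularization strength (must be non-negative)
    missing = "drop",       # ignore missing values
    subtract_mean = True,   # center the column(s) before fitting (Yule-Walker only)
)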
Attributes¶
Many attributes are created during the fitting phase.
- phi_: numpy.array
The coefficient of the AutoRegressive process. It represents the strength and direction of the relationship between a variable and its past values.
Note
In the case of multivariate analysis, each coefficient is represented by a matrix of numbers.
- intercept_: float
Represents the expected value of the time series when the lagged values are zero. It signifies the baseline or constant term in the model, capturing the average level of the series in the absence of any historical influence.
Note
In the case of multivariate analysis, the intercept is represented by a vector of numbers.
- features_importance_: numpy.array
The importance of features is computed through the AutoRegressive part coefficients, which are normalized based on their range. Subsequently, an activation function calculates the final score. You must call the features_importance() method to compute it initially; the computed values are then reused for subsequent calls.
- mse_: float
The mean squared error (MSE) of the model, based on one-step forward forecasting, may not always be relevant. Utilizing a full forecasting approach is recommended to compute a more meaningful and comprehensive metric.
- n_: int
The number of rows used to fit the model.
Note
All attributes can be accessed using the get_attributes() method.
Note
Several other attributes can be accessed by using the get_vertica_attributes() method.
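As a minimal sketch of reading these attributes from a fitted model (assuming the model object is named model, as in the examples below):
model.get_attributes()        # list the names of the available attributes
model.get_attributes("phi_")  # the AR coefficient matrices
model.get_attributes("mse_")  # the one-step-ahead mean squared error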
Examples¶
The following examples provide a basic understanding of usage. For more detailed examples, please refer to the Machine Learning or the Examples section on the website.
Initialization¶
We import verticapy:
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
For this example, we will generate a dummy time-series dataset.
data = vp.vDataFrame(
    {
        "month": [i for i in range(1, 11)],
        "GB1": [5, 10, 20, 35, 55, 80, 110, 145, 185, 230],
        "GB2": [3, 7, 12, 18, 22, 30, 37, 39, 51, 80],
    }
)
   | month | GB1 | GB2
 1 |     1 |   5 |   3
 2 |     2 |  10 |   7
 3 |     3 |  20 |  12
 4 |     4 |  35 |  18
 5 |     5 |  55 |  22
 6 |     6 |  80 |  30
 7 |     7 | 110 |  37
 8 |     8 | 145 |  39
 9 |     9 | 185 |  51
10 |    10 | 230 |  80
Rows: 1-10 | Columns: 3
Note
VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use them effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.
We can plot the data to visually inspect it for the presence of any trends:
data.plot(ts = "month", columns = ["GB1", "GB2"])
Though the increasing trend is obvious in our example, we can confirm it with the Mann-Kendall test, mkt():
from verticapy.machine_learning.model_selection.statistical_tests import mkt
mkt(data, column = "GB1", ts = "month")
                            | value
Mann Kendall Test Statistic | 3.935479640399647
S                           | 45.0
STDS                        | 11.1803398874989
p_value                     | 8.303070332644367e-05
Monotonic Trend             | ✅
Trend                       | increasing
Rows: 1-6 | Columns: 2
The above test gives us some more insight into the data: the series is monotonic and increasing. Furthermore, the low p-value confirms the presence of a trend with respect to time. Now that we are sure of the trend, we can apply the appropriate time-series model to fit it.
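Since VAR models several series jointly, the same check can be run on the second series, reusing the mkt() call shown above:
mkt(data, column = "GB2", ts = "month")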
Model Initialization¶
First we import the VAR model:
from verticapy.machine_learning.vertica.tsa import VAR
Then we can create the model:
model = VAR(p = 2)
Hint
In verticapy 1.0.x and higher, you do not need to specify the model name, as the name is automatically assigned. If you need to re-use the model, you can fetch the model name from the model’s attributes.
Important
The model name is crucial for the model management system and versioning. It’s highly recommended to provide a name if you plan to reuse the model later.
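For instance, a named model could be created as follows; the name "GB_VAR" is an illustrative assumption, and the unnamed model above is the one used in the rest of this example:
# naming the model lets you retrieve it later; overwrite_model avoids name clashes
named_model = VAR("GB_VAR", overwrite_model = True, p = 2)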
Model Fitting¶
We can now fit the model:
model.fit(data, "month", ["GB1", "GB2"])
=========
phi_(t-1)
=========
predictor|  gb1   |  gb2
---------+--------+--------
gb1      | 2.15113|-3.77023
gb2      | 0.18040| 0.12940

=========
phi_(t-2)
=========
predictor|  gb1   |  gb2
---------+--------+--------
gb1      |-2.13660| 6.37890
gb2      |-0.27343| 1.00159

====
mean
====
predictor| value
---------+--------
gb1      | 0.00000
gb2      | 0.00000

==================
mean_squared_error
==================
predictor|  value
---------+----------
gb1      |1234.12309
gb2      | 280.45886

=================
predictor_columns
=================
"gb1", "gb2"

================
timestamp_column
================
month

==============
missing_method
==============
error

===========
call_string
===========
autoregressor('"public"."_verticapy_tmp_ar_v_mldb_8d772ada55a511ef880f0242ac120002_"', '"public"."_verticapy_tmp_view_v_mldb_8d8f018c55a511ef880f0242ac120002_"', '"gb1", "gb2"', 'month' USING PARAMETERS p=2, method=yule-walker, missing=error, regularization='none', lambda=1, compute_mse=true, subtract_mean=false);

===============
Additional Info
===============
Name              | Value
------------------+--------
lag_order         |    2
num_predictors    |    2
lambda            | 1.00000
rejected_row_count|    0
accepted_row_count|   10
Important
To train a model, you can directly use the vDataFrame or the name of the relation stored in the database. The test set is optional and is only used to compute the test metrics. In verticapy, we don’t work using X matrices and y vectors. Instead, we work directly with lists of predictors and the response name.
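After fitting, forecasts can typically be produced with the model's predict() method. The following is a hedged sketch; the npredictions argument and the exact call shape are assumptions based on VerticaPy's TSA interface, and are not demonstrated in this section:
# forecast the next 5 values of both series
model.predict(data, "month", ["GB1", "GB2"], npredictions = 5)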
Features Importance¶
We can conveniently get the features importance of the first predictor:
model.features_importance(idx=0)
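The idx parameter selects the predictor to analyze; a sketch for the second predictor, assuming idx follows the order of the columns passed to fit():
model.features_importance(idx=1)  # features importance of the second predictor, GB2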