verticapy.machine_learning.model_selection.statistical_tests.tsa.het_arch#
- verticapy.machine_learning.model_selection.statistical_tests.tsa.het_arch(input_relation: str | vDataFrame, eps: str, ts: str, by: str | list[str] | None = None, p: int = 1) → tuple[float, float, float, float]
Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH).
Parameters#
- input_relation: SQLRelation
Input relation.
- eps: str
Input residual vDataColumn.
- ts: str
vDataColumn used as the timeline to order the data. It can be a numerical or date-like (date, datetime, timestamp…) vDataColumn.
- by: SQLColumns, optional
vDataColumns used in the partition.
- p: int, optional
Number of lags to consider in the test.
Returns#
- tuple
Lagrange Multiplier statistic, LM pvalue, F statistic, F pvalue
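To make the two statistics concrete, here is a minimal NumPy sketch of the textbook Engle procedure: regress the squared residuals on their own p lags, then derive the LM and F statistics from the R² of that regression. The function name `arch_lm_sketch` is hypothetical, and this is an illustration under textbook assumptions, not VerticaPy's implementation. The p-values (not computed here) are conventionally taken from a chi-square distribution with p degrees of freedom for the LM statistic and from an F(p, n - p - 1) distribution for the F statistic.

```python
import numpy as np

def arch_lm_sketch(eps, p=1):
    """Textbook Engle ARCH-LM and F statistics (illustrative only)."""
    e2 = np.asarray(eps, dtype=float) ** 2
    total = len(e2)
    n = total - p  # usable observations once p lags are taken
    y = e2[p:]
    # Design matrix: intercept plus the p lagged squared residuals.
    lags = [e2[p - k : total - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    # Ordinary least squares fit of e2[t] on its lags.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    lm = n * r2  # Lagrange Multiplier statistic
    f = (r2 / p) / ((1.0 - r2) / (n - p - 1))  # F statistic
    return lm, f

# Usage on pure noise: both statistics are expected to be small.
rng = np.random.default_rng(0)
lm, f = arch_lm_sketch(rng.normal(size=200), p=5)
print(lm, f)
```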
Examples#
Initialization#
Let’s try this test on a dummy dataset that has the following elements:
- A value of interest that has noise
- Time-stamp data
Before we begin we can import the necessary libraries:
import verticapy as vp
import numpy as np
Example 1: Random#
Now we can create the dummy dataset:
# Initialization
N = 50 # Number of Rows.
days = list(range(N))
vals = [np.random.normal(5) for x in days]

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "eps": vals,
    }
)
Let us plot the distribution of noise with respect to time:
vdf.scatter(["day", "eps"])
Test#
Now we can apply Engle’s ARCH test:
from verticapy.machine_learning.model_selection.statistical_tests import het_arch

het_arch(input_relation = vdf, ts = "day", eps = "eps", p = 5)
Out[8]: (5.6097146881908, 0.34606518349573423, 1.110826545721167, 0.3704722727198604)
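Both p-values (about 0.35 for the LM test and 0.37 for the F test) are well above the usual 0.05 threshold, so we cannot reject the null hypothesis of no ARCH effect: any relationship with the lags is attributable to chance.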
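The decision rule can be written out explicitly. The sketch below unpacks the tuple shown in the output above and compares each p-value with a significance level of 0.05. That level is an assumed, conventional choice; the function itself does not set one.

```python
# Result tuple copied from the output above.
result = (5.6097146881908, 0.34606518349573423, 1.110826545721167, 0.3704722727198604)
lm_stat, lm_pvalue, f_stat, f_pvalue = result

alpha = 0.05  # assumed significance level

# The null hypothesis is "no ARCH effect"; reject it only if the p-value is below alpha.
reject_lm = lm_pvalue < alpha
reject_f = f_pvalue < alpha
print(f"LM test rejects H0: {reject_lm}; F test rejects H0: {reject_f}")
```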
Now let us contrast it with another example where the lags are related:
Test#
Now we can apply Engle’s ARCH test:
from verticapy.machine_learning.model_selection.statistical_tests import het_arch

het_arch(input_relation = vdf, ts = "day", eps = "eps", p = 5)
Out[17]: (44.999964568754415, 1.4509012338760658e-08, 9906502.518229792, 6.717425199609984e-118)
Both p-values are extremely low, so the null hypothesis of no ARCH effect is rejected. This confirms the presence of correlation with certain lags (here, the lags that are multiples of 2).
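As a hypothetical illustration of what correlation at even lags looks like, the NumPy sketch below builds a series whose magnitude alternates every other row and measures the correlation of its squared values at lags 1 and 2. The series, the noise scale, and the helper `lag_corr` are assumptions made for illustration; this is not the dataset used in the test above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50  # Number of rows, as in the first example.

# Hypothetical residuals: magnitude alternates between 1 and 3, plus small noise.
base = np.where(np.arange(N) % 2 == 0, 1.0, 3.0)
eps = base + rng.normal(scale=0.01, size=N)
e2 = eps ** 2  # The ARCH test works on squared residuals.

def lag_corr(x, k):
    """Pearson correlation between x[t] and x[t - k]."""
    return np.corrcoef(x[k:], x[:-k])[0, 1]

corr_lag1 = lag_corr(e2, 1)  # strongly negative: neighbours alternate
corr_lag2 = lag_corr(e2, 2)  # strongly positive: pattern repeats every 2 rows
print(corr_lag1, corr_lag2)
```

A series like this makes the squared residuals almost perfectly predictable from their own lags, which is the situation in which the LM and F statistics become very large.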