
verticapy.machine_learning.model_selection.statistical_tests.tsa.het_arch

verticapy.machine_learning.model_selection.statistical_tests.tsa.het_arch(input_relation: Annotated[str | vDataFrame, ''], eps: str, ts: str, by: Annotated[str | list[str], 'STRING representing one column or a list of columns'] | None = None, p: int = 1) tuple[float, float, float, float]

Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH).
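In its standard formulation (shown here for orientation, not a statement about VerticaPy's internals), the test regresses the squared residuals on their own lags and checks whether the lag coefficients are jointly zero:

```latex
\epsilon_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \dots + \alpha_p \epsilon_{t-p}^2 + u_t,
\qquad H_0 : \alpha_1 = \dots = \alpha_p = 0 .
```

Under the null hypothesis of no ARCH effects, the Lagrange Multiplier statistic is asymptotically chi-squared distributed with p degrees of freedom.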

Parameters

input_relation: SQLRelation

Input relation.

eps: str

Input residual vDataColumn.

ts: str

vDataColumn used as the timeline to order the data. It can be a numeric or date-like (date, datetime, timestamp…) vDataColumn.

by: SQLColumns, optional

vDataColumns used in the partition.

p: int, optional

Number of lags to consider in the test.

Returns

tuple

Lagrange Multiplier statistic, LM pvalue, F statistic, F pvalue
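The tuple is positional, so a typical pattern is to unpack it and compare the p-values against a significance level (the values and the 0.05 threshold below are placeholders for illustration):

```python
# Unpack a het_arch-style result (placeholder values, for illustration only).
lm_stat, lm_pvalue, f_stat, f_pvalue = (5.79, 0.33, 1.15, 0.35)

alpha = 0.05  # significance level (illustrative choice)
if lm_pvalue < alpha:
    conclusion = "reject H0: ARCH effects are present"
else:
    conclusion = "fail to reject H0: no evidence of ARCH effects"

print(conclusion)
```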

Examples

Initialization

Let’s try this test on a dummy dataset that has the following elements:

  • A value of interest that has noise

  • Time-stamp data

Before we begin, we can import the necessary libraries:

import verticapy as vp

import numpy as np

Example 1: Random

Now we can create the dummy dataset:

# Initialization
N = 50 # Number of Rows.

days = list(range(N))

vals = [np.random.normal(5) for _ in days]  # i.i.d. Gaussian noise centered at 5

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "eps": vals,
    }
)

Let us plot the distribution of noise with respect to time:

vdf.scatter(["day", "eps"])

Test

Now we can apply Engle's ARCH test:

from verticapy.machine_learning.model_selection.statistical_tests import het_arch

het_arch(input_relation = vdf, ts = "day", eps = "eps", p = 5)
(5.792308772293665, 0.3269556338536383, 1.1523251435923334, 0.3497782532314971)

Both p-values are large, so we fail to reject the null hypothesis: there is no relationship with any lag beyond what chance would produce.
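For intuition, the Lagrange Multiplier statistic can be sketched in plain NumPy as n·R² from an ordinary least-squares regression of the squared residuals on their own lags. This is an illustration of the idea only, not VerticaPy's implementation, and the function name is ours:

```python
import numpy as np

def arch_lm_sketch(eps, p=1):
    # Regress eps_t^2 on eps_{t-1}^2 ... eps_{t-p}^2 and return LM = n * R^2.
    e2 = np.asarray(eps, dtype=float) ** 2
    n = len(e2) - p
    # Design matrix: intercept plus the p lags of the squared residuals.
    lags = [e2[p - k - 1 : p - k - 1 + n] for k in range(p)]
    X = np.column_stack([np.ones(n)] + lags)
    y = e2[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return n * r2  # approximately chi2(p)-distributed under H0
```

Under the null hypothesis, this statistic follows a chi-squared distribution with p degrees of freedom, which is how an LM p-value like the one above is obtained.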

Now let us contrast it with another example where the lags are related:

Example 2: Correlated

We can create an alternate dataset that exhibits correlation at a specific lag. Below, we interleave two separate sequences, one value after the other, to create a new series in which every value is related to the one two steps before it, but not to the one immediately before it.

# Initialization
N = 50 # Number of Rows

days = list(range(N))

x1 = [2 * -x for x in range(40, 40 + 5 * (N // 2), 5)]

x2 = [-2 * -x * x * x / 2 for x in range(4, 4 + 2 * (N // 2), 2)]

# Interleave the two sequences, one value after the other,
# so that vals has N entries to match days.
vals = []

for elem_1, elem_2 in zip(x1, x2):
    vals.extend([elem_1, elem_2])


# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "eps": vals,
    }
)

Let us plot the distribution of noise with respect to time to observe the trend:

vdf.scatter(["day", "eps"])