verticapy.machine_learning.model_selection.statistical_tests.tsa.adfuller#

verticapy.machine_learning.model_selection.statistical_tests.tsa.adfuller(input_relation: str | vDataFrame, column: str, ts: str, by: str | list[str] | None = None, p: int = 1, with_trend: bool = False, regresults: bool = False) → TableSample#

Augmented Dickey Fuller test (Time Series stationarity).

Parameters#

input_relation: SQLRelation

Input relation.

column: str

Input vDataColumn to test.

ts: str

vDataColumn used as the timeline to order the data. It can be a numerical or date-like (date, datetime, timestamp…) vDataColumn.

by: SQLColumns, optional

vDataColumns used in the partition.

p: int, optional

Number of lags to consider in the test.

with_trend: bool, optional

Adds a trend term to the regression.

regresults: bool, optional

If True, the full regression results are returned.
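To make the roles of p and with_trend more concrete, the regression that an ADF test is built on can be sketched in plain NumPy. This is a textbook illustration, not VerticaPy's in-database implementation: the helper name adf_statistic and its internals are assumptions made for this example, and it returns only the test statistic (p-values and critical values come from Dickey-Fuller tables and are not computed here).

```python
import numpy as np


def adf_statistic(y, p=1, with_trend=False):
    """Hypothetical helper: t-statistic of gamma in the ADF regression

        dy[t] = c (+ b * t) + gamma * y[t-1] + sum_{i=1..p} d_i * dy[t-i] + e[t]
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                       # rows left after dropping p lags
    target = dy[p:]                       # left-hand side of the regression
    columns = [np.ones(n), y[p:-1]]       # intercept and lagged level y[t-1]
    if with_trend:
        columns.append(np.arange(n, dtype=float))  # linear time trend
    for i in range(1, p + 1):
        columns.append(dy[p - i:len(dy) - i])      # lagged difference dy[t-i]
    X = np.column_stack(columns)

    # Ordinary least squares and the standard error of gamma (column 1).
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ beta
    sigma2 = residuals @ residuals / (n - X.shape[1])
    covariance = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(covariance[1, 1])


rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 100
trending = np.arange(N) + rng.normal(0, 5, N)  # linear trend plus noise
noise = rng.normal(0, 5, N)                    # noise only, no trend
print(adf_statistic(trending, p=1))
print(adf_statistic(noise, p=1))
```

The more negative the statistic, the stronger the evidence against a unit root. Increasing p adds more lagged differences to the regression, and with_trend=True adds the linear time term.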

Returns#

TableSample

Result of the test.

Examples#

Initialization#

Let’s try this test on a dummy dataset that has the following elements:

  • A value of interest

  • Time-stamp data

Before we begin we can import the necessary libraries:

import numpy as np
import verticapy as vp

Example 1: Trend#

Now we can create the dummy dataset:

# Initialization
N = 100 # Number of Rows.

# vDataFrame
vdf = vp.vDataFrame(
    {
        "year": list(range(N)),
        "X": [x + np.random.normal(0, 5) for x in range(N)],
    }
)

We can visually inspect the trend by drawing the appropriate graph:

vdf["X"].plot(ts="year")

Although the increasing trend is obvious, we can confirm the lack of stationarity with the ADF test. First, import the function:

from verticapy.machine_learning.model_selection.statistical_tests import adfuller

And then simply applying it on the vDataFrame:

adfuller(vdf, column="X", ts="year")
Out[5]: 
None                                         value  
ADF Test Statistic             -0.5486722291067921  
p_value                          0.584517599625156  
# Lags used                                      1  
# Observations Used                            100  
Critical Value (1%)                          -3.51  
Critical Value (2.5%)                        -3.17  
Critical Value (5%)                          -2.89  
Critical Value (10%)                         -2.58  
Stationarity (alpha = 1%)                    False  
Rows: 1-9 | Columns: 2

Here, the high p-value means that the null hypothesis of a unit root cannot be rejected, so the test finds no evidence that the series is stationary.

Note

In statistics, a p-value is the probability of obtaining results as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true. A smaller p-value suggests stronger evidence against the null hypothesis, which for this test is that the series has a unit root (i.e. it is non-stationary).

However, “small” is a relative term, and the threshold that defines “small” should be chosen before analyzing the data.

Generally, a p-value below 0.05 is taken as the threshold for rejecting the null hypothesis, but this is not always the case.
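The decision reported in the Stationarity row can be reproduced by comparing the test statistic with a critical value: the ADF test is left-tailed, so the unit-root null hypothesis is rejected only when the statistic falls below the critical value. The following sketch uses the numbers printed in the output above; the helper name rejects_unit_root is hypothetical and not part of VerticaPy.

```python
def rejects_unit_root(adf_statistic, critical_value):
    """Hypothetical helper: True when the statistic is below the critical value."""
    return adf_statistic < critical_value


# Values copied from the test output for the trending series.
statistic = -0.5486722291067921
critical_values = {"1%": -3.51, "2.5%": -3.17, "5%": -2.89, "10%": -2.58}

for level, critical_value in critical_values.items():
    print(level, rejects_unit_root(statistic, critical_value))
```

Every level prints False for this series, which matches the Stationarity (alpha = 1%) row and shows that a looser threshold would not change the conclusion here.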

Example 2: Stationary#

We can contrast the results with a dataset that has barely any trend:

vdf = vp.vDataFrame(
    {
        "year": list(range(N)),
        "X": [np.random.normal(0, 5) for x in range(N)],
    }
)

We can visually inspect the absence of trend by drawing the appropriate graph:

vdf["X"].plot(ts="year")

Now we can perform the test on this dataset:

adfuller(vdf, column="X", ts="year")
Out[7]: 
None                                          value  
ADF Test Statistic               -6.448767781308571  
p_value                        4.68221894172552e-09  
# Lags used                                       1  
# Observations Used                             100  
Critical Value (1%)                           -3.51  
Critical Value (2.5%)                         -3.17  
Critical Value (5%)                           -2.89  
Critical Value (10%)                          -2.58  
Stationarity (alpha = 1%)                      True  
Rows: 1-9 | Columns: 2

Note

Notice the very low p-value: the null hypothesis of a unit root is rejected, which is strong evidence that the series is stationary.

For more information check out this link.