verticapy.machine_learning.model_selection.statistical_tests.tsa.adfuller#
- verticapy.machine_learning.model_selection.statistical_tests.tsa.adfuller(input_relation: str | vDataFrame, column: str, ts: str, by: str | list[str] | None = None, p: int = 1, with_trend: bool = False, regresults: bool = False) → TableSample #
Augmented Dickey-Fuller test (time series stationarity).
Parameters#
- input_relation: SQLRelation
Input relation.
- column: str
Input vDataColumn to test.
- ts: str
vDataColumn used as the timeline to order the data. It can be a numerical or date-like (date, datetime, timestamp…) vDataColumn.
- by: SQLColumns, optional
vDataColumns used in the partition.
- p: int, optional
Number of lags to consider in the test.
- with_trend: bool, optional
If True, adds a trend term to the regression.
- regresults: bool, optional
If True, the full regression results are returned.
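To make the roles of p and with_trend more concrete, the following is a minimal NumPy sketch of the textbook ADF regression: the first difference of the series is regressed on an intercept, the lagged level, p lagged differences and, optionally, a linear trend, and the test statistic is the t-statistic of the lagged-level coefficient. The helper name adf_statistic and the exact layout of the regressors are assumptions made for illustration; this is not VerticaPy's implementation, which runs inside the database and may differ in detail.

```python
import numpy as np

def adf_statistic(y, p=1, with_trend=False):
    """t-statistic on y[t-1] in the textbook ADF regression (illustrative only)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[i] = y[i+1] - y[i]
    n = len(dy) - p                       # rows left after dropping p lags
    target = dy[p:]                       # first differences to explain
    cols = [np.ones(n), y[p:-1]]          # intercept and lagged level y[t-1]
    for k in range(1, p + 1):
        cols.append(dy[p - k:len(dy) - k])  # lagged differences dy[t-k]
    if with_trend:
        cols.append(np.arange(n, dtype=float))  # linear time trend
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance of beta
    return float(beta[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(0)
noise = rng.normal(0, 5, 200)             # stationary white noise
walk = np.cumsum(rng.normal(0, 5, 200))   # random walk (unit root)

noise_stat = adf_statistic(noise, p=1)
walk_stat = adf_statistic(walk, p=1)
walk_trend_stat = adf_statistic(walk, p=1, with_trend=True)
print(noise_stat, walk_stat, walk_trend_stat)
```

In this sketch the white-noise series should produce a far more negative statistic than the random walk, which is the pattern the test relies on.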
Returns#
- TableSample
Result of the test.
Examples#
Initialization#
Let’s try this test on a dummy dataset that has the following elements:
A value of interest
Time-stamp data
Before we begin we can import the necessary libraries:
import numpy as np
import verticapy as vp
Example 1: Trend#
Now we can create the dummy dataset:
# Initialization
N = 100 # Number of Rows.
# vDataFrame
vdf = vp.vDataFrame(
    {
        "year": list(range(N)),
        "X": [x + np.random.normal(0, 5) for x in range(N)],
    }
)
We can visually inspect the trend by drawing the appropriate graph:
vdf["X"].plot(ts="year")
Though the increasing trend is obvious, we can test its adfuller score by first importing the function:
from verticapy.machine_learning.model_selection.statistical_tests import adfuller
And then simply applying it on the vDataFrame:
adfuller(vdf, column = "X", ts = "year")
Out[5]:
                             value
ADF Test Statistic           -0.5486722291067921
p_value                      0.584517599625156
# Lags used                  1
# Observations Used          100
Critical Value (1%)          -3.51
Critical Value (2.5%)        -3.17
Critical Value (5%)          -2.89
Critical Value (10%)         -2.58
Stationarity (alpha = 1%)    False
Rows: 1-9 | Columns: 2
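In the above context, the high p-value means we cannot reject the null hypothesis of a unit root, which is evidence of a lack of stationarity. Consistently, the test statistic (about -0.55) lies above every listed critical value.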
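The decision in the last row of the output can be reproduced by hand: the series is flagged as stationary at a given level only when the test statistic falls below the corresponding critical value. The sketch below applies that rule to the numbers from the output above. The function name is_stationary and the dictionary layout are illustrative assumptions, not part of VerticaPy.

```python
# Critical values copied from the adfuller output above.
CRITICAL_VALUES = {"1%": -3.51, "2.5%": -3.17, "5%": -2.89, "10%": -2.58}

def is_stationary(adf_statistic: float, level: str = "1%") -> bool:
    """Reject the unit-root null when the statistic is below the critical value."""
    return adf_statistic < CRITICAL_VALUES[level]

trend_stat = -0.5486722291067921  # ADF Test Statistic from Example 1
decisions = {level: is_stationary(trend_stat, level) for level in CRITICAL_VALUES}
print(decisions)
# {'1%': False, '2.5%': False, '5%': False, '10%': False}
```

Since the statistic is above all four critical values, the unit-root null is not rejected at any of the listed levels, which matches the reported "Stationarity (alpha = 1%)" value of False.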
Note
A p_value in statistics represents the probability of obtaining results as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true. A smaller p-value typically suggests stronger evidence against the null hypothesis, which for the ADF test is that the series has a unit root (i.e. it is non-stationary). However, “small” is a relative term, and the threshold that determines what counts as small should be chosen before analyzing the data.
Generally, a p-value less than 0.05 is taken as the threshold to reject the null hypothesis, but this is a convention rather than a fixed rule.
Example 2: Stationary#
We can contrast the results with a dataset that has barely any trend:
vdf = vp.vDataFrame(
    {
        "year": list(range(N)),
        "X": [np.random.normal(0, 5) for x in range(N)],
    }
)
We can visually inspect the absence of trend by drawing the appropriate graph:
vdf["X"].plot(ts="year")
Now we can perform the test on this dataset:
adfuller(vdf, column = "X", ts = "year")
Out[7]:
                             value
ADF Test Statistic           -6.448767781308571
p_value                      4.68221894172552e-09
# Lags used                  1
# Observations Used          100
Critical Value (1%)          -3.51
Critical Value (2.5%)        -3.17
Critical Value (5%)          -2.89
Critical Value (10%)         -2.58
Stationarity (alpha = 1%)    True
Rows: 1-9 | Columns: 2
Note
Notice the low p-value: it lets us reject the null hypothesis of a unit root, which is strong evidence that the series is stationary.