verticapy.machine_learning.model_selection.statistical_tests.tsa.ljungbox#
- verticapy.machine_learning.model_selection.statistical_tests.tsa.ljungbox(input_relation: str | vDataFrame, column: str, ts: str, by: str | list[str] | None = None, p: int = 1, alpha: int | float | Decimal = 0.05, box_pierce: bool = False) → TableSample #
Ljung–Box test (tests whether any of a group of autocorrelations of a time series is different from zero).
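For reference, the textbook form of the two statistics is written out below. The notation is an assumption introduced here, not taken from the VerticaPy source: n is the number of observations, p the number of lags, and \hat{\rho}_k the sample autocorrelation at lag k.

```latex
\[
Q_{\mathrm{LB}} = n\,(n+2)\sum_{k=1}^{p}\frac{\hat{\rho}_k^{\,2}}{n-k},
\qquad
Q_{\mathrm{BP}} = n\sum_{k=1}^{p}\hat{\rho}_k^{\,2}
\]
```

Under the null hypothesis H0 of no autocorrelation up to lag p, both statistics are approximately chi-squared distributed with p degrees of freedom when the test is applied to a raw series. H0 is rejected when the resulting p-value falls below the significance level.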
Parameters#
- input_relation: SQLRelation
Input relation.
- column: str
Input vDataColumn to test.
- ts: str
vDataColumn used as timeline to order the data. It can be a numerical or date-like type (date, datetime, timestamp…) vDataColumn.
- by: SQLColumns, optional
vDataColumns used in the partition.
- p: int, optional
Number of lags to consider in the test.
- alpha: PythonNumber, optional
Significance level: the probability of rejecting H0 when it is actually true.
- box_pierce: bool
If set to True, the Box–Pierce statistic is used instead of the Ljung–Box statistic.
Returns#
- TableSample
Result of the test.
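To make the roles of p and box_pierce concrete, here is a minimal pure-Python sketch of the textbook statistic. It is not VerticaPy's implementation, and its numbers may differ slightly from the function's output if the library estimates autocorrelations differently. The helper names autocorrelation, chi2_sf and ljung_box_reference are hypothetical. The sketch assumes the values are already ordered by ts, contain no missing entries, and are not constant.

```python
import math


def autocorrelation(x, k):
    """Sample autocorrelation of the sequence x at lag k (textbook estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return numer / denom


def chi2_sf(q, df):
    """P(X > q) for a chi-squared variable with a positive integer df."""
    h = q / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    tail = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def ljung_box_reference(x, p=1, box_pierce=False):
    """Return (statistic, p_value) for lags 1..p of the ordered sequence x."""
    n = len(x)
    rho = [autocorrelation(x, k) for k in range(1, p + 1)]
    if box_pierce:
        q = n * sum(r ** 2 for r in rho)
    else:
        q = n * (n + 2) * sum(r ** 2 / (n - k) for k, r in enumerate(rho, start=1))
    # Assumption: p degrees of freedom, as for a raw (unfitted) series.
    return q, chi2_sf(q, p)


# Usage on a short trending series.
q_lb, p_lb = ljung_box_reference(list(range(10)), p=1)
q_bp, p_bp = ljung_box_reference(list(range(10)), p=1, box_pierce=True)
```

For list(range(10)) the lag-1 sample autocorrelation is 0.7, so the Box–Pierce statistic is 10 × 0.49 = 4.9 and the Ljung–Box statistic is 10 × 12 × 0.49 / 9 ≈ 6.53. The Ljung–Box version is always at least as large because it reweights each lag by (n + 2) / (n − k).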
Examples#
Initialization#
Let’s try this test on a dummy dataset that has the following elements:
- Time-stamp data
- Some columns related to time
- Some columns independent of time
Before we begin we can import the necessary libraries:
import verticapy as vp
import numpy as np
Data#
Now we can create the dummy dataset:
# Initialization
N = 50 # Number of Rows.
day = list(range(N))
x_val_1 = [2 * x + np.random.normal(scale = 4) for x in day]
x_val_2 = np.random.normal(0, 4, N)

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": day,
        "x1": x_val_1,
        "x2": x_val_2,
    }
)
Note that in the above dataset we have created two columns, x1 and x2:
- x1: It is related to day.
- x2: It is independent of day.
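Because both columns are drawn at random, the statistics and p-values reported in this example change from run to run. The sketch below shows one way to make the data reproducible using only the standard library. The seed value 42 and the names rng and data are assumptions made for illustration.

```python
import random

N = 50                    # Number of rows, as in the example dataset.
rng = random.Random(42)   # Hypothetical seed; any fixed integer works.

day = list(range(N))
# x1 follows a linear trend in `day` plus Gaussian noise (standard deviation 4).
x_val_1 = [2 * d + rng.gauss(0, 4) for d in day]
# x2 is pure Gaussian noise and does not depend on `day`.
x_val_2 = [rng.gauss(0, 4) for _ in day]

data = {"day": day, "x1": x_val_1, "x2": x_val_2}
```

The resulting dictionary of lists has the same shape as the one passed to vp.vDataFrame in this example, so it can be used in its place.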
Data Visualization#
We can visualize the relationship with the help of a scatter plot:
vdf.scatter(["day", "x1"])
We can see that the variable x1 seems to be correlated with time. Now let us check the other variable, x2.
vdf.scatter(["day", "x2"])
Above we observe that there is no apparent correlation with time.
Test#
Now we can apply the Ljung–Box test:
from verticapy.machine_learning.model_selection.statistical_tests import ljungbox

ljungbox(vdf, "x1", ts = "day")
Out[9]:
None    Ljung–Box Test Statistic    p_value                Serial Correlation
1       50.95736003367223           9.4394402921354e-13    True
Rows: 1-1 | Columns: 4
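The p-value (about 9.4e-13) is far below alpha = 0.05, so the test rejects the null hypothesis of no autocorrelation: x1 is serially correlated, as expected for a series that trends with day.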
Now, we can test the other independent column as well:
from verticapy.machine_learning.model_selection.statistical_tests import ljungbox

ljungbox(vdf, "x2", ts = "day")
Out[11]:
None    Ljung–Box Test Statistic    p_value               Serial Correlation
1       1.3766782863902172          0.2406677672896569    False
Rows: 1-1 | Columns: 4
The p-value (about 0.24) is above alpha = 0.05, so the test does not reject the null hypothesis: there is no evidence of serial correlation in x2. The results are consistent with our earlier visual observation.