verticapy.machine_learning.model_selection.statistical_tests.tsa.ljungbox#

verticapy.machine_learning.model_selection.statistical_tests.tsa.ljungbox(input_relation: str | vDataFrame, column: str, ts: str, by: str | list[str] | None = None, p: int = 1, alpha: int | float | Decimal = 0.05, box_pierce: bool = False) -> TableSample#

Ljung–Box test (tests whether any of the first p autocorrelations of a time series is different from zero).
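For reference, the statistic is usually written as below. This is the standard textbook definition, not something taken from the VerticaPy source; it is an assumption that the implementation follows it exactly. Here n is the number of observations, p is the number of lags (the p parameter), and \hat{\rho}_k is the sample autocorrelation at lag k. The second form is the Box-Pierce variant selected by box_pierce.

```latex
Q_{\mathrm{LB}} = n\,(n+2) \sum_{k=1}^{p} \frac{\hat{\rho}_k^{2}}{n-k},
\qquad
Q_{\mathrm{BP}} = n \sum_{k=1}^{p} \hat{\rho}_k^{2}
```

Under H0 (no autocorrelation up to lag p), both statistics are approximately chi-squared distributed with p degrees of freedom, and H0 is rejected when the resulting p-value is below alpha.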

Parameters#

input_relation: SQLRelation

Input relation.

column: str

Input vDataColumn to test.

ts: str

vDataColumn used as timeline to order the data. It can be a numerical or date-like type (date, datetime, timestamp…) vDataColumn.

by: SQLColumns, optional

vDataColumns used in the partition.

p: int, optional

Number of lags to consider in the test.

alpha: PythonNumber, optional

Significance level: the probability of rejecting H0 when it is true. H0 is rejected when the p-value is below alpha.

box_pierce: bool, optional

If set to True, the Box-Pierce statistic is used.

Returns#

TableSample

Result of the test: the test statistic, its p-value, and a boolean indicating whether serial correlation was detected.
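To make the roles of p, alpha, and box_pierce concrete, here is a minimal pure-Python sketch of the textbook Ljung–Box computation. It is not VerticaPy's implementation: the helper names (autocorrelation, chi2_sf, ljung_box) are hypothetical, and the exact formulas VerticaPy uses in-database are assumed, not confirmed, to match.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive integer lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    on the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2:  # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:  # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def ljung_box(values, lags=1, alpha=0.05, box_pierce=False):
    """Hypothetical helper: textbook Ljung-Box (or Box-Pierce) test.

    Returns (statistic, p_value, serial_correlation_detected).
    """
    n = len(values)
    total = 0.0
    for k in range(1, lags + 1):
        rho_sq = autocorrelation(values, k) ** 2
        # Box-Pierce sums rho_k^2; Ljung-Box weights each term by 1 / (n - k).
        total += rho_sq if box_pierce else rho_sq / (n - k)
    stat = n * total if box_pierce else n * (n + 2) * total
    p_value = chi2_sf(stat, lags)
    return stat, p_value, p_value < alpha


# A pure linear trend is strongly autocorrelated, so H0 should be rejected.
trend = [float(t) for t in range(50)]
print(ljung_box(trend, lags=1))
print(ljung_box(trend, lags=1, box_pierce=True))
```

The Ljung–Box weighting (n + 2) / (n - k) is always greater than 1, so for the same data it yields a slightly larger statistic than Box-Pierce; the difference matters mostly for short series.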

Examples#

Initialization#

Let’s try this test on a dummy dataset that has the following elements:

  • Time-stamp data

  • Some columns related to time

  • Some columns independent of time

Before we begin we can import the necessary libraries:

import verticapy as vp

import numpy as np

Data#

Now we can create the dummy dataset:

# Initialization
N = 50 # Number of Rows.

day = list(range(N))

x_val_1 = [2 * x + np.random.normal(scale = 4) for x in day]

x_val_2 = np.random.normal(0, 4, N)

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": day,
        "x1": x_val_1,
        "x2": x_val_2,
    }
)

Note that in the above dataset we have created two columns, x1 and x2.

  • x1:

    It is related to day

  • x2:

    It is independent of day

Data Visualization#

We can visualize their relationship with the help of a scatter plot:

vdf.scatter(["day", "x1"])

We can see that the variable x1 seems to be correlated with time. Now let us check the other variable, x2.

vdf.scatter(["day", "x2"])

Above we observe that there is no apparent correlation with time.
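The visual impression can also be put into numbers. The sketch below is an illustration only: it rebuilds comparable data with the standard library instead of NumPy, uses a fixed seed (an arbitrary choice for reproducibility), and computes the Pearson correlation of each column with day through a hypothetical helper named pearson. Note that this measures correlation with time, which is related to, but not the same as, the serial correlation that the Ljung–Box test checks.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


rng = random.Random(0)  # fixed seed: an assumption made for reproducibility
day = list(range(50))
x1 = [2 * d + rng.gauss(0, 4) for d in day]  # linear trend plus noise
x2 = [rng.gauss(0, 4) for _ in day]          # noise only

r_x1 = pearson(day, x1)
r_x2 = pearson(day, x2)
print(f"corr(day, x1) = {r_x1:.3f}")
print(f"corr(day, x2) = {r_x2:.3f}")
```

A coefficient close to 1 for x1 and close to 0 for x2 would match what the two scatter plots suggest.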

Test#

Now we can apply the Ljung–Box test:

from verticapy.machine_learning.model_selection.statistical_tests import ljungbox

ljungbox(vdf, "x1", ts = "day")
Out[9]: 
None       Ljung–Box Test Statistic                p_value    Serial Correlation  
1                 50.95736003367223    9.4394402921354e-13                  True  
Rows: 1-1 | Columns: 4

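The test rejects the null hypothesis of no autocorrelation (the p-value of about 9.4e-13 is far below alpha = 0.05), so x1 is serially correlated, as expected for a series that trends with day.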

Now, we can test the other independent column as well:

from verticapy.machine_learning.model_selection.statistical_tests import ljungbox

ljungbox(vdf, "x2", ts = "day")
Out[11]: 
None       Ljung–Box Test Statistic               p_value    Serial Correlation  
1                1.3766782863902172    0.2406677672896569                 False  
Rows: 1-1 | Columns: 4

For x2, the test does not reject the null hypothesis (the p-value of about 0.24 is above alpha = 0.05), so there is no evidence of serial correlation. This is consistent with our earlier visual observation.