verticapy.machine_learning.model_selection.statistical_tests.tsa.mkt#

verticapy.machine_learning.model_selection.statistical_tests.tsa.mkt(input_relation: str | vDataFrame, column: str, ts: str, alpha: int | float | Decimal = 0.05) → TableSample#

Mann Kendall test (Time Series trend).

Warning

This test is computationally expensive because it uses a CROSS JOIN during the computation. The complexity is O(n * k), where n is the total number of rows in the vDataFrame and k is the number of rows used for the test.
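
To get a feel for this cost, here is a minimal sketch that counts the pairwise comparisons a Mann-Kendall style test needs, assuming every row is compared with every later row. The helper `count_pairs` is hypothetical and is not part of VerticaPy.

```python
from itertools import combinations


def count_pairs(n: int) -> int:
    """Count the (earlier, later) row pairs among n time-ordered rows."""
    # Hypothetical helper: enumerates pairs explicitly to mirror what a
    # self CROSS JOIN filtered on "later timestamp" would produce.
    return sum(1 for _ in combinations(range(n), 2))


for n in (7, 100, 1000):
    print(n, count_pairs(n))
```

The count grows as n * (n - 1) / 2, so on large relations it may be worth testing on a subset of rows.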

Parameters#

input_relation: SQLRelation

Input relation.

column: str

Input vDataColumn to test.

ts: str

vDataColumn used as the timeline to order the data. It can be a numerical or a date-like (date, datetime, timestamp…) vDataColumn.

alpha: PythonNumber, optional

Significance level: the probability of rejecting H0 when it is true.

Returns#

TableSample

Result of the test.

Examples#

Initialization#

Let’s try this test on a dummy dataset that has the following elements:

  • A value of interest

  • Time-stamp data

Before we begin we can import the necessary libraries:

import verticapy as vp

Example 1: Trend#

Now we can create the dummy dataset:

vdf = vp.vDataFrame(
    {
        "X": [0, 1, 2, 3, 4, 5, 6],
        "year": [1990, 1991, 1992, 1993, 1994, 1995, 1996],
    }
)

We can visually inspect the trend by drawing the appropriate graph:

vdf["X"].plot(ts="year")

Although the increasing trend is obvious, we can confirm it with the Mann Kendall test. First, import the function:

from verticapy.machine_learning.model_selection.statistical_tests import mkt

Then apply it to the vDataFrame:

mkt(vdf, column="X", ts="year")
Out[4]: 
None                                            value  
Mann Kendall Test Statistic        3.0037570459305547  
S                                                21.0  
STDS                                 6.65832811847939  
p_value                          0.002666681793614077  
Monotonic Trend                                  True  
Trend                                      increasing  
Rows: 1-6 | Columns: 2

Here, the low p-value is evidence of a trend. The function also reports the nature of the trend: in this case it is monotonically increasing, which agrees with the plot above.
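
To see where these numbers come from, the following is a minimal pure-Python sketch of the textbook Mann-Kendall computation (sign sum S, tie-corrected standard deviation, continuity-corrected Z, two-sided normal p-value). It is an illustration under those assumptions, not VerticaPy's implementation, and the name `mann_kendall_sketch` is hypothetical.

```python
import math
from collections import Counter


def mann_kendall_sketch(values, alpha=0.05):
    """Textbook Mann-Kendall test on values already sorted by time."""
    n = len(values)

    # S: sum of sign(x_j - x_i) over every pair with i earlier than j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, with the usual correction for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    stds = math.sqrt((n * (n - 1) * (2 * n + 5) - ties) / 18)

    # Continuity-corrected Z; assumed to be 0 when S is 0 or STDS is 0.
    if s > 0 and stds > 0:
        z = (s - 1) / stds
    elif s < 0 and stds > 0:
        z = (s + 1) / stds
    else:
        z = 0.0

    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"S": s, "STDS": stds, "Z": z, "p_value": p_value,
            "trend": p_value < alpha}


result = mann_kendall_sketch([0, 1, 2, 3, 4, 5, 6])
print(result)
```

For the seven increasing values, every one of the 21 pairs contributes +1, which gives S = 21 and a standard deviation of about 6.658, in line with the output above.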

Note

A p-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis is true. A smaller p-value typically suggests stronger evidence against the null hypothesis, which in this case is that the data has no trend with respect to time.

However, “small” is a relative term, and the threshold that defines it should be chosen before analyzing the data.

A p-value below 0.05 is commonly taken as grounds to reject the null hypothesis, but this threshold is not appropriate in every case.
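
As a small illustration of why the threshold matters, the sketch below applies three common significance levels to the p_value reported in the trend example. The helper `decide` is hypothetical; it only encodes the usual rule "reject H0 when p_value < alpha".

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the usual decision rule for a given significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p_value copied from the trend example output.
p_value = 0.002666681793614077

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha={alpha}: {decide(p_value, alpha)}")
```

The same result counts as significant at 0.05 and 0.01 but not at 0.001, which is why the level should be fixed before looking at the data.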

Example 2: No Trend#

We can contrast these results with a dataset that has no trend:

vdf = vp.vDataFrame(
    {
        "X":[1, 1, 1, 1, 1, 1, 1],
        "year": [1990, 1991, 1992, 1993, 1994, 1995, 1996],
    }
)

We can visually inspect the absence of trend by drawing the appropriate graph:

vdf["X"].plot(ts="year")

Now we can perform the test on this dataset:

mkt(vdf, column="X", ts="year")
Out[6]: 
None                                value  
Mann Kendall Test Statistic             0  
S                                     0.0  
STDS                                  nan  
p_value                               1.0  
Monotonic Trend                     False  
Trend                            no trend  
Rows: 1-6 | Columns: 2

Note

Notice the p-value of 1.0: the data provides no evidence against the null hypothesis of no trend, so it cannot be rejected.
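
The sketch below shows why a constant series leads to S = 0 and a p-value of 1.0. It assumes the test statistic is taken as 0 when S is 0 (consistent with the output above) and uses a two-sided normal p-value; the helper `sign_sum` is hypothetical and not part of VerticaPy.

```python
import math


def sign_sum(values):
    """Sum of sign(x_j - x_i) over all pairs i < j (values sorted by time)."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


s = sign_sum([1, 1, 1, 1, 1, 1, 1])  # every pairwise difference is 0
z = 0.0  # assumption: the statistic is set to 0 when S is 0
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(s, p_value)
```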

For more information check out this link.