verticapy.machine_learning.model_selection.statistical_tests.tsa.mkt#
verticapy.machine_learning.model_selection.statistical_tests.tsa.mkt(input_relation: str | vDataFrame, column: str, ts: str, alpha: int | float | Decimal = 0.05) -> TableSample #
Mann-Kendall test (time series trend).
Warning
This test is computationally expensive because it uses a CROSS JOIN during the computation. The complexity is O(n * k), where n is the total row count of the vDataFrame and k is the number of rows used in the test.
Parameters#
- input_relation: SQLRelation
Input relation.
- column: str
Input vDataColumn to test.
- ts: str
vDataColumn used as the timeline to order the data. It can be a numerical or date-like (date, datetime, timestamp…) vDataColumn.
- alpha: PythonNumber, optional
Significance level. Probability of rejecting the null hypothesis when it is true (Type I error rate).
Returns#
- TableSample
Result of the test.
Examples#
Initialization#
Let’s try this test on a dummy dataset that has the following elements:
- A value of interest
- Time-stamp data
Before we begin we can import the necessary libraries:
import verticapy as vp
Example 1: Trend#
Now we can create the dummy dataset:
vdf = vp.vDataFrame(
    {
        "X": [0, 1, 2, 3, 4, 5, 6],
        "year": [1990, 1991, 1992, 1993, 1994, 1995, 1996],
    }
)
We can visually inspect the trend by drawing the appropriate graph:
vdf["X"].plot(ts="year")
Though the increasing trend is obvious, we can test its mkt score by first importing the function:

from verticapy.machine_learning.model_selection.statistical_tests import mkt
And then simply applying it on the vDataFrame:

mkt(vdf, column = "X", ts = "year")

None                          value
Mann Kendall Test Statistic   3.0037570459305547
S                             21.0
STDS                          6.65832811847939
p_value                       0.002666681793614077
Monotonic Trend               True
Trend                         increasing
Rows: 1-6 | Columns: 2
In the above context, the low p-value is evidence of the presence of a trend. The function also gives us information about the nature of the trend. In this case, we can see that it is a monotonically increasing trend, which conforms with the plot we observed above.
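To build intuition for the numbers above, we can recompute the Mann-Kendall statistic by hand. This is a minimal sketch using only the standard library; the helper name mk_test is our own, not part of VerticaPy. S counts the sign of every pairwise difference, the variance follows the no-ties formula n(n-1)(2n+5)/18, and the Z statistic applies a continuity correction before a two-sided normal p-value is taken:

```python
import math

def mk_test(x):
    """Plain-Python Mann-Kendall test (no tie correction)."""
    n = len(x)
    # S = sum of signs of all pairwise differences x[j] - x[i], i < j
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S when there are no tied values
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected Z statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p

s, z, p = mk_test([0, 1, 2, 3, 4, 5, 6])
print(s, z, p)  # 21, ~3.0038, ~0.00267
```

For the strictly increasing series, all 21 pairs contribute +1, so S = 21 and the resulting Z and p-value match the S, Mann Kendall Test Statistic, and p_value rows of the table above.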
Note
A p_value in statistics represents the probability of obtaining results as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true. A smaller p-value typically suggests stronger evidence against the null hypothesis, which in the current case is that the data has no trend with respect to time. However, "small" is a relative term, and the threshold that determines what counts as small should be chosen before analyzing the data.
Generally, a p-value less than 0.05 is considered the threshold to reject the null hypothesis, but this is not always the case.

Example 2: No Trend#
We can contrast the results with a dataset that has barely any trend:
vdf = vp.vDataFrame(
    {
        "X": [1, 1, 1, 1, 1, 1, 1],
        "year": [1990, 1991, 1992, 1993, 1994, 1995, 1996],
    }
)
We can visually inspect the absence of trend by drawing the appropriate graph:
vdf["X"].plot(ts="year")
Now we can perform the test on this dataset:
mkt(vdf, column = "X", ts = "year")

None                          value
Mann Kendall Test Statistic   0
S                             0.0
STDS                          nan
p_value                       1.0
Monotonic Trend               False
Trend                         no trend
Rows: 1-6 | Columns: 2
Note
Notice the extreme p-value (1.0): we fail to reject the null hypothesis, so there is no evidence of a trend in this data.
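The zero statistic is easy to verify by hand: for a constant series every pairwise difference is zero, so S = 0 and the test reports no trend. A small sketch (the sign counting below mirrors the textbook definition of S, not VerticaPy's internal SQL):

```python
x = [1, 1, 1, 1, 1, 1, 1]
n = len(x)
# Every pairwise difference is 0, so every sign term is 0
s = sum(
    (x[j] > x[i]) - (x[j] < x[i])
    for i in range(n - 1)
    for j in range(i + 1, n)
)
print(s)  # 0: no trend, two-sided p-value is 1.0
```

The nan in the STDS row is likely explained by the tie correction: when all n values form a single tie group, the tie-corrected variance of S is zero, so the standard deviation of S is undefined for the Z computation.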