verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson#
- verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson(input_relation: str | vDataFrame, eps: str, ts: str, by: str | list[str] | None = None) → float#
Durbin-Watson test (residuals autocorrelation).
Parameters#
- input_relation: SQLRelation
Input relation.
- eps: str
Input residual vDataColumn.
- ts: str
vDataColumn used as the timeline to order the data. It can be of a numerical or date-like type (date, datetime, timestamp, etc.).
- by: SQLColumns, optional
vDataColumns used in the partition.
Returns#
- float
Durbin Watson statistic.
Examples#
Initialization#
Let’s try this test on a dummy dataset that has the following elements:
A value of interest whose noise grows with time
Time-stamp data
Before we begin we can import the necessary libraries:
import verticapy as vp
import numpy as np
Data#
Now we can create the dummy dataset:
# Initialization
N = 50  # Number of Rows
days = list(range(N))
y_val = [2 * x + np.random.normal(scale = 4 * x * x) for x in days]

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "y1": y_val,
    }
)
Model Fitting#
Next, we can fit a Linear Model. To do that, we first need to import the model and initialize it:
from verticapy.machine_learning.vertica.linear_model import LinearRegression

model = LinearRegression()
Next we can fit the model:
model.fit(vdf, X = "day", y = "y1")
We can create a column in the vDataFrame that has the predictions:
model.predict(vdf, X = "day", name = "y_pred")
[Output: the vDataFrame now has three columns — day (Integer), y1 (Numeric), and y_pred (Float) — over rows 1-50.]
Then we can calculate the residuals, i.e. eps:
vdf["eps"] = vdf["y1"] - vdf["y_pred"]
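As a sanity check, the same fit-and-residual pipeline can be sketched outside the database with plain NumPy (a toy re-creation of the steps above, not the VerticaPy implementation; np.polyfit stands in for the Vertica LinearRegression):

```python
import numpy as np

rng = np.random.default_rng(42)

# Re-create the dummy dataset: linear trend plus noise that grows with time.
N = 50
days = np.arange(N, dtype=float)
y1 = 2 * days + rng.normal(scale=4 * days * days)

# Ordinary least squares fit of y1 on day (degree-1 polynomial).
slope, intercept = np.polyfit(days, y1, 1)
y_pred = slope * days + intercept

# Residuals, i.e. eps = y1 - y_pred.
eps = y1 - y_pred

# With an intercept in the model, OLS residuals sum to (numerically) zero.
print(abs(eps.mean()) < 1e-6)  # True
```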
We can plot the residuals to see the trend:
vdf.scatter(["day", "eps"])
Test#
Now we can apply the Durbin Watson Test:
from verticapy.machine_learning.model_selection.statistical_tests import durbin_watson

durbin_watson(input_relation = vdf, ts = "day", eps = "eps")
Out[12]: 1.92635844701162
We can see that the Durbin-Watson statistic is slightly less than 2, which suggests weak positive autocorrelation in the residuals.
Note
The Durbin-Watson statistic can be interpreted as follows:
Approximately 2: No significant autocorrelation.
Less than 2: Positive autocorrelation (residuals are correlated positively with their lagged values).
Greater than 2: Negative autocorrelation (residuals are correlated negatively with their lagged values).
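The interpretation above can be checked with a small NumPy sketch of the statistic itself, DW = Σ(e_t − e_{t−1})² / Σ e_t² (a minimal re-implementation for illustration, not the VerticaPy code path), where the residuals are assumed to already be ordered by the time column:

```python
import numpy as np

def durbin_watson_np(eps):
    # DW = sum of squared successive differences / sum of squared residuals.
    eps = np.asarray(eps, dtype=float)
    return np.sum(np.diff(eps) ** 2) / np.sum(eps ** 2)

rng = np.random.default_rng(0)

# White-noise residuals: no autocorrelation, so the statistic is close to 2.
white = rng.normal(size=10_000)
print(round(durbin_watson_np(white), 1))  # close to 2

# A random walk has heavy positive autocorrelation: statistic far below 2.
ar = np.cumsum(white)
print(round(durbin_watson_np(ar), 1))  # well below 2
```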