verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson

verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson(input_relation: Annotated[str | vDataFrame, ''], eps: str, ts: str, by: Annotated[str | list[str], 'STRING representing one column or a list of columns'] | None = None) → float

Durbin-Watson test (residuals autocorrelation).

Parameters

input_relation: SQLRelation

Input relation.

eps: str

Input residual vDataColumn.

ts: str

vDataColumn used as the timeline to order the data. It can be of a numerical or date-like type (date, datetime, timestamp, etc.).

by: SQLColumns, optional

vDataColumns used in the partition.
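
For instance, if a vDataFrame vdf stored one residual series per group, the grouping columns could be passed through by. A minimal sketch (the store_id column is hypothetical, not part of the dataset used below):

durbin_watson(
    input_relation = vdf,
    eps = "eps",
    ts = "day",
    by = ["store_id"],  # hypothetical grouping column
)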

Returns

float

Durbin-Watson statistic.
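
For reference, the returned value is the classical Durbin-Watson ratio computed over the time-ordered residuals e_1, ..., e_n:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

The statistic ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation.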

Examples

Initialization

Let’s try this test on a dummy dataset that has the following elements:

  • A value of interest that has noise related to time

  • Time-stamp data

Before we begin, we can import the necessary libraries:

import verticapy as vp

import numpy as np

Data

Now we can create the dummy dataset:

# Initialization
N = 50 # Number of Rows

days = list(range(N))

# Linear trend plus noise whose scale grows with time
y_val = [2 * x + np.random.normal(scale = 4 * x * x) for x in days]

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "y1": y_val,
    }
)

Model Fitting

Next, we can fit a Linear Model. To do that, we first need to import the model and initialize it:

from verticapy.machine_learning.vertica.linear_model import LinearRegression

model = LinearRegression()

Next we can fit the model:

model.fit(vdf, X = "day", y = "y1")


=======
details
=======
predictor|coefficient| std_err |t_value |p_value 
---------+-----------+---------+--------+--------
Intercept|-1059.96780|849.05804|-1.24840| 0.21794
   day   | 107.60489 |29.86052 | 3.60358| 0.00074


==============
regularization
==============
type| lambda 
----+--------
none| 1.00000


===========
call_string
===========
linear_reg('"public"."_verticapy_tmp_linearregression_v_demo_21766c8a55a311ef880f0242ac120002_"', '"public"."_verticapy_tmp_view_v_demo_218560fa55a311ef880f0242ac120002_"', '"y1"', '"day"'
USING PARAMETERS optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1, alpha=0.5, fit_intercept=true)

===============
Additional Info
===============
       Name       |Value
------------------+-----
 iteration_count  |  1  
rejected_row_count|  0  
accepted_row_count| 50  

We can create a column in the vDataFrame that has the predictions:

model.predict(vdf, X = "day", name = "y_pred")
 day |         y1         |      y_pred
-----+--------------------+------------------
   0 | 0.0                | -1059.96779943289
   1 | 3.0323024307400877 | -952.36290617722
   2 | 2.6262768227060183 | -844.758012921546
   3 | 38.70522660695966  | -737.153119665873
   4 | 95.15498880502142  | -629.5482264102
 ... | ...                | ...
Rows: 1-50 | Columns: 3

Then we can calculate the residuals, i.e., eps:

vdf["eps"] = vdf["y1"] - vdf["y_pred"]

We can plot the residuals to see the trend:

vdf.scatter(["day", "eps"])

Test

Now we can apply the Durbin-Watson test:

from verticapy.machine_learning.model_selection.statistical_tests import durbin_watson

durbin_watson(input_relation = vdf, ts = "day", eps = "eps")
2.19927329804118

The Durbin-Watson statistic is greater than 2, which suggests some negative autocorrelation in the residuals.

Note

The Durbin-Watson statistic can be interpreted as follows:

  • Approximately 2: no significant autocorrelation.

  • Less than 2: positive autocorrelation (residuals are positively correlated with their lagged values).

  • Greater than 2: negative autocorrelation (residuals are negatively correlated with their lagged values).
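
As a cross-check, the statistic can also be computed directly from its definition with NumPy. A minimal sketch, assuming eps_array holds the residuals already ordered by day (exporting them from the vDataFrame is left out here):

import numpy as np

def dw_statistic(eps_array):
    # Durbin-Watson ratio: sum of squared successive differences
    # of the residuals divided by their total sum of squares.
    diff = np.diff(eps_array)  # e_t - e_{t-1}
    return float(np.sum(diff ** 2) / np.sum(eps_array ** 2))

Applied to the eps column ordered by day, this should reproduce the value returned by durbin_watson up to floating-point error.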