
verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson#

verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson(input_relation: str | vDataFrame, eps: str, ts: str, by: str | list[str] | None = None) → float#

Durbin-Watson test, which checks for first-order autocorrelation in the residuals.

Parameters#

input_relation: SQLRelation

Input relation.

eps: str

Input residual vDataColumn.

ts: str

vDataColumn used as timeline to order the data. It can be a numerical or date-like type (date, datetime, timestamp…) vDataColumn.

by: SQLColumns, optional

vDataColumns used in the partition.

Returns#

float

Durbin Watson statistic.
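
For reference, the textbook Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals, with the residuals ordered by time. The following is a minimal local sketch of that definition using NumPy. The helper name durbin_watson_numpy is hypothetical and is not part of VerticaPy, and this page does not state how VerticaPy treats edge cases such as tied timestamps or missing values, so results may differ in those situations.

```python
import numpy as np

def durbin_watson_numpy(eps, ts):
    """Textbook Durbin-Watson statistic for residuals `eps` ordered by `ts`.

    Hypothetical local helper for illustration; not a VerticaPy function.
    """
    eps = np.asarray(eps, dtype=float)
    # Order the residuals along the timeline (stable sort keeps ties in input order).
    order = np.argsort(np.asarray(ts), kind="stable")
    e = eps[order]
    # Differences between consecutive residuals: e_t - e_(t-1).
    diff = np.diff(e)
    return float(np.sum(diff ** 2) / np.sum(e ** 2))

# Residuals that alternate in sign are negatively autocorrelated,
# so the statistic lands above 2.
d = durbin_watson_numpy(eps=[1.0, -1.0, 1.0, -1.0], ts=[0, 1, 2, 3])
print(d)
```

The statistic always lies between 0 and 4, which is why 2 serves as the reference point when reading the result.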

Examples#

Initialization#

Let’s try this test on a dummy dataset that has the following elements:

  • A value of interest that has noise related to time

  • Time-stamp data

Before we begin, we import the necessary libraries:

import verticapy as vp

import numpy as np

Data#

Now we can create the dummy dataset:

# Initialization
N = 50 # Number of Rows

days = list(range(N))

y_val = [2 * x + np.random.normal(scale = 4 * x * x) for x in days]

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "y1": y_val,
    }
)

Model Fitting#

Next, we can fit a linear model. To do that, we first import the model and initialize it:

from verticapy.machine_learning.vertica.linear_model import LinearRegression

model = LinearRegression()

Next we can fit the model:

model.fit(vdf, X = "day", y = "y1")

We can create a column in the vDataFrame that has the predictions:

model.predict(vdf, X = "day", name = "y_pred")
      day       y1                     y_pred
      Integer   Numeric(24)            Float(22)
1     0         0.0                    695.570784478391
2     1         0.09848742416711298    670.887773632021
3     2         0.03912258217056008    646.204762785651
4     3         85.37352973986928      621.521751939281
5     4         -48.75388042661133     596.838741092911
6     5         126.0858711752709      572.15573024654
7     6         51.69424419556892      547.47271940017
8     7         104.32766314828993     522.7897085538
9     8         227.2754345105607      498.10669770743
10    9         -164.7427813106934     473.423686861059
11    10        -538.785705257756      448.740676014689
12    11        724.2384946888296      424.057665168319
13    12        -26.415579560172006    399.374654321949
14    13        -756.3588086370179     374.691643475578
15    14        -126.99777738070188    350.008632629208
16    15        296.19324856386186     325.325621782838
17    16        -288.23828109099827    300.642610936468
18    17        1080.1901345665055     275.959600090098
19    18        -354.3608974069067     251.276589243727
20    19        1142.4684257271779     226.593578397357
21    20        839.8004882674298      201.910567550987
22    21        -469.7849131489231     177.227556704617
23    22        1859.3517617422665     152.544545858246
24    23        1908.5345703819664     127.861535011876
25    24        441.53473802092327     103.178524165506
26    25        -1796.9659016561104    78.4955133191357
27    26        1017.2978698860812     53.8125024727655
28    27        3469.951265085289      29.1294916263953
29    28        -2011.112758561862     4.44648078002501
30    29        -2760.2928530032145    -20.2365300663452
31    30        -2804.033839222626     -44.9195409127154
32    31        -767.3266685995795     -69.6025517590856
33    32        7004.313887113298      -94.2855626054559
34    33        5904.209305639675      -118.968573451826
35    34        257.1561492350385      -143.651584298196
36    35        4290.517109749369      -168.334595144567
37    36        -9834.812780626718     -193.017605990937
38    37        7987.671065039417      -217.700616837307
39    38        6950.042952942006      -242.383627683677
40    39        3839.0640588359743     -267.066638530047
41    40        993.4032928205398      -291.749649376418
42    41        281.604712575857       -316.432660222788
43    42        -8871.739251235049     -341.115671069158
44    43        -2195.3831253193907    -365.798681915529
45    44        10523.18777349974      -390.481692761899
46    45        -5535.408439575653     -415.164703608269
47    46        -4807.7895048210885    -439.847714454639
48    47        -3857.397377262891     -464.530725301009
49    48        689.6864965553993      -489.21373614738
50    49        -9536.76009249254      -513.89674699375
Rows: 1-50 | Columns: 3

Then we can calculate the residuals i.e. eps:

vdf["eps"] = vdf["y1"] - vdf["y_pred"]

We can plot the residuals to see the trend:

vdf.scatter(["day", "eps"])

Test#

Now we can apply the Durbin Watson Test:

from verticapy.machine_learning.model_selection.statistical_tests import durbin_watson

durbin_watson(input_relation = vdf, ts = "day", eps = "eps")
Out[12]: 1.92635844701162

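The Durbin-Watson statistic is about 1.93, which is close to 2. This suggests little or no first-order autocorrelation in the residuals for this run. Because the dataset is generated randomly without a fixed seed, the value changes from run to run.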

Note

The Durbin-Watson statistic can be interpreted as follows:

Approximately 2: No significant autocorrelation.

Less than 2: Positive autocorrelation (residuals are correlated positively with their lagged values).

Greater than 2: Negative autocorrelation (residuals are correlated negatively with their lagged values).
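
These thresholds follow from the approximation d ≈ 2(1 − r), where r is the lag-1 sample autocorrelation of the residuals: r near 0 gives d near 2, positive r pushes d toward 0, and negative r pushes d toward 4. The following sketch illustrates this on simulated AR(1) residuals, computed locally with NumPy rather than through VerticaPy. The helper names dw and ar1, the coefficient 0.8, and the series length are illustrative assumptions.

```python
import numpy as np

def dw(e):
    """Textbook Durbin-Watson statistic for residuals already in time order."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def ar1(phi, n, rng):
    """Simulate e_t = phi * e_(t-1) + noise_t with standard normal noise."""
    noise = rng.standard_normal(n)
    e = np.empty(n)
    e[0] = noise[0]
    for t in range(1, n):
        e[t] = phi * e[t - 1] + noise[t]
    return e

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5000

d_white = dw(ar1(0.0, n, rng))   # no autocorrelation: close to 2
d_pos = dw(ar1(0.8, n, rng))     # positive autocorrelation: well below 2
d_neg = dw(ar1(-0.8, n, rng))    # negative autocorrelation: well above 2

print(d_white, d_pos, d_neg)
```

With only 50 rows, as in the example dataset, the statistic is much noisier than in this simulation, so small deviations from 2 should be read with caution.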