
verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson¶
- verticapy.machine_learning.model_selection.statistical_tests.tsa.durbin_watson(input_relation: Annotated[str | vDataFrame, ''], eps: str, ts: str, by: Annotated[str | list[str], 'STRING representing one column or a list of columns'] | None = None) → float¶
Durbin Watson test (residuals autocorrelation).
Parameters¶
- input_relation: SQLRelation
Input relation.
- eps: str
Input residual vDataColumn.
- ts: str
vDataColumn used as the timeline to order the data. It can be of a numerical or date-like type (date, datetime, timestamp, etc.).
- by: SQLColumns, optional
vDataColumns used in the partition.
Returns¶
- float
Durbin Watson statistic.
Examples¶
Initialization¶
Let’s try this test on a dummy dataset that has the following elements:
A value of interest that has noise related to time
Time-stamp data
Before we begin we can import the necessary libraries:
import verticapy as vp
import numpy as np
Data¶
Now we can create the dummy dataset:
# Initialization
N = 50  # Number of Rows
days = list(range(N))
y_val = [2 * x + np.random.normal(scale = 4 * x * x) for x in days]

# vDataFrame
vdf = vp.vDataFrame(
    {
        "day": days,
        "y1": y_val,
    }
)
Model Fitting¶
Next, we can fit a Linear Model. To do that, we first need to import the model and initialize it:
from verticapy.machine_learning.vertica.linear_model import LinearRegression

model = LinearRegression()
Next we can fit the model:
model.fit(vdf, X = "day", y = "y1")

=======
details
=======
predictor|coefficient| std_err |t_value |p_value
---------+-----------+---------+--------+--------
Intercept|-1059.96780|849.05804|-1.24840| 0.21794
   day   | 107.60489 |29.86052 | 3.60358| 0.00074

==============
regularization
==============
type| lambda
----+--------
none| 1.00000

===========
call_string
===========
linear_reg('"public"."_verticapy_tmp_linearregression_v_demo_21766c8a55a311ef880f0242ac120002_"', '"public"."_verticapy_tmp_view_v_demo_218560fa55a311ef880f0242ac120002_"', '"y1"', '"day"' USING PARAMETERS optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1, alpha=0.5, fit_intercept=true)

===============
Additional Info
===============
       Name       |Value
------------------+-----
 iteration_count  |  1
rejected_row_count|  0
accepted_row_count| 50
We can create a column in the vDataFrame that has the predictions:

model.predict(vdf, X = "day", name = "y_pred")
[Output: a table with the columns day, y1, and y_pred. Rows: 1-50 | Columns: 3]

Then we can calculate the residuals, i.e. eps:

vdf["eps"] = vdf["y1"] - vdf["y_pred"]
We can plot the residuals to see the trend:
vdf.scatter(["day", "eps"])
Test¶
Now we can apply the Durbin Watson Test:
from verticapy.machine_learning.model_selection.statistical_tests import durbin_watson

durbin_watson(input_relation = vdf, ts = "day", eps = "eps")
Out[12]: 2.19927329804118
We can see that the Durbin-Watson statistic deviates from 2 (here it is slightly above 2), which suggests some negative autocorrelation in the residuals.
Note
The Durbin-Watson statistic values can be interpreted as follows:
Approximately 2: No significant autocorrelation.
Less than 2: Positive autocorrelation (residuals are correlated positively with their lagged values).
Greater than 2: Negative autocorrelation (residuals are correlated negatively with their lagged values).
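To build intuition for these interpretation rules, here is a minimal NumPy sketch of the Durbin-Watson statistic itself (the sum of squared successive residual differences divided by the sum of squared residuals). The helper function `durbin_watson_stat` and the synthetic residual series are illustrative assumptions, not part of the VerticaPy API:

```python
import numpy as np

def durbin_watson_stat(residuals):
    # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Smoothly drifting residuals (positively autocorrelated):
# successive differences are small, so DW falls well below 2.
pos = np.sin(np.linspace(0, 4 * np.pi, 100))

# Alternating-sign residuals (negatively autocorrelated):
# successive differences are large, so DW approaches 4.
neg = np.array([(-1) ** t for t in range(100)], dtype=float)

print(durbin_watson_stat(pos))  # well below 2 (positive autocorrelation)
print(durbin_watson_stat(neg))  # close to 4 (negative autocorrelation)
```

Uncorrelated white-noise residuals would land near 2 under the same formula, which is why values close to 2 indicate no significant autocorrelation.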