verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor#
- verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor(input_relation: str | vDataFrame, X: str | list[str], X_idx: int | None = None) → float | TableSample #
Computes the variance inflation factor (VIF). It can be used to detect multicollinearity in an OLS regression analysis.
Parameters#
- input_relation: SQLRelation
Input relation.
- X: str / list
Input variables.
- X_idx: int
Index of the exogenous variable in X. If None, a TableSample containing the VIF of every variable is returned.
Returns#
- float / TableSample
VIF.
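To make the definition concrete: the VIF of variable j is 1 / (1 - R_j²), where R_j² comes from regressing X_j on the remaining predictors. The sketch below is a minimal NumPy-only illustration of that formula (independent of the VerticaPy API, using simulated data similar to the example that follows; `vif_numpy` is a hypothetical helper, not part of VerticaPy):

```python
import numpy as np

def vif_numpy(X: np.ndarray, j: int) -> float:
    """VIF of column j: regress X[:, j] on the other columns
    (with an intercept) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = (y - y.mean()) @ (y - y.mean())
    r2 = 1.0 - (resid @ resid) / ss_tot
    return 1.0 / (1.0 - r2)

# Simulated data: x2 is nearly a linear function of x1, x3 is independent noise.
rng = np.random.default_rng(0)
x1 = np.arange(50, dtype=float)
x2 = 2 * x1 + rng.normal(scale=4, size=50)
x3 = rng.normal(0, 4, size=50)
X = np.column_stack([x1, x2, x3])

print([round(vif_numpy(X, j), 2) for j in range(3)])  # x1 and x2 large, x3 near 1
```

A common rule of thumb is that a VIF above 5 or 10 signals problematic multicollinearity.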
Examples#
Initialization#
Let’s try this test on a dummy dataset containing multiple columns with varying degrees of collinearity.
Before we begin we can import the necessary libraries:
import verticapy as vp
import numpy as np
Next, we can create some exogenous columns with varying collinearity:
N = 50
x_val_1 = list(range(N))
x_val_2 = [2 * x + np.random.normal(scale = 4) for x in x_val_1]
x_val_3 = np.random.normal(0, 4, N)
We can use those values to create the vDataFrame:
vdf = vp.vDataFrame(
    {
        "x1": x_val_1,
        "x2": x_val_2,
        "x3": x_val_3,
    }
)
Data Visualization#
We can plot the data to see any underlying collinearity:
Let us first draw x1 with x2:
vdf.scatter(["x1", "x2"])
We can see that x1 and x2 are very correlated. Next, let us observe x1 and x3:
vdf.scatter(["x1", "x3"])
We can see that the two are not correlated.
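As a quick numeric cross-check of what the scatter plots show, we can compute Pearson correlations with NumPy. This sketch regenerates data locally in the same way as the example above (it does not query the vDataFrame):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 50
x1 = np.arange(N, dtype=float)
x2 = 2 * x1 + rng.normal(scale=4, size=N)  # nearly a linear function of x1
x3 = rng.normal(0, 4, size=N)              # independent of x1

r12 = np.corrcoef(x1, x2)[0, 1]  # close to 1: strong collinearity
r13 = np.corrcoef(x1, x3)[0, 1]  # close to 0: no collinearity
print(round(r12, 3), round(r13, 3))
```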
Now we can confirm our observations by computing the VIF. First, we can import the test:
from verticapy.machine_learning.model_selection.statistical_tests import variance_inflation_factor
And then apply it on the exogenous columns:
variance_inflation_factor(vdf, X = ["x1", "x2", "x3"])
	X_idx	VIF
1	"x1"	55.551544467656726
2	"x2"	56.20724585738083
3	"x3"	1.091602071668006
Rows: 1-3 | Columns: 2

Note
We can clearly see that x1 and x2 are correlated because of their high VIF values. But there is no correlation with x3, as its VIF value is close to 1.
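A common remedy for high VIF is to drop one of the collinear columns. The NumPy-only sketch below (again using simulated data mirroring the example, not the VerticaPy API; `vif` is a hypothetical helper) shows that removing x2 brings the VIF of x1 back near 1:

```python
import numpy as np

def vif(X: np.ndarray, j: int) -> float:
    """1 / (1 - R^2) from regressing column j on the others (with intercept)."""
    y, others = X[:, j], np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    ss_tot = (y - y.mean()) @ (y - y.mean())
    return ss_tot / (resid @ resid)  # algebraically equal to 1 / (1 - R^2)

rng = np.random.default_rng(7)
x1 = np.arange(50, dtype=float)
x2 = 2 * x1 + rng.normal(scale=4, size=50)
x3 = rng.normal(0, 4, size=50)

full = np.column_stack([x1, x2, x3])
reduced = np.column_stack([x1, x3])  # x2 dropped

print(vif(full, 0))     # large: x1 is well explained by x2
print(vif(reduced, 0))  # near 1: no collinearity left
```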