verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor#

verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor(input_relation: str | vDataFrame, X: str | list[str], X_idx: int | None = None) -> float | TableSample#

Computes the variance inflation factor (VIF). It can be used to detect multicollinearity in an OLS Regression Analysis.
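The VIF of a variable is commonly defined as 1 / (1 - R²), where R² comes from regressing that variable on the remaining ones. The following is a minimal pure-Python sketch of that textbook definition, not VerticaPy's own implementation. The helper names `solve_linear_system` and `vif`, the fixed random seed, and the sample data are assumptions made for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def vif(columns, idx):
    """VIF of columns[idx]: 1 / (1 - R^2) from regressing it on the others.

    Hypothetical helper; fails with a division by zero when the variable
    is an exact linear combination of the others.
    """
    def centered(col):
        mean = sum(col) / len(col)
        return [v - mean for v in col]

    y = centered(columns[idx])
    others = [centered(c) for i, c in enumerate(columns) if i != idx]
    # Normal equations (X'X) beta = X'y; centering absorbs the intercept.
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in others] for u in others]
    xty = [sum(p * q for p, q in zip(u, y)) for u in others]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * u[k] for b, u in zip(beta, others)) for k in range(len(y))]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum(yi ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r_squared)


# Sample data: x2 is a noisy multiple of x1, x3 is independent noise.
rng = random.Random(0)  # fixed seed (assumption) for reproducibility
n = 50
x1 = [float(i) for i in range(n)]
x2 = [2 * v + rng.gauss(0, 4) for v in x1]
x3 = [rng.gauss(0, 4) for _ in range(n)]

vifs = [vif([x1, x2, x3], i) for i in range(3)]
print(vifs)
```

Under these assumptions, the first two values should be large and the third close to 1, because only x1 and x2 can be predicted well from the other columns.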

Parameters#

input_relation: SQLRelation

Input relation.

X: list

Input Variables.

X_idx: int

Index of the exogenous variable in X for which to compute the VIF. If None, a TableSample containing the VIF of every variable in X is returned.

Returns#

float / TableSample

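The VIF of the selected variable as a float when X_idx is given; otherwise a TableSample with the VIF of each variable in X.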

Examples#

Initialization#

Let’s try this test on a dummy dataset with multiple columns.

Before we begin we can import the necessary libraries:

import verticapy as vp

import numpy as np

Next, we can create some exogenous columns with varying collinearity:

N = 50
x_val_1 = list(range(N))
x_val_2 = [2 * x + np.random.normal(scale = 4) for x in x_val_1]
x_val_3 = np.random.normal(0, 4, N)

We can use those values to create the vDataFrame:

vdf = vp.vDataFrame(
    {
        "x1": x_val_1,
        "x2": x_val_2,
        "x3": x_val_3,
    }
)

Data Visualization#

We can plot the data to reveal any underlying collinearity.

Let us first plot x1 against x2:

vdf.scatter(["x1", "x2"])

We can see that x1 and x2 are strongly correlated.

Next let us observe x1 and x3:

vdf.scatter(["x1", "x3"])

We can see that the two are not correlated.
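The visual impression from the scatter plots can also be checked numerically. The sketch below recreates similar data in plain Python and computes Pearson correlation coefficients; the `pearson` helper and the fixed seed are assumptions for illustration, and the data is regenerated locally rather than read back from the vDataFrame.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed (assumption) for reproducibility
N = 50
x_val_1 = list(range(N))
x_val_2 = [2 * x + rng.gauss(0, 4) for x in x_val_1]  # noisy multiple of x1
x_val_3 = [rng.gauss(0, 4) for _ in range(N)]         # independent noise

r_12 = pearson(x_val_1, x_val_2)
r_13 = pearson(x_val_1, x_val_3)
print(f"corr(x1, x2) = {r_12:.3f}")
print(f"corr(x1, x3) = {r_13:.3f}")
```

A coefficient near 1 for the first pair and a much smaller one for the second would match what the plots suggest.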

Now we can confirm our observations by carrying out the VIF test. First, we can import the test:

from verticapy.machine_learning.model_selection.statistical_tests import variance_inflation_factor

And then apply it on the exogenous columns:

variance_inflation_factor(vdf, X = ["x1", "x2", "x3"])
   X_idx  VIF
1  "x1"   55.551544467656726
2  "x2"   56.20724585738083
3  "x3"   1.091602071668006
Rows: 1-3 | Columns: 2

Note

The high VIF values of x1 and x2 show that these two variables are strongly collinear. The VIF of x3 is close to 1, which indicates that x3 is not collinear with the other variables.
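To turn such values into a decision, a common rule of thumb treats a VIF above roughly 5 (or, more leniently, 10) as a sign of problematic multicollinearity. The sketch below applies that rule to the values reported above and also recovers the implied R² through R² = 1 - 1 / VIF. The function names and the default threshold are assumptions chosen for illustration, not part of VerticaPy.

```python
def implied_r_squared(vif_value):
    """R^2 of the auxiliary regression implied by a VIF: 1 - 1 / VIF."""
    return 1.0 - 1.0 / vif_value


def flag_collinear(vifs, threshold=5.0):
    """Return the variables whose VIF exceeds the rule-of-thumb threshold."""
    return [name for name, value in vifs.items() if value > threshold]


# VIF values copied from the output table above.
vifs = {
    "x1": 55.551544467656726,
    "x2": 56.20724585738083,
    "x3": 1.091602071668006,
}

flagged = flag_collinear(vifs)
print("Flagged as collinear:", flagged)
for name, value in vifs.items():
    print(f"{name}: implied R^2 = {implied_r_squared(value):.3f}")
```

The threshold is a convention rather than a strict test, so it is worth adjusting to the tolerance of the analysis at hand.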