verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor

verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor(input_relation: Annotated[str | vDataFrame, ''], X: Annotated[str | list[str], 'STRING representing one column or a list of columns'], X_idx: int | None = None) -> float | TableSample

Computes the variance inflation factor (VIF). It can be used to detect multicollinearity in an OLS Regression Analysis.
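For intuition, the textbook definition of the VIF for variable j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing variable j on the remaining variables. The NumPy sketch below illustrates that definition only. It is not VerticaPy's implementation, and the name `vif_sketch` and the choice to include an intercept are assumptions made for the example.

```python
import numpy as np


def vif_sketch(X, idx):
    """Illustrative VIF for column `idx` of a 2-D array X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = X[:, idx]                       # treat the chosen column as the response
    others = np.delete(X, idx, axis=1)  # remaining columns act as predictors
    # Assumption: the auxiliary regression includes an intercept term.
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)


rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
a = rng.normal(size=200)
b = rng.normal(size=200)             # independent of a
c = a + 0.05 * rng.normal(size=200)  # nearly a copy of a

vif_independent = vif_sketch(np.column_stack([a, b]), 0)
vif_collinear = vif_sketch(np.column_stack([a, c]), 0)
```

A VIF close to 1 suggests the variable is not explained by the others, while a large VIF signals multicollinearity. In this sketch `vif_independent` stays near 1 and `vif_collinear` is far larger.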

Parameters

input_relation: SQLRelation

Input relation.

X: str | list

Input variables: a single column name or a list of column names.

X_idx: int, optional

Index of the exogenous variable in X. If None (the default), a TableSample containing the VIF of every variable in X is returned.

Returns

float / TableSample

The VIF of the selected variable as a float, or a TableSample with the VIFs of all variables.

Examples

Initialization

Let’s try this test on a dummy dataset with multiple columns.

Before we begin, we can import the necessary libraries:

import verticapy as vp

import numpy as np

Next, we can create some exogenous columns with varying collinearity:

N = 50
x_val_1 = list(range(N))
x_val_2 = [2 * x + np.random.normal(scale = 4) for x in x_val_1]
x_val_3 = np.random.normal(0, 4, N)

We can use those values to create the vDataFrame:

vdf = vp.vDataFrame(
    {
        "x1": x_val_1,
        "x2": x_val_2,
        "x3": x_val_3,
    }
)
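Before plotting, the degree of collinearity in columns like these can be checked numerically. The sketch below regenerates comparable data with plain NumPy and computes pairwise Pearson correlations. The seeded generator and the names `x1`, `x2`, `x3`, `r_12`, and `r_13` are assumptions for illustration, and no database connection is used.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: fixed seed for reproducibility
N = 50
x1 = np.arange(N)
x2 = 2 * x1 + rng.normal(scale=4, size=N)  # linear in x1 plus noise
x3 = rng.normal(0, 4, N)                   # generated independently of x1

# Rows are variables, so corr is a 3 x 3 correlation matrix.
corr = np.corrcoef([x1, x2, x3])
r_12 = corr[0, 1]  # correlation between x1 and x2
r_13 = corr[0, 2]  # correlation between x1 and x3

print(f"corr(x1, x2) = {r_12:.3f}")
print(f"corr(x1, x3) = {r_13:.3f}")
```

With this construction, the correlation between `x1` and `x2` should be close to 1, while the correlation between `x1` and `x3` should be much weaker.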

Data Visualization

We can plot the data to reveal any underlying collinearity.

Let us first plot x1 against x2:

vdf.scatter(["x1", "x2"])

We can see that x1 and x2 are strongly correlated.

Next, let us plot x1 against x3:

vdf.scatter(["x1", "x3"])