
verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor
- verticapy.machine_learning.model_selection.statistical_tests.ols.variance_inflation_factor(input_relation: Annotated[str | vDataFrame, ''], X: Annotated[str | list[str], 'STRING representing one column or a list of columns'], X_idx: int | None = None) -> float | TableSample
Computes the variance inflation factor (VIF), which can be used to detect multicollinearity among the explanatory variables of an OLS regression analysis.
Parameters
- input_relation: SQLRelation
Input relation.
- X: str | list
Input variables: one column name or a list of column names.
- X_idx: int, optional
Index of the exogenous variable in X. If None (the default), a TableSample containing the VIF of every variable is returned.
Returns
- float / TableSample
The VIF of the selected variable (float), or the VIFs of all variables (TableSample).
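For intuition, the VIF of a variable is usually defined as 1 / (1 - R²), where R² comes from regressing that variable on the remaining ones (with an intercept). The following is a minimal NumPy sketch of that textbook definition, not VerticaPy's implementation; the helper name `vif_numpy` and its `idx` argument are hypothetical and only loosely mirror the `X_idx` behaviour described above.

```python
import numpy as np


def vif_numpy(X, idx=None):
    """Hypothetical helper (not part of VerticaPy).

    Computes VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by
    regressing column j on all other columns plus an intercept.
    Returns a float if idx is given, otherwise a list with one VIF per column.
    """
    X = np.asarray(X, dtype=float)

    def vif_of_column(j):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column followed by the other variables.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = residuals @ residuals
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        return 1.0 / (1.0 - r2)

    if idx is not None:
        return vif_of_column(idx)
    return [vif_of_column(j) for j in range(X.shape[1])]


# Two columns whose correlation is 0.8, so each VIF is 1 / (1 - 0.64).
data = np.array([[1, 1], [2, 3], [3, 2], [4, 4]])
print(vif_numpy(data, idx=0))  # approximately 2.7778
print(vif_numpy(data))         # one value per column
```

A VIF close to 1 indicates that a variable is nearly uncorrelated with the others; larger values indicate stronger collinearity. The sketch breaks down for perfectly collinear columns, where R² equals 1 and the ratio is undefined.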
Examples
Initialization
Let’s try this test on a dummy dataset that has the following elements:
- data with multiple columns
Before we begin, we can import the necessary libraries:
import verticapy as vp
import numpy as np
Next, we can create some exogenous columns with varying collinearity:
N = 50
x_val_1 = list(range(N))
x_val_2 = [2 * x + np.random.normal(scale=4) for x in x_val_1]
x_val_3 = np.random.normal(0, 4, N)
We can use those values to create the vDataFrame:
vdf = vp.vDataFrame(
    {
        "x1": x_val_1,
        "x2": x_val_2,
        "x3": x_val_3,
    }
)
Data Visualization
We can plot the data to reveal any underlying collinearity.
Let us first plot x1 against x2:
vdf.scatter(["x1", "x2"])
We can see that x1 and x2 are strongly correlated.
Next, let us observe x1 and x3:
vdf.scatter(["x1", "x3"])
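The scatter plots give a visual impression of collinearity; a numeric check can back it up. The sketch below rebuilds data of the same shape with plain NumPy (so no database connection is needed) and prints the pairwise Pearson correlations. The seed and the names `rng` and `corr` are illustrative assumptions, and the values will differ from the unseeded draws used above.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is arbitrary).
rng = np.random.default_rng(42)

N = 50
x1 = np.arange(N, dtype=float)
x2 = 2 * x1 + rng.normal(scale=4, size=N)  # roughly linear in x1
x3 = rng.normal(0, 4, N)                   # drawn independently of x1

# Rows are treated as variables, so corr is a 3x3 correlation matrix.
corr = np.corrcoef([x1, x2, x3])

print("corr(x1, x2) =", corr[0, 1])
print("corr(x1, x3) =", corr[0, 2])
```

With data built this way, the first coefficient should be close to 1 and the second much closer to 0, which matches what the two scatter plots suggest.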