
Decomposition
Decomposition is the process of using an orthogonal transformation to convert a set of observations of possibly-correlated variables (with numerical values) into a set of values of linearly-uncorrelated variables called principal components.
Since some algorithms are sensitive to correlated predictors, it can be a good idea to apply PCA (Principal Component Analysis), a decomposition technique, before running them. And since some algorithms are also sensitive to the number of predictors, we should be selective about which variables we include.
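Before turning to VerticaPy, here is a minimal, self-contained numpy sketch of the idea: projecting centered data onto the eigenvectors of its covariance matrix yields components that are orthogonal, and hence uncorrelated. This is an illustration only, not the VerticaPy implementation.

```python
import numpy as np

# Two deliberately correlated variables: y is x plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.3 * rng.normal(size=200)
X = np.column_stack([x, y])

# Center the data and eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project onto the eigenvectors: these are the principal components.
components = Xc @ eigvecs

# The off-diagonal correlation between components is ~0 up to
# floating-point error.
corr = np.corrcoef(components, rowvar=False)
```

The original variables were strongly correlated, but the transformed components are not, which is exactly the property PCA is used for here.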
To demonstrate data decomposition in VerticaPy, we'll use the well-known 'Iris' dataset.
from verticapy.datasets import load_iris
vdf = load_iris()
display(vdf)
Notice that all the predictors are well-correlated with each other.
%matplotlib inline
vdf.corr()
Let's compute the PCA of the different elements.
from verticapy.learn.decomposition import PCA
model = PCA("pca_iris")
model.fit("iris", ["PetalLengthCm",
"SepalWidthCm",
"SepalLengthCm",
"PetalWidthCm"])
Let's compute the correlation matrix of the result of the PCA.
model.transform().corr()
Notice that the predictors are now independent: the components are linear combinations of the original variables and carry exactly the same total information. Let's look at the cumulative explained variance of the PCA components.
model.explained_variance_
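To clarify what this attribute reports, here is a hedged plain-numpy sketch (not VerticaPy) of how cumulative explained variance is derived from the component eigenvalues; the eigenvalues below are hypothetical, chosen only to mimic the shape of the Iris result.

```python
import numpy as np

# Hypothetical eigenvalues, sorted in descending order.
eigvals = np.array([4.2, 0.24, 0.08, 0.02])

# Each component's explained-variance ratio is its eigenvalue over the total.
ratios = eigvals / eigvals.sum()

# Cumulative explained variance: running sum of the ratios.
cumulative = np.cumsum(ratios)
# Here the first two components carry over 97% of the variance,
# and all components together account for 100%.
```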
Most of the information is in the first two components with more than 97.7% of explained variance. We can export this result to a vDataFrame.
model.transform(n_components = 2)