Decomposition
Decomposition, here in the form of Principal Component Analysis (PCA), uses an orthogonal transformation to convert a set of observations of possibly correlated numerical variables into a set of values of linearly uncorrelated variables called principal components.
Since some algorithms are sensitive to correlated predictors, it can be a good idea to apply PCA before running them. Since some algorithms are also sensitive to the number of predictors, we’ll have to be selective about which variables we include.
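To make the idea of an orthogonal transformation concrete, here is a minimal pure-Python sketch for the two-variable case only. It does not use VerticaPy: the helper names `covariance` and `pca_2d` and the toy values are assumptions made for illustration. The sketch rotates the centered data by the angle that cancels the covariance, which is the closed-form PCA solution when there are exactly two variables.

```python
import math


def covariance(a, b):
    """Sample covariance of two equally sized lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def pca_2d(xs, ys):
    """Rotate two centered variables onto their principal axes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(
        2 * covariance(xs, ys), covariance(xs, xs) - covariance(ys, ys)
    )
    c, s = math.cos(theta), math.sin(theta)
    # A rotation is an orthogonal transformation of the centered data.
    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2


# Toy, strongly correlated measurements (hypothetical values).
xs = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
ys = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5]

pc1, pc2 = pca_2d(xs, ys)
print("covariance before:", covariance(xs, ys))
print("covariance after: ", covariance(pc1, pc2))
print("total variance before:", covariance(xs, xs) + covariance(ys, ys))
print("total variance after: ", covariance(pc1, pc1) + covariance(pc2, pc2))
```

The covariance between the two rotated variables should be zero up to floating-point error, while the total variance is unchanged: the rotation only redistributes the variance so that the first component holds as much of it as possible.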
To demonstrate data decomposition in VerticaPy, we’ll use the well-known ‘Iris’ dataset.
[1]:
from verticapy.datasets import load_iris
import verticapy as vp
vp.set_option("plotting_lib","highcharts")
vdf = load_iris()
display(vdf)
 | Id (Integer) | PetalLengthCm (Numeric(8)) | PetalWidthCm (Numeric(8)) | SepalLengthCm (Numeric(8)) | SepalWidthCm (Numeric(8)) | Species (Varchar(30)) |
1 | 1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa |
2 | 2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa |
3 | 3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa |
4 | 4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa |
5 | 5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa |
6 | 6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa |
7 | 7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa |
8 | 8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa |
9 | 9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa |
10 | 10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
11 | 11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa |
12 | 12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa |
13 | 13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa |
14 | 14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa |
15 | 15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa |
16 | 16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa |
17 | 17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa |
18 | 18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa |
19 | 19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa |
20 | 20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa |
21 | 21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa |
22 | 22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa |
23 | 23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa |
24 | 24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa |
25 | 25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa |
26 | 26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa |
27 | 27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa |
28 | 28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa |
29 | 29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa |
30 | 30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa |
31 | 31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa |
32 | 32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa |
33 | 33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa |
34 | 34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa |
35 | 35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
36 | 36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa |
37 | 37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa |
38 | 38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
39 | 39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa |
40 | 40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa |
41 | 41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa |
42 | 42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa |
43 | 43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa |
44 | 44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa |
45 | 45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa |
46 | 46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa |
47 | 47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa |
48 | 48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa |
49 | 49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa |
50 | 50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa |
51 | 51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor |
52 | 52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor |
53 | 53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor |
54 | 54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor |
55 | 55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor |
56 | 56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor |
57 | 57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor |
58 | 58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor |
59 | 59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor |
60 | 60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor |
61 | 61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor |
62 | 62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor |
63 | 63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor |
64 | 64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor |
65 | 65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor |
66 | 66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor |
67 | 67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor |
68 | 68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor |
69 | 69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor |
70 | 70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor |
71 | 71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor |
72 | 72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor |
73 | 73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor |
74 | 74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor |
75 | 75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor |
76 | 76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor |
77 | 77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor |
78 | 78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor |
79 | 79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor |
80 | 80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor |
81 | 81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor |
82 | 82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor |
83 | 83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor |
84 | 84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor |
85 | 85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor |
86 | 86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor |
87 | 87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor |
88 | 88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor |
89 | 89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor |
90 | 90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor |
91 | 91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor |
92 | 92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor |
93 | 93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor |
94 | 94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor |
95 | 95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor |
96 | 96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor |
97 | 97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor |
98 | 98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor |
99 | 99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor |
100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor |
Notice that several of the predictors are strongly correlated with each other.
[2]:
vdf.corr()
[2]:
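As a rough illustration of what a correlation matrix measures, the sketch below computes Pearson coefficients by hand on six rows copied from the table displayed earlier (rows 1 to 3 and 51 to 53). It assumes the Pearson method and is not VerticaPy code; the helper name `pearson` is hypothetical, and six rows are far too few to reproduce the values obtained on the full dataset.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally sized lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Rows 1-3 (Iris-setosa) and 51-53 (Iris-versicolor) of the displayed table.
columns = {
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9],
    "PetalWidthCm": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1],
}

names = list(columns)
# Nested dict: matrix[a][b] is the correlation between columns a and b.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

Coefficients close to 1 or -1 indicate a strong linear relationship between two columns, which is the situation PCA is meant to address.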
Let’s compute the PCA of the different elements.
[3]:
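from verticapy.learn.decomposition import PCA

# Drop any existing model with the same name so the example can be rerun.
vp.drop("pca_iris")
model = PCA("pca_iris")
# Fit the PCA on the four numerical predictors of the 'iris' table.
model.fit(
    "iris",
    ["PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"],
)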
[3]:
=======
columns
=======
index|    name     |  mean  |   sd
-----+-------------+--------+--------
  1  |petallengthcm| 3.75867| 1.76442
  2  |sepalwidthcm | 3.05400| 0.43359
  3  |sepallengthcm| 5.84333| 0.82807
  4  |petalwidthcm | 1.19867| 0.76316

===============
singular_values
===============
index| value  |explained_variance|accumulated_explained_variance
-----+--------+------------------+------------------------------
  1  | 2.05544|     0.92462      |           0.92462
  2  | 0.49218|     0.05302      |           0.97763
  3  | 0.28022|     0.01719      |           0.99482
  4  | 0.15389|     0.00518      |           1.00000

====================
principal_components
====================
index|  PC1   |  PC2   |  PC3   |  PC4
-----+--------+--------+--------+--------
  1  | 0.85657|-0.17577| 0.07252|-0.47972
  2  |-0.08227| 0.72971| 0.59642|-0.32409
  3  | 0.36159| 0.65654|-0.58100| 0.31725
  4  | 0.35884|-0.07471| 0.54906| 0.75112

========
counters
========
   counter_name   |counter_value
------------------+-------------
accepted_row_count|     150
rejected_row_count|      0
  iteration_count |      1

===========
call_string
===========
SELECT PCA('public.pca_iris', 'iris', '"PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"'
USING PARAMETERS scale=false);
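The numbers in this summary can be sanity-checked by hand. The pure-Python sketch below copies the printed singular values and principal components and runs two checks: that the explained variance is consistent with each squared singular value divided by the sum of squares, and that the component vectors are orthonormal, as expected from an orthogonal transformation. The variable names are hypothetical, and the tolerances only reflect the five-decimal rounding of the printed values.

```python
# Values copied from the printed model summary.
singular_values = [2.05544, 0.49218, 0.28022, 0.15389]

# Rows are the input columns, columns are PC1..PC4.
components = [
    [0.85657, -0.17577, 0.07252, -0.47972],
    [-0.08227, 0.72971, 0.59642, -0.32409],
    [0.36159, 0.65654, -0.58100, 0.31725],
    [0.35884, -0.07471, 0.54906, 0.75112],
]

# Check 1: share of each squared singular value in the total.
squared = [v ** 2 for v in singular_values]
explained = [s / sum(squared) for s in squared]
print("explained variance:", [round(e, 5) for e in explained])

# Check 2: dot products between component vectors (Gram matrix).
# For an orthonormal set this is close to the identity matrix.
k = len(components)
gram = [
    [sum(components[r][i] * components[r][j] for r in range(k)) for j in range(k)]
    for i in range(k)
]
for row in gram:
    print([round(value, 3) for value in row])
```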
Let’s compute the correlation matrix of the result of the PCA.
[4]:
model.transform().corr()
[4]:
Notice that the new predictors are linearly uncorrelated. Each one is a linear combination of the original variables, and together they carry exactly the same amount of information as the original variables. Let’s look at the explained variance of each PCA component.
[5]:
model.explained_variance_
[5]:
array([0.92461621, 0.05301557, 0.01718514, 0.00518309])
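To decide how many components to keep, the per-component values can be accumulated. The following is a small pure-Python sketch using the array shown above; the helper `components_needed` and the thresholds are hypothetical choices for illustration, not part of VerticaPy.

```python
from itertools import accumulate

# Per-component explained variance, copied from the output above.
explained_variance = [0.92461621, 0.05301557, 0.01718514, 0.00518309]

# Running total: share of variance kept by the first 1, 2, 3, 4 components.
cumulative = list(accumulate(explained_variance))


def components_needed(cumulative, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return count
    return len(cumulative)


for threshold in (0.90, 0.97, 0.99):  # hypothetical thresholds
    print(threshold, "->", components_needed(cumulative, threshold), "component(s)")
```

A stricter threshold keeps more components; the right trade-off depends on how much information loss the downstream algorithm can tolerate.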
The first two components carry most of the information: together they account for more than 97.7% of the variance. We can export this result to a vDataFrame.
[6]:
model.transform(n_components=2)
[6]:
 | Id (Integer) | Species (Varchar(30)) | col1 (Float(22)) | col2 (Float(22)) |
1 | 1 | Iris-setosa | -1.22853483184053 | -2.32797486205867 |
2 | 2 | Iris-setosa | -1.48027473044093 | -2.42192960364832 |
3 | 3 | Iris-setosa | -1.56648109119663 | -2.55060213398962 |
4 | 4 | Iris-setosa | -1.46721003050149 | -2.64393895599043 |
5 | 5 | Iris-setosa | -1.22880940695385 | -2.40109949740075 |
6 | 6 | Iris-setosa | -0.736002504518092 | -2.06768323189023 |
7 | 7 | Iris-setosa | -1.4534409521453 | -2.57580291955543 |
8 | 8 | Iris-setosa | -1.21492098167444 | -2.40373494371661 |
9 | 9 | Iris-setosa | -1.69695396175647 | -2.74272889827773 |
10 | 10 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
11 | 11 | Iris-setosa | -0.962631932847397 | -2.16353093144279 |
12 | 12 | Iris-setosa | -1.20158170662168 | -2.55261966071664 |
13 | 13 | Iris-setosa | -1.50820680918985 | -2.56055482910955 |
14 | 14 | Iris-setosa | -1.94597327946773 | -2.83609454972387 |
15 | 15 | Iris-setosa | -0.967314515607511 | -1.87059669814037 |
16 | 16 | Iris-setosa | -0.619418059237654 | -1.87292102128627 |
17 | 17 | Iris-setosa | -1.0786313466343 | -1.99737627051877 |
18 | 18 | Iris-setosa | -1.23676172082975 | -2.25500362492602 |
19 | 19 | Iris-setosa | -0.655183104939257 | -1.93622185702363 |
20 | 20 | Iris-setosa | -1.04345133242623 | -2.29499230630939 |
21 | 21 | Iris-setosa | -0.898970689663757 | -2.17627247108801 |
22 | 22 | Iris-setosa | -1.08756261404028 | -2.21455042216324 |
23 | 23 | Iris-setosa | -1.71607412002264 | -2.59340848934362 |
24 | 24 | Iris-setosa | -1.06801265247068 | -2.14685007766231 |
25 | 25 | Iris-setosa | -0.944610075034521 | -2.60534988174524 |
26 | 26 | Iris-setosa | -1.27280134164468 | -2.39142909600546 |
27 | 27 | Iris-setosa | -1.14571754912383 | -2.27536920979418 |
28 | 28 | Iris-setosa | -1.10671865357333 | -2.27989761407295 |
29 | 29 | Iris-setosa | -1.2282602567272 | -2.25485022671658 |
30 | 30 | Iris-setosa | -1.30950945960947 | -2.60333235501822 |
31 | 31 | Iris-setosa | -1.30923488449614 | -2.53020771967613 |
32 | 32 | Iris-setosa | -1.0867388887003 | -1.99517651613698 |
33 | 33 | Iris-setosa | -0.88318540883518 | -2.39769273328662 |
34 | 34 | Iris-setosa | -0.832708212514197 | -2.11765343783886 |
35 | 35 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
36 | 36 | Iris-setosa | -1.54366139851124 | -2.33606342866101 |
37 | 37 | Iris-setosa | -1.169556171417 | -2.04778216840147 |
38 | 38 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
39 | 39 | Iris-setosa | -1.7467267796607 | -2.73262280494837 |
40 | 40 | Iris-setosa | -1.1787620139363 | -2.33808095538803 |
41 | 41 | Iris-setosa | -1.35857789909695 | -2.30308087291174 |
42 | 42 | Iris-setosa | -1.96998544928553 | -2.54170305039261 |
43 | 43 | Iris-setosa | -1.67495799441106 | -2.74756409897537 |
44 | 44 | Iris-setosa | -1.12628693447745 | -2.13689738254238 |
45 | 45 | Iris-setosa | -0.709049379299243 | -2.29232803054821 |
46 | 46 | Iris-setosa | -1.52466058716829 | -2.41461235484425 |
47 | 47 | Iris-setosa | -0.949567232907958 | -2.38554028378491 |
48 | 48 | Iris-setosa | -1.51698284840572 | -2.63383286266107 |
49 | 49 | Iris-setosa | -0.998790900585542 | -2.22918491977137 |
50 | 50 | Iris-setosa | -1.33646258482832 | -2.37868755636024 |
51 | 51 | Iris-versicolor | 2.07879765689785 | -0.762514728497838 |
52 | 52 | Iris-versicolor | 1.68230254042165 | -1.04831394065096 |
53 | 53 | Iris-versicolor | 2.16984182860376 | -0.782880313365999 |
54 | 54 | Iris-versicolor | 0.622080022488128 | -1.63002278503764 |
55 | 55 | Iris-versicolor | 1.66058114818956 | -0.970354104611224 |
56 | 56 | Iris-versicolor | 1.30210597373379 | -1.62395174516232 |
57 | 57 | Iris-versicolor | 1.84511549737721 | -1.08362081954612 |
58 | 58 | Iris-versicolor | -0.133909198051625 | -2.12729389102054 |
59 | 59 | Iris-versicolor | 1.74907828653097 | -1.05811323756144 |
60 | 60 | Iris-versicolor | 0.563256590254706 | -1.76631936060189 |
61 | 61 | Iris-versicolor | -0.0699733797546608 | -2.06691079532367 |
62 | 62 | Iris-versicolor | 1.17276728489412 | -1.30891236723827 |
63 | 63 | Iris-versicolor | 0.791671135521697 | -1.51319590777918 |
64 | 64 | Iris-versicolor | 1.64571376938008 | -1.33098868241458 |
65 | 65 | Iris-versicolor | 0.530916503858992 | -1.53888571741862 |
66 | 66 | Iris-versicolor | 1.67746472947143 | -0.899275825441488 |
67 | 67 | Iris-versicolor | 1.32126201326685 | -1.55860455325261 |
68 | 68 | Iris-versicolor | 0.984432373698567 | -1.69943385984672 |
69 | 69 | Iris-versicolor | 1.25114067869714 | -1.10491544717309 |
70 | 70 | Iris-versicolor | 0.660804342925307 | -1.7076758246585 |
71 | 71 | Iris-versicolor | 1.73379866635042 | -1.21040039192452 |
72 | 72 | Iris-versicolor | 1.01845579204111 | -1.27345209013366 |
73 | 73 | Iris-versicolor | 1.73758166642596 | -1.13198036125648 |
74 | 74 | Iris-versicolor | 1.6262831547337 | -1.46946050966637 |
75 | 75 | Iris-versicolor | 1.41978871946752 | -1.13669099319001 |
76 | 76 | Iris-versicolor | 1.60542136910846 | -0.957459166756568 |
77 | 77 | Iris-versicolor | 1.94859936145132 | -0.881516857443856 |
78 | 78 | Iris-versicolor | 2.13084293305326 | -0.778351909087227 |
79 | 79 | Iris-versicolor | 1.4300134915946 | -1.28851795292478 |
80 | 80 | Iris-versicolor | 0.398445750161284 | -1.65215675910461 |
81 | 81 | Iris-versicolor | 0.503103772033287 | -1.74828242563072 |
82 | 82 | Iris-versicolor | 0.425673450493456 | -1.8036769224205 |
83 | 83 | Iris-versicolor | 0.796664174662019 | -1.51833790489569 |
84 | 84 | Iris-versicolor | 1.86396108053006 | -1.30606586382231 |
85 | 85 | Iris-versicolor | 1.24894407779056 | -1.68991252990978 |
86 | 86 | Iris-versicolor | 1.60120856572949 | -1.25289995085965 |
87 | 87 | Iris-versicolor | 1.92620947206937 | -0.879034809337434 |
88 | 88 | Iris-versicolor | 1.2539806065095 | -1.17509783978044 |
89 | 89 | Iris-versicolor | 0.995086949129077 | -1.63424006614645 |
90 | 90 | Iris-versicolor | 0.693848807737772 | -1.64496407906465 |
91 | 91 | Iris-versicolor | 1.08058893146803 | -1.79571292458227 |
92 | 92 | Iris-versicolor | 1.59594095147584 | -1.32088258908521 |
93 | 93 | Iris-versicolor | 0.84643699256625 | -1.52844399822506 |
94 | 94 | Iris-versicolor | -0.133634622938302 | -2.05416925567845 |
95 | 95 | Iris-versicolor | 0.973090981783665 | -1.62940486544881 |
96 | 96 | Iris-versicolor | 1.1251300163855 | -1.65913405529338 |
97 | 97 | Iris-versicolor | 1.08101873477145 | -1.57869217114723 |
98 | 98 | Iris-versicolor | 1.34747078399123 | -1.26799896984718 |
99 | 99 | Iris-versicolor | -0.290905390526893 | -1.87775510321563 |
100 | 100 | Iris-versicolor | 0.959477131617579 | -1.55364478379086 |