### VerticaPy

Python API for Vertica Data Science at Scale

# Decomposition

Decomposition, as covered here, is the process of using an orthogonal transformation to convert a set of observations of possibly correlated numerical variables into a set of values of linearly uncorrelated variables called principal components.

Since some algorithms are sensitive to correlated predictors, it can be a good idea to apply PCA (Principal Component Analysis, a decomposition technique) before training them. Since some algorithms are also sensitive to the number of predictors, we'll need to be selective about which variables we include.
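To make the definition concrete, here is a minimal NumPy sketch of the idea, independent of VerticaPy. It assumes synthetic data and a hypothetical helper named `pca_fit_transform`. It centers the data, takes the eigenvectors of the covariance matrix as the orthogonal transformation, and shows that the rotated columns are no longer correlated. This illustrates the technique and is not how Vertica implements it internally.

```python
import numpy as np

def pca_fit_transform(X):
    """Hypothetical helper: center X, then rotate it onto the
    eigenvectors of its covariance matrix (largest variance first)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns orthonormal eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order]
    return X_centered @ components, components, eigenvalues[order]

# Synthetic data (an assumption for illustration): two strongly
# correlated columns plus one unrelated column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

scores, components, variances = pca_fit_transform(X)

print(np.round(np.corrcoef(X, rowvar=False), 2))       # correlated inputs
print(np.round(np.corrcoef(scores, rowvar=False), 2))  # off-diagonals near zero
```

Because the transformation is orthogonal, the total variance is unchanged: the variances of the components add up to the sum of the variances of the original columns.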

To demonstrate data decomposition in VerticaPy, we'll use the well-known 'Iris' dataset.

In [13]:
```
from verticapy.datasets import load_iris
vdf = load_iris()  # load the Iris dataset as a vDataFrame
display(vdf)
```
| | Id (Int) | PetalLengthCm (Numeric(6,3)) | PetalWidthCm (Numeric(6,3)) | SepalLengthCm (Numeric(6,3)) | SepalWidthCm (Numeric(6,3)) | Species (Varchar(30)) |
|---|---|---|---|---|---|---|
| 1 | 1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa |
| 2 | 2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa |
| 3 | 3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa |
| 4 | 4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa |
| 5 | 5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa |
| 6 | 6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa |
| 7 | 7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa |
| 8 | 8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa |
| 9 | 9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa |
| 10 | 10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
| 11 | 11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa |
| 12 | 12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa |
| 13 | 13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa |
| 14 | 14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa |
| 15 | 15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa |
| 16 | 16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa |
| 17 | 17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa |
| 18 | 18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa |
| 19 | 19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa |
| 20 | 20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa |
| 21 | 21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa |
| 22 | 22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa |
| 23 | 23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa |
| 24 | 24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa |
| 25 | 25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa |
| 26 | 26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa |
| 27 | 27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa |
| 28 | 28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa |
| 29 | 29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa |
| 30 | 30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa |
| 31 | 31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa |
| 32 | 32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa |
| 33 | 33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa |
| 34 | 34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa |
| 35 | 35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
| 36 | 36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa |
| 37 | 37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa |
| 38 | 38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa |
| 39 | 39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa |
| 40 | 40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa |
| 41 | 41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa |
| 42 | 42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa |
| 43 | 43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa |
| 44 | 44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa |
| 45 | 45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa |
| 46 | 46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa |
| 47 | 47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa |
| 48 | 48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa |
| 49 | 49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa |
| 50 | 50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa |
| 51 | 51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor |
| 52 | 52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor |
| 53 | 53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor |
| 54 | 54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor |
| 55 | 55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor |
| 56 | 56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor |
| 57 | 57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor |
| 58 | 58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor |
| 59 | 59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor |
| 60 | 60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor |
| 61 | 61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor |
| 62 | 62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor |
| 63 | 63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor |
| 64 | 64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor |
| 65 | 65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor |
| 66 | 66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor |
| 67 | 67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor |
| 68 | 68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor |
| 69 | 69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor |
| 70 | 70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor |
| 71 | 71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor |
| 72 | 72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor |
| 73 | 73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor |
| 74 | 74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor |
| 75 | 75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor |
| 76 | 76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor |
| 77 | 77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor |
| 78 | 78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor |
| 79 | 79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor |
| 80 | 80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor |
| 81 | 81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor |
| 82 | 82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor |
| 83 | 83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor |
| 84 | 84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor |
| 85 | 85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor |
| 86 | 86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor |
| 87 | 87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor |
| 88 | 88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor |
| 89 | 89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor |
| 90 | 90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor |
| 91 | 91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor |
| 92 | 92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor |
| 93 | 93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor |
| 94 | 94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor |
| 95 | 95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor |
| 96 | 96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor |
| 97 | 97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor |
| 98 | 98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor |
| 99 | 99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor |
| 100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor |

Rows: 1-100 | Columns: 6

Notice that most of the predictors are strongly correlated with each other.

In [14]:
```
%matplotlib inline
vdf.corr()
```
Out[14]:
| | "Id" | "PetalLengthCm" | "PetalWidthCm" | "SepalLengthCm" | "SepalWidthCm" |
|---|---|---|---|---|---|
| "Id" | 1.0 | 0.882747318139054 | 0.899758577093348 | 0.716676272853919 | -0.397728811465943 |
| "PetalLengthCm" | 0.882747318139054 | 1.0 | 0.962757097050966 | 0.871754157304886 | -0.420516096401188 |
| "PetalWidthCm" | 0.899758577093348 | 0.962757097050966 | 1.0 | 0.817953633369177 | -0.356544089613822 |
| "SepalLengthCm" | 0.716676272853919 | 0.871754157304886 | 0.817953633369177 | 1.0 | -0.109369249950673 |
| "SepalWidthCm" | -0.397728811465943 | -0.420516096401188 | -0.356544089613822 | -0.109369249950673 | 1.0 |

Rows: 1-5 | Columns: 6
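
As a rough way to read the correlation matrix above, the following sketch lists the predictor pairs whose absolute correlation exceeds a cutoff. The coefficients are copied from the output above (the `Id` column is left out, since it is not a predictor). The helper name `strongly_correlated_pairs` and the 0.8 cutoff are assumptions made for illustration, not VerticaPy features.

```python
import numpy as np

# Coefficients copied from the vdf.corr() output above (Id omitted)
columns = ["PetalLengthCm", "PetalWidthCm", "SepalLengthCm", "SepalWidthCm"]
corr = np.array([
    [1.0, 0.962757097050966, 0.871754157304886, -0.420516096401188],
    [0.962757097050966, 1.0, 0.817953633369177, -0.356544089613822],
    [0.871754157304886, 0.817953633369177, 1.0, -0.109369249950673],
    [-0.420516096401188, -0.356544089613822, -0.109369249950673, 1.0],
])

def strongly_correlated_pairs(corr, names, threshold=0.8):
    """Hypothetical helper: return (name_a, name_b, r) for every pair of
    distinct columns with |r| >= threshold (0.8 is an arbitrary cutoff)."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

pairs = strongly_correlated_pairs(corr, columns)
for name_a, name_b, r in pairs:
    print(f"{name_a} ~ {name_b}: {r:.3f}")
```

With this cutoff, the three pairs among the petal measurements and the sepal length are flagged, while `SepalWidthCm` is only weakly to moderately correlated with the other predictors.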

Let's fit a PCA model on the four numerical predictors.

In [15]:
```
from verticapy.learn.decomposition import PCA
model = PCA("pca_iris")
model.fit("iris", ["PetalLengthCm",
                   "SepalWidthCm",
                   "SepalLengthCm",
                   "PetalWidthCm"])
```
Out[15]:
```
=======
columns
=======
index|    name     |  mean  |   sd
-----+-------------+--------+--------
1  |petallengthcm| 3.75867| 1.76442
2  |sepalwidthcm | 3.05400| 0.43359
3  |sepallengthcm| 5.84333| 0.82807
4  |petalwidthcm | 1.19867| 0.76316

===============
singular_values
===============
index| value  |explained_variance|accumulated_explained_variance
-----+--------+------------------+------------------------------
1  | 2.05544|      0.92462     |            0.92462
2  | 0.49218|      0.05302     |            0.97763
3  | 0.28022|      0.01719     |            0.99482
4  | 0.15389|      0.00518     |            1.00000

====================
principal_components
====================
index|  PC1   |  PC2   |  PC3   |  PC4
-----+--------+--------+--------+--------
1  | 0.85657|-0.17577| 0.07252|-0.47972
2  |-0.08227| 0.72971| 0.59642|-0.32409
3  | 0.36159| 0.65654|-0.58100| 0.31725
4  | 0.35884|-0.07471| 0.54906| 0.75112

========
counters
========
counter_name   |counter_value
------------------+-------------
accepted_row_count|     150
rejected_row_count|      0
iteration_count  |      1

===========
call_string
===========
SELECT PCA('public.pca_iris', 'iris', '"PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"'
USING PARAMETERS scale=false);
```
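
The `explained_variance` column in the model summary above appears to be each squared singular value divided by the sum of all squared singular values. That relationship is inferred from the printed numbers, not taken from the Vertica documentation, so treat the sketch below as a sanity check. It uses the rounded values from the summary.

```python
import numpy as np

# Singular values as printed (rounded) in the model summary above
singular_values = np.array([2.05544, 0.49218, 0.28022, 0.15389])

# Assumption: the variance carried by a component is the square of
# its singular value
variances = singular_values ** 2

explained_variance = variances / variances.sum()
accumulated = np.cumsum(explained_variance)

print(np.round(explained_variance, 5))
print(np.round(accumulated, 5))
```

Up to rounding, the two arrays agree with the `explained_variance` and `accumulated_explained_variance` columns of the summary.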

Let's compute the correlation matrix of the result of the PCA.

In [16]:
```
model.transform().corr()
```
Out[16]:
| | "Id" | "col1" | "col2" | "col3" | "col4" |
|---|---|---|---|---|---|
| "Id" | 1.0 | 0.858408005609454 | 0.827013735836634 | 0.292710401628376 | -0.856073579988209 |
| "col1" | 0.858408005609454 | 1.0 | 0.967717512512529 | 0.145745918898548 | -0.867219868688767 |
| "col2" | 0.827013735836634 | 0.967717512512529 | 1.0 | 0.120932871081153 | -0.79027964248929 |
| "col3" | 0.292710401628376 | 0.145745918898548 | 0.120932871081153 | 1.0 | -0.152556221222614 |
| "col4" | -0.856073579988209 | -0.867219868688767 | -0.79027964248929 | -0.152556221222614 | 1.0 |

Rows: 1-5 | Columns: 6

The predictors have now been combined into principal components, which are linearly uncorrelated by construction, and together the four components carry exactly the same amount of information as the original variables. Let's look at the accumulated explained variance of the PCA components.

In [17]:
```
model.explained_variance_
```
Out[17]:
| | value (Float) | explained_variance (Float) | accumulated_explained_variance (Float) |
|---|---|---|---|
| 1 | 2.05544174529956 | 0.924616207174268 | 0.924616207174268 |
| 2 | 0.492182457659265 | 0.0530155678505348 | 0.977631775024803 |
| 3 | 0.280221177097939 | 0.0171851395250068 | 0.99481691454981 |
| 4 | 0.15389290797825 | 0.00518308545018998 | 1.0 |

Rows: 1-4 | Columns: 4

Most of the information is in the first two components, which together account for about 97.8% of the explained variance. We can export this result to a vDataFrame.

In [20]:
```
model.transform(n_components=2)
```
Out[20]:
| | Id (Integer) | Species (Varchar(30)) | col1 (Float) | col2 (Float) |
|---|---|---|---|---|
| 1 | 1 | Iris-setosa | -1.22853483184053 | -2.32797486205867 |
| 2 | 2 | Iris-setosa | -1.48027473044093 | -2.42192960364832 |
| 3 | 3 | Iris-setosa | -1.56648109119663 | -2.55060213398962 |
| 4 | 4 | Iris-setosa | -1.46721003050149 | -2.64393895599043 |
| 5 | 5 | Iris-setosa | -1.22880940695385 | -2.40109949740075 |
| 6 | 6 | Iris-setosa | -0.736002504518092 | -2.06768323189023 |
| 7 | 7 | Iris-setosa | -1.4534409521453 | -2.57580291955543 |
| 8 | 8 | Iris-setosa | -1.21492098167444 | -2.40373494371661 |
| 9 | 9 | Iris-setosa | -1.69695396175647 | -2.74272889827773 |
| 10 | 10 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
| 11 | 11 | Iris-setosa | -0.962631932847397 | -2.16353093144279 |
| 12 | 12 | Iris-setosa | -1.20158170662168 | -2.55261966071664 |
| 13 | 13 | Iris-setosa | -1.50820680918985 | -2.56055482910955 |
| 14 | 14 | Iris-setosa | -1.94597327946773 | -2.83609454972387 |
| 15 | 15 | Iris-setosa | -0.967314515607511 | -1.87059669814037 |
| 16 | 16 | Iris-setosa | -0.619418059237654 | -1.87292102128627 |
| 17 | 17 | Iris-setosa | -1.0786313466343 | -1.99737627051877 |
| 18 | 18 | Iris-setosa | -1.23676172082975 | -2.25500362492602 |
| 19 | 19 | Iris-setosa | -0.655183104939257 | -1.93622185702363 |
| 20 | 20 | Iris-setosa | -1.04345133242623 | -2.29499230630939 |
| 21 | 21 | Iris-setosa | -0.898970689663757 | -2.17627247108801 |
| 22 | 22 | Iris-setosa | -1.08756261404028 | -2.21455042216324 |
| 23 | 23 | Iris-setosa | -1.71607412002264 | -2.59340848934362 |
| 24 | 24 | Iris-setosa | -1.06801265247068 | -2.14685007766231 |
| 25 | 25 | Iris-setosa | -0.944610075034521 | -2.60534988174524 |
| 26 | 26 | Iris-setosa | -1.27280134164468 | -2.39142909600546 |
| 27 | 27 | Iris-setosa | -1.14571754912383 | -2.27536920979418 |
| 28 | 28 | Iris-setosa | -1.10671865357333 | -2.27989761407295 |
| 29 | 29 | Iris-setosa | -1.2282602567272 | -2.25485022671658 |
| 30 | 30 | Iris-setosa | -1.30950945960947 | -2.60333235501822 |
| 31 | 31 | Iris-setosa | -1.30923488449614 | -2.53020771967613 |
| 32 | 32 | Iris-setosa | -1.0867388887003 | -1.99517651613698 |
| 33 | 33 | Iris-setosa | -0.88318540883518 | -2.39769273328662 |
| 34 | 34 | Iris-setosa | -0.832708212514197 | -2.11765343783886 |
| 35 | 35 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
| 36 | 36 | Iris-setosa | -1.54366139851124 | -2.33606342866101 |
| 37 | 37 | Iris-setosa | -1.169556171417 | -2.04778216840147 |
| 38 | 38 | Iris-setosa | -1.35050623829783 | -2.51994822813733 |
| 39 | 39 | Iris-setosa | -1.7467267796607 | -2.73262280494837 |
| 40 | 40 | Iris-setosa | -1.1787620139363 | -2.33808095538803 |
| 41 | 41 | Iris-setosa | -1.35857789909695 | -2.30308087291174 |
| 42 | 42 | Iris-setosa | -1.96998544928553 | -2.54170305039261 |
| 43 | 43 | Iris-setosa | -1.67495799441106 | -2.74756409897537 |
| 44 | 44 | Iris-setosa | -1.12628693447745 | -2.13689738254238 |
| 45 | 45 | Iris-setosa | -0.709049379299243 | -2.29232803054821 |
| 46 | 46 | Iris-setosa | -1.52466058716829 | -2.41461235484425 |
| 47 | 47 | Iris-setosa | -0.949567232907958 | -2.38554028378491 |
| 48 | 48 | Iris-setosa | -1.51698284840572 | -2.63383286266107 |
| 49 | 49 | Iris-setosa | -0.998790900585542 | -2.22918491977137 |
| 50 | 50 | Iris-setosa | -1.33646258482832 | -2.37868755636024 |
| 51 | 51 | Iris-versicolor | 2.07879765689785 | -0.762514728497838 |
| 52 | 52 | Iris-versicolor | 1.68230254042165 | -1.04831394065096 |
| 53 | 53 | Iris-versicolor | 2.16984182860376 | -0.782880313365999 |
| 54 | 54 | Iris-versicolor | 0.622080022488128 | -1.63002278503764 |
| 55 | 55 | Iris-versicolor | 1.66058114818956 | -0.970354104611224 |
| 56 | 56 | Iris-versicolor | 1.30210597373379 | -1.62395174516232 |
| 57 | 57 | Iris-versicolor | 1.84511549737721 | -1.08362081954612 |
| 58 | 58 | Iris-versicolor | -0.133909198051625 | -2.12729389102054 |
| 59 | 59 | Iris-versicolor | 1.74907828653097 | -1.05811323756144 |
| 60 | 60 | Iris-versicolor | 0.563256590254706 | -1.76631936060189 |
| 61 | 61 | Iris-versicolor | -0.0699733797546608 | -2.06691079532367 |
| 62 | 62 | Iris-versicolor | 1.17276728489412 | -1.30891236723827 |
| 63 | 63 | Iris-versicolor | 0.791671135521697 | -1.51319590777918 |
| 64 | 64 | Iris-versicolor | 1.64571376938008 | -1.33098868241458 |
| 65 | 65 | Iris-versicolor | 0.530916503858992 | -1.53888571741862 |
| 66 | 66 | Iris-versicolor | 1.67746472947143 | -0.899275825441488 |
| 67 | 67 | Iris-versicolor | 1.32126201326685 | -1.55860455325261 |
| 68 | 68 | Iris-versicolor | 0.984432373698567 | -1.69943385984672 |
| 69 | 69 | Iris-versicolor | 1.25114067869714 | -1.10491544717309 |
| 70 | 70 | Iris-versicolor | 0.660804342925307 | -1.7076758246585 |
| 71 | 71 | Iris-versicolor | 1.73379866635042 | -1.21040039192452 |
| 72 | 72 | Iris-versicolor | 1.01845579204111 | -1.27345209013366 |
| 73 | 73 | Iris-versicolor | 1.73758166642596 | -1.13198036125648 |
| 74 | 74 | Iris-versicolor | 1.6262831547337 | -1.46946050966637 |
| 75 | 75 | Iris-versicolor | 1.41978871946752 | -1.13669099319001 |
| 76 | 76 | Iris-versicolor | 1.60542136910846 | -0.957459166756568 |
| 77 | 77 | Iris-versicolor | 1.94859936145132 | -0.881516857443856 |
| 78 | 78 | Iris-versicolor | 2.13084293305326 | -0.778351909087227 |
| 79 | 79 | Iris-versicolor | 1.4300134915946 | -1.28851795292478 |
| 80 | 80 | Iris-versicolor | 0.398445750161284 | -1.65215675910461 |
| 81 | 81 | Iris-versicolor | 0.503103772033287 | -1.74828242563072 |
| 82 | 82 | Iris-versicolor | 0.425673450493456 | -1.8036769224205 |
| 83 | 83 | Iris-versicolor | 0.796664174662019 | -1.51833790489569 |
| 84 | 84 | Iris-versicolor | 1.86396108053006 | -1.30606586382231 |
| 85 | 85 | Iris-versicolor | 1.24894407779056 | -1.68991252990978 |
| 86 | 86 | Iris-versicolor | 1.60120856572949 | -1.25289995085965 |
| 87 | 87 | Iris-versicolor | 1.92620947206937 | -0.879034809337434 |
| 88 | 88 | Iris-versicolor | 1.2539806065095 | -1.17509783978044 |
| 89 | 89 | Iris-versicolor | 0.995086949129077 | -1.63424006614645 |
| 90 | 90 | Iris-versicolor | 0.693848807737772 | -1.64496407906465 |
| 91 | 91 | Iris-versicolor | 1.08058893146803 | -1.79571292458227 |
| 92 | 92 | Iris-versicolor | 1.59594095147584 | -1.32088258908521 |
| 93 | 93 | Iris-versicolor | 0.84643699256625 | -1.52844399822506 |
| 94 | 94 | Iris-versicolor | -0.133634622938302 | -2.05416925567845 |
| 95 | 95 | Iris-versicolor | 0.973090981783665 | -1.62940486544881 |
| 96 | 96 | Iris-versicolor | 1.1251300163855 | -1.65913405529338 |
| 97 | 97 | Iris-versicolor | 1.08101873477145 | -1.57869217114723 |
| 98 | 98 | Iris-versicolor | 1.34747078399123 | -1.26799896984718 |
| 99 | 99 | Iris-versicolor | -0.290905390526893 | -1.87775510321563 |
| 100 | 100 | Iris-versicolor | 0.959477131617579 | -1.55364478379086 |

Rows: 1-100 | Columns: 4
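
The choice of two components can also be framed as a threshold rule: keep the smallest number of components whose accumulated explained variance reaches a target. The sketch below applies that rule to the accumulated values reported by `model.explained_variance_`. The helper `components_needed` and the example thresholds are hypothetical and are not part of VerticaPy.

```python
import numpy as np

# accumulated_explained_variance values copied from the output of
# model.explained_variance_ earlier in this section
accumulated = np.array([
    0.924616207174268,
    0.977631775024803,
    0.99481691454981,
    1.0,
])

def components_needed(accumulated, threshold):
    """Hypothetical helper: smallest number of leading components whose
    accumulated explained variance is at least `threshold`."""
    # searchsorted finds the first position whose value is >= threshold;
    # adding 1 converts the 0-based position into a component count
    return int(np.searchsorted(accumulated, threshold, side="left")) + 1

for threshold in (0.90, 0.95, 0.99):
    print(threshold, components_needed(accumulated, threshold))
```

A 95% target selects two components, which matches the `n_components` value used in the `transform` call.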