# Iris

This example uses the 'iris' dataset to predict the species of various flowers based on their physical features. You can download the Jupyter Notebook of the study here. The dataset contains the following columns:

• PetalLengthCm: Petal Length in cm
• PetalWidthCm: Petal Width in cm
• SepalLengthCm: Sepal Length in cm
• SepalWidthCm: Sepal Width in cm
• Species: The Flower Species (Setosa, Virginica, Versicolor)

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

## Initialization

This example uses the following version of VerticaPy:

In [24]:
```python
import verticapy as vp
vp.__version__
```
Out[24]:
`'0.9.0'`

Connect to Vertica. This example uses an existing connection called "VerticaDSN". For details on how to create a connection, see the connection tutorial.

In [1]:
```python
vp.connect("VerticaDSN")
```

Let's load the dataset into a virtual DataFrame (vDataFrame).

In [11]:
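```python
from verticapy.datasets import load_iris
import verticapy.stats as st

# Load the iris dataset into a vDataFrame and display the first rows.
iris = load_iris()
iris.head(5)
```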
Out[11]:
| | SepalLengthCm (Numeric(5,2)) | SepalWidthCm (Numeric(5,2)) | PetalLengthCm (Numeric(5,2)) | PetalWidthCm (Numeric(5,2)) | Species (Varchar(30)) |
|---|---|---|---|---|---|
| 1 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
| 2 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 3 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
| 4 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 5 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |

Rows: 1-5 | Columns: 5

## Data Exploration and Preparation

Let's explore the data by displaying descriptive statistics of all the columns.

In [12]:
```python
iris.describe(method = "categorical", unique = True)
```
Out[12]:
| | dtype | count | top | top_percent | unique |
|---|---|---|---|---|---|
| "SepalLengthCm" | numeric(5,2) | 150 | 5.0 | 6.667 | 35.0 |
| "SepalWidthCm" | numeric(5,2) | 150 | 3.0 | 17.333 | 23.0 |
| "PetalLengthCm" | numeric(5,2) | 150 | 1.5 | 9.333 | 43.0 |
| "PetalWidthCm" | numeric(5,2) | 150 | 0.2 | 18.667 | 22.0 |
| "Species" | varchar(30) | 150 | Iris-setosa | 33.333 | 3.0 |

Rows: 1-5 | Columns: 6
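For a single column, the `count`, `top`, `top_percent`, and `unique` statistics can be sketched with the standard library. This only illustrates what the columns mean; VerticaPy computes them with SQL inside Vertica. `categorical_summary` is a hypothetical helper, and the species list assumes the standard balanced split of 50 flowers per class, which is consistent with the 33.333% shown for "Species".

```python
from collections import Counter


def categorical_summary(values):
    """count, top, top_percent, and unique for one column (no NULL handling)."""
    counts = Counter(values)
    # most_common breaks ties by first occurrence; the database may differ.
    top, top_count = counts.most_common(1)[0]
    return {
        "count": len(values),
        "top": top,
        "top_percent": round(100 * top_count / len(values), 3),
        "unique": len(counts),
    }


# Assumption: 50 flowers per species, as in the standard iris dataset.
species = (["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50
           + ["Iris-virginica"] * 50)
print(categorical_summary(species))
```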

We don't have much data here, but that's okay. Different flower species have different proportions, so we can start by computing width-to-length ratios for the petals and the sepals.

We'll also one-hot encode the 'Species' column so that each species gets its own binary indicator column.

In [13]:
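```python
# One binary column per species (keep all three categories).
iris["Species"].one_hot_encode(drop_first = False)
# Width-to-length ratios of the petals and the sepals.
iris["ratio_pwl"] = iris["PetalWidthCm"] / iris["PetalLengthCm"]
iris["ratio_swl"] = iris["SepalWidthCm"] / iris["SepalLengthCm"]
```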
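In plain Python, the cell above amounts to the following per-row computation. This is only an illustration: VerticaPy translates these operations into SQL that runs inside Vertica, and `add_features` and `SPECIES` are hypothetical names used for the sketch.

```python
# Hypothetical per-row illustration; VerticaPy itself generates SQL.
SPECIES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def add_features(row: dict) -> dict:
    """Return a copy of `row` with one-hot species columns and two ratios."""
    out = dict(row)
    # One-hot encoding with drop_first=False: one 0/1 column per species.
    for name in SPECIES:
        out[f"Species_{name}"] = int(row["Species"] == name)
    # Width-to-length ratios for the petals and the sepals.
    out["ratio_pwl"] = row["PetalWidthCm"] / row["PetalLengthCm"]
    out["ratio_swl"] = row["SepalWidthCm"] / row["SepalLengthCm"]
    return out


# First row of the dataset as displayed earlier.
first = {"SepalLengthCm": 4.3, "SepalWidthCm": 3.0,
         "PetalLengthCm": 1.1, "PetalWidthCm": 0.1, "Species": "Iris-setosa"}
print(add_features(first))
```

With `drop_first = True`, one of the three species columns would be dropped because it can be inferred from the other two; keeping all three gives every species a column that can serve directly as a binary response.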

We can draw the correlation matrix (Pearson correlation coefficient) of all the columns, including the new features, to check for linear relationships.

In [14]:
```python
%matplotlib inline
iris.corr()
```
Out[14]:

| | "SepalLengthCm" | "SepalWidthCm" | "PetalLengthCm" | "PetalWidthCm" | "Species_Iris-setosa" | "Species_Iris-versicolor" | "Species_Iris-virginica" | "ratio_pwl" | "ratio_swl" |
|---|---|---|---|---|---|---|---|---|---|
| "SepalLengthCm" | 1.0 | -0.109369249950656 | 0.871754157304886 | 0.817953633369181 | -0.717415668686111 | 0.0793955238434444 | 0.63802014484266 | 0.645854809352185 | -0.724085081370438 |
| "SepalWidthCm" | -0.109369249950656 | 1.0 | -0.420516096401169 | -0.356544089613812 | 0.595600845226849 | -0.464699560561606 | -0.13090128466524 | -0.339854865893446 | 0.755415996230207 |
| "PetalLengthCm" | 0.871754157304886 | -0.420516096401169 | 1.0 | 0.962757097050968 | -0.922688332883109 | 0.201586759537506 | 0.721101573345601 | 0.8129592586072 | -0.867296724293694 |
| "PetalWidthCm" | 0.817953633369181 | -0.356544089613812 | 0.962757097050968 | 1.0 | -0.887509958782658 | 0.118375979139307 | 0.76913397964335 | 0.910838299356941 | -0.796309230951869 |
| "Species_Iris-setosa" | -0.717415668686111 | 0.595600845226849 | -0.922688332883109 | -0.887509958782658 | 1.0 | -0.5 | -0.5 | -0.825149778706552 | 0.907112624976622 |
| "Species_Iris-versicolor" | 0.0793955238434444 | -0.464699560561606 | 0.201586759537506 | 0.118375979139307 | -0.5 | 1.0 | -0.5 | 0.212967593449348 | -0.409714538856098 |
| "Species_Iris-virginica" | 0.63802014484266 | -0.13090128466524 | 0.721101573345601 | 0.76913397964335 | -0.5 | -0.5 | 1.0 | 0.612182185257205 | -0.497398086120528 |
| "ratio_pwl" | 0.645854809352185 | -0.339854865893446 | 0.8129592586072 | 0.910838299356941 | -0.825149778706552 | 0.212967593449348 | 0.612182185257205 | 1.0 | -0.689720454015874 |
| "ratio_swl" | -0.724085081370438 | 0.755415996230207 | -0.867296724293694 | -0.796309230951869 | 0.907112624976622 | -0.409714538856098 | -0.497398086120528 | -0.689720454015874 | 1.0 |

Rows: 1-9 | Columns: 10
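The `-0.5` between each pair of one-hot species columns follows directly from the class balance. Two mutually exclusive indicators that are each 1 for a third of the rows have covariance 0 - (1/3)(1/3) = -1/9 and variance (1/3)(2/3) = 2/9, so their correlation is (-1/9) / (2/9) = -0.5. The stdlib sketch below checks this numerically, assuming the standard split of 50 flowers per species.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Assumption: 50 flowers per species, as in the standard iris dataset.
species = (["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50
           + ["Iris-virginica"] * 50)
one_hot = {name: [int(s == name) for s in species]
           for name in ("Iris-setosa", "Iris-versicolor", "Iris-virginica")}

print(pearson(one_hot["Iris-setosa"], one_hot["Iris-versicolor"]))
print(pearson(one_hot["Iris-setosa"], one_hot["Iris-virginica"]))
```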

Iris setosa is strongly linearly correlated with the petal length (-0.92) and the sepal width-to-length ratio (0.91). A scatter plot of these two features shows a perfect separation, although the petal length alone already separates it from the other species.

In [15]:
```python
iris.scatter(columns = ["PetalLengthCm", "ratio_swl"],
             catcol = "Species")
```
Out[15]:
[Figure: scatter plot of "PetalLengthCm" (x-axis) against "ratio_swl" (y-axis), colored by Species]
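On a single feature, a linear separation is just a threshold. The sketch below checks whether one petal-length cut-off separates a species from the rest, using a handful of (PetalLengthCm, Species) pairs copied from the `predict_proba` table displayed later in this example. It is a toy check on a small sample, not a substitute for the plots; `separating_threshold` is a hypothetical helper.

```python
# (PetalLengthCm, Species) pairs copied from rows 1, 2, 16, 33, 17, 47, 18,
# and 90 of the predict_proba table displayed later in this example.
sample = [
    (1.1, "Iris-setosa"), (1.4, "Iris-setosa"), (1.9, "Iris-setosa"),
    (3.0, "Iris-versicolor"), (3.3, "Iris-versicolor"),
    (4.5, "Iris-versicolor"), (4.5, "Iris-virginica"), (5.6, "Iris-virginica"),
]


def separating_threshold(pairs, positive):
    """Midpoint of the gap between `positive` and the other species on a
    single feature, or None when the two groups overlap."""
    pos = [x for x, s in pairs if s == positive]
    neg = [x for x, s in pairs if s != positive]
    if max(pos) < min(neg):  # positive class lies entirely below the rest
        return (max(pos) + min(neg)) / 2
    if max(neg) < min(pos):  # positive class lies entirely above the rest
        return (max(neg) + min(pos)) / 2
    return None


print(separating_threshold(sample, "Iris-setosa"))     # roughly 2.45
print(separating_threshold(sample, "Iris-virginica"))  # None
```

On this sample, any cut-off between 1.9 and 3.0 cm isolates Iris setosa, while Iris virginica cannot be isolated: a petal length of 4.5 cm occurs for both Iris versicolor and Iris virginica.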

We can see a clear linear separation between Iris setosa and the other species, but we'll need more features to tell Iris virginica and Iris versicolor apart.

In [16]:
```python
iris.scatter(columns = ["PetalLengthCm",
                        "PetalWidthCm",
                        "SepalLengthCm"],
             catcol = "Species")
```
Out[16]:
[Figure: 3D scatter plot of "PetalLengthCm", "PetalWidthCm", and "SepalLengthCm", colored by Species]

Our strategy is simple: we'll train two linear support vector classifiers (SVCs), one to identify Iris setosa and another to identify Iris virginica. Any flower that neither model assigns to its class is then classified as Iris versicolor.
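Because there are only three species, two binary models are enough to produce a three-class prediction. A minimal sketch of that decision rule, assuming each model outputs a probability; the helper name, the 0.5 cut-off, and checking Iris setosa first are all assumptions made for illustration.

```python
def two_stage_species(p_setosa: float, p_virginica: float,
                      cutoff: float = 0.5) -> str:
    """Turn the outputs of two binary models into one of three species.

    Hypothetical helper: the 0.5 cut-off and checking Iris setosa first
    are assumptions made for this sketch.
    """
    if p_setosa >= cutoff:
        return "Iris-setosa"
    if p_virginica >= cutoff:
        return "Iris-virginica"
    # Neither model claims the flower, so it is the remaining species.
    return "Iris-versicolor"


print(two_stage_species(0.80, 0.05))  # Iris-setosa
print(two_stage_species(0.15, 0.90))  # Iris-virginica
print(two_stage_species(0.32, 0.10))  # Iris-versicolor
```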

## Machine Learning

Let's build the first Linear SVC to predict if a flower is an Iris setosa.

In [17]:
```python
from verticapy.learn.svm import LinearSVC
from verticapy.learn.model_selection import cross_validate

# Predict whether a flower is an Iris setosa from two features.
predictors = ["PetalLengthCm", "ratio_swl"]
response = "Species_Iris-setosa"
model = LinearSVC("svc_setosa_iris")
# Cross-validate the model (the output below reports 3 folds).
cross_validate(model, iris, predictors, response)
```

Out[17]:
| | auc | prc_auc | accuracy | log_loss | precision | recall | f1_score | mcc | informedness | markedness | csi | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-fold | 1.0 | 0.9999999999999999 | 1.0 | 0.0852135991098265 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.638392686843872 |
| 2-fold | 1.0 | 0.9999999999999999 | 1.0 | 0.0808943478921133 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.5792129039764404 |
| 3-fold | 0.9999999999999999 | 1.0 | 1.0 | 0.0744378576492842 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.4164810180664062 |
| avg | 1.0 | 0.9999999999999999 | 1.0 | 0.08018193488374134 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.544695536295573 |
| std | 6.409875621278546e-17 | 6.409875621278546e-17 | 0.0 | 0.005423080326436787 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.11491206937746823 |

Rows: 1-5 | Columns: 13
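Apart from `auc`, `prc_auc`, `log_loss`, and `time`, the columns above are standard functions of each test fold's confusion matrix at a fixed probability cut-off. The stdlib sketch below uses the textbook definitions; it is not VerticaPy's implementation and does not guard against empty classes. The example counts (20 true positives, 2 false positives, 0 false negatives, 28 true negatives in a 50-row fold) are an inferred confusion matrix that is consistent with the fold-1 row reported further down for the Iris virginica model.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Threshold-based binary metrics from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)    # predicted positives that are correct
    recall = tp / (tp + fn)       # actual positives that are found
    specificity = tn / (tn + fp)  # actual negatives that are found
    npv = tn / (tn + fn)          # negative predictive value
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "csi": tp / (tp + fn + fp),  # critical success index
    }


for name, value in classification_metrics(tp=20, fp=2, fn=0, tn=28).items():
    print(f"{name:>12}: {value:.4f}")
```

A fold with no misclassifications gives 1.0 for every one of these metrics, which is what the Iris setosa table above shows.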

The model classifies every test fold perfectly. Let's train it on the entire dataset.

In [18]:
```python
model.fit(iris, predictors, response)
```
Out[18]:
```
=======
details
=======
  predictor  |coefficient
-------------+-----------
  Intercept  |  1.38349
petallengthcm| -0.84012
  ratio_swl  |  1.32517

===========
call_string
===========
SELECT svm_classifier('public.svc_setosa_iris', '"public"."_verticapy_tmp_view_dbadmin_40328_9186882012_"', '"species_iris-setosa"', '"PetalLengthCm", "ratio_swl"'
USING PARAMETERS class_weights='1,1', C=1, max_iterations=100, intercept_mode='regularized', intercept_scaling=1, epsilon=0.0001);

===============
additional_info
===============
       Name       |Value
------------------+-----
accepted_row_count| 150
rejected_row_count|  0
 iteration_count  |  7
```
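Because the model is linear, its prediction is the sign of a weighted sum of the two predictors. The sketch below recomputes that score by hand from the (rounded) coefficients in the summary. It assumes that a positive score corresponds to `Species_Iris-setosa = 1`, which is consistent with the probabilities printed by `predict_proba` in this example; the helper names are hypothetical.

```python
# Coefficients copied from the model summary above (rounded to 5 decimals).
INTERCEPT = 1.38349
COEF_PETAL_LENGTH = -0.84012
COEF_RATIO_SWL = 1.32517


def decision_value(petal_length_cm: float, ratio_swl: float) -> float:
    """Linear SVC score: intercept plus the weighted predictors."""
    return (INTERCEPT
            + COEF_PETAL_LENGTH * petal_length_cm
            + COEF_RATIO_SWL * ratio_swl)


def predict_setosa(petal_length_cm: float, ratio_swl: float) -> int:
    """1 (Iris setosa) when the score is positive, else 0 (assumption)."""
    return int(decision_value(petal_length_cm, ratio_swl) > 0)


# Row 1 (Iris setosa: sepal 4.3 x 3.0, petal length 1.1) and row 17
# (Iris versicolor: sepal 4.9 x 2.4, petal length 3.3) of the predict_proba table.
print(predict_setosa(1.1, 3.0 / 4.3))  # 1
print(predict_setosa(3.3, 2.4 / 4.9))  # 0
```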

Let's plot the model to see the perfect separation.

In [19]:
```python
model.plot()
```
Out[19]:
[Figure: Iris setosa model over "PetalLengthCm" (x-axis) and "ratio_swl" (y-axis), showing the separating line]
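The separating line is where the decision score equals zero. Solving 1.38349 - 0.84012 × PetalLengthCm + 1.32517 × ratio_swl = 0 for the petal length gives the boundary as a function of the sepal ratio. The sketch below evaluates it over the range of ratios that appear in this example's data; `boundary_petal_length` is a hypothetical helper and the coefficients are the rounded values from the summary.

```python
# Coefficients copied from the model summary (rounded to 5 decimals).
INTERCEPT = 1.38349
COEF_PETAL_LENGTH = -0.84012
COEF_RATIO_SWL = 1.32517


def boundary_petal_length(ratio_swl: float) -> float:
    """Petal length at which the decision score is exactly zero."""
    # 0 = INTERCEPT + COEF_PETAL_LENGTH * x + COEF_RATIO_SWL * ratio_swl
    return -(INTERCEPT + COEF_RATIO_SWL * ratio_swl) / COEF_PETAL_LENGTH


for ratio in (0.35, 0.45, 0.55, 0.65, 0.75):
    print(f"ratio_swl = {ratio:.2f} -> boundary at "
          f"{boundary_petal_length(ratio):.2f} cm")
```

Over this range the boundary stays between about 2.2 and 2.8 cm, inside the gap between the longest Iris setosa petal (1.9 cm) and the shortest petal of the other species (3.0 cm) among the rows displayed in this example.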

We can add the model's predicted probability that each flower is an Iris setosa to the vDataFrame as a new column named `setosa`.

In [20]:
```python
model.predict_proba(iris, name = "setosa", pos_label = 1)
```
Out[20]:
| | SepalLengthCm (Numeric(5,2)) | SepalWidthCm (Numeric(5,2)) | PetalLengthCm (Numeric(5,2)) | PetalWidthCm (Numeric(5,2)) | Species (Varchar(30)) | Species_Iris-setosa (Bool) | Species_Iris-versicolor (Bool) | Species_Iris-virginica (Bool) | ratio_pwl (Numeric(20,15)) | ratio_swl (Numeric(20,15)) | setosa (Float) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.090909090909091 | 0.697674418604651 | 0.799616133475824 |
| 2 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.659090909090909 | 0.746632427676863 |
| 3 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.681818181818182 | 0.767609329878423 |
| 4 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.727272727272727 | 0.778180760876583 |
| 5 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.511111111111111 | 0.724849376339106 |
| 6 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673913043478261 | 0.734263300128653 |
| 7 | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.695652173913043 | 0.755687832710383 |
| 8 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.739130434782609 | 0.766167839545688 |
| 9 | 4.6 | 3.6 | 1.0 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.782608695652174 | 0.82926980908022 |
| 10 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.680851063829787 | 0.76738063375719 |
| 11 | 4.7 | 3.2 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.680851063829787 | 0.719411418945644 |
| 12 | 4.8 | 3.0 | 1.4 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.071428571428571 | 0.625 | 0.737991521269184 |
| 13 | 4.8 | 3.0 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.625 | 0.737991521269184 |
| 14 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.645833333333333 | 0.709949644614095 |
| 15 | 4.8 | 3.4 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.708333333333333 | 0.726703765068597 |
| 16 | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.105263157894737 | 0.708333333333333 | 0.673910864793945 |
| 17 | 4.9 | 2.4 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.489795918367347 | 0.323039614860416 |
| 18 | 4.9 | 2.5 | 4.5 | 1.7 | Iris-virginica | 0 | 0 | 1 | 0.377777777777778 | 0.510204081632653 | 0.151750696077398 |
| 19 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.612244897959184 | 0.734710097493698 |
| 20 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 |
| 21 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 |
| 22 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 |
| 23 | 5.0 | 2.0 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.4 | 0.263694030318376 |
| 24 | 5.0 | 2.3 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.46 | 0.314465933883348 |
| 25 | 5.0 | 3.0 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.6 | 0.697285021121722 |
| 26 | 5.0 | 3.2 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.64 | 0.772671290753683 |
| 27 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.66 | 0.746860255947267 |
| 28 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.68 | 0.735834215605851 |
| 29 | 5.0 | 3.4 | 1.6 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.25 | 0.68 | 0.719183705655739 |
| 30 | 5.0 | 3.5 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.7 | 0.771879602615583 |
| 31 | 5.0 | 3.5 | 1.6 | 0.6 | Iris-setosa | 1 | 0 | 0 | 0.375 | 0.7 | 0.724505067474212 |
| 32 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.72 | 0.761595468468724 |
| 33 | 5.1 | 2.5 | 3.0 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.366666666666667 | 0.490196078431373 | 0.380536424820992 |
| 34 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | 1 | 0 | 0 | 0.294117647058824 | 0.647058823529412 | 0.692695552384892 |
| 35 | 5.1 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.666666666666667 | 0.732385408664398 |
| 36 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.686274509803922 | 0.753386230741964 |
| 37 | 5.1 | 3.5 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.686274509803922 | 0.753386230741964 |
| 38 | 5.1 | 3.7 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.725490196078431 | 0.747384308490649 |
| 39 | 5.1 | 3.8 | 1.5 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.745098039215686 | 0.752258455681401 |
| 40 | 5.1 | 3.8 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.745098039215686 | 0.736272141535994 |
| 41 | 5.1 | 3.8 | 1.9 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.210526315789474 | 0.745098039215686 | 0.68452518034775 |
| 42 | 5.2 | 2.7 | 3.9 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.358974358974359 | 0.519230769230769 | 0.230604211159528 |
| 43 | 5.2 | 3.4 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.653846153846154 | 0.745315390833046 |
| 44 | 5.2 | 3.5 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673076923076923 | 0.734047050003793 |
| 45 | 5.2 | 4.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.788461538461538 | 0.762811887474022 |
| 46 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.69811320754717 | 0.740473501806866 |
| 47 | 5.4 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.555555555555556 | 0.159649649102876 |
| 48 | 5.4 | 3.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.62962962962963 | 0.722656797169744 |
| 49 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.117647058823529 | 0.62962962962963 | 0.687757248197022 |
| 50 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.685185185185185 | 0.737167698365962 |
| 51 | 5.4 | 3.9 | 1.3 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.307692307692308 | 0.722222222222222 | 0.777023333634539 |
| 52 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.235294117647059 | 0.722222222222222 | 0.713482010522215 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.418181818181818 | 0.194219643823906 |
| 54 | 5.5 | 2.4 | 3.7 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.27027027027027 | 0.436363636363636 | 0.241093824269382 |
| 55 | 5.5 | 2.4 | 3.8 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.289473684210526 | 0.436363636363636 | 0.226058264210938 |
| 56 | 5.5 | 2.5 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.454545454545455 | 0.201872265775898 |
| 57 | 5.5 | 2.6 | 4.4 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.272727272727273 | 0.472727272727273 | 0.156225228875569 |
| 58 | 5.5 | 3.5 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.636363636363636 | 0.756691610416908 |
| 59 | 5.5 | 4.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.763636363636364 | 0.771935440609133 |
| 60 | 5.6 | 2.5 | 3.9 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.282051282051282 | 0.446428571428571 | 0.213933207860434 |
| 61 | 5.6 | 2.7 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.482142857142857 | 0.181519581637256 |
| 62 | 5.6 | 2.8 | 4.9 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.408163265306122 | 0.5 | 0.111996799091321 |
| 63 | 5.6 | 2.9 | 3.6 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.361111111111111 | 0.517857142857143 | 0.277943121171659 |
| 64 | 5.6 | 3.0 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.535714285714286 | 0.205693392743854 |
| 65 | 5.6 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.535714285714286 | 0.156153617405426 |
| 66 | 5.7 | 2.5 | 5.0 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.4 | 0.43859649122807 | 0.0965737982202029 |
| 67 | 5.7 | 2.6 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.456140350877193 | 0.278390203317582 |
| 68 | 5.7 | 2.8 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.491228070175439 | 0.196228591121515 |
| 69 | 5.7 | 2.8 | 4.5 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.288888888888889 | 0.491228070175439 | 0.148542053943545 |
| 70 | 5.7 | 2.9 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.508771929824561 | 0.186821367793599 |
| 71 | 5.7 | 3.0 | 4.2 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.526315789473684 | 0.190379016686771 |
| 72 | 5.7 | 3.8 | 1.7 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.176470588235294 | 0.666666666666667 | 0.698198801634805 |
| 73 | 5.7 | 4.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.771929824561404 | 0.758825404150609 |
| 74 | 5.8 | 2.6 | 4.0 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.3 | 0.448275862068966 | 0.200536953059694 |
| 75 | 5.8 | 2.7 | 3.9 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.307692307692308 | 0.46551724137931 | 0.218217860142616 |
| 76 | 5.8 | 2.7 | 4.1 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.24390243902439 | 0.46551724137931 | 0.190910337541694 |
| 77 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657 |
| 78 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657 |
| 79 | 5.8 | 2.8 | 5.1 | 2.4 | Iris-virginica | 0 | 0 | 1 | 0.470588235294118 | 0.482758620689655 | 0.0943726599781477 |
| 80 | 5.8 | 4.0 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.689655172413793 | 0.784021608790257 |
| 81 | 5.9 | 3.0 | 4.2 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.357142857142857 | 0.508474576271186 | 0.186761512366878 |
| 82 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.352941176470588 | 0.508474576271186 | 0.0973257207195747 |
| 83 | 5.9 | 3.2 | 4.8 | 1.8 | Iris-versicolor | 0 | 1 | 0 | 0.375 | 0.542372881355932 | 0.126712735034151 |
| 84 | 6.0 | 2.2 | 4.0 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.25 | 0.366666666666667 | 0.183758509548148 |
| 85 | 6.0 | 2.2 | 5.0 | 1.5 | Iris-virginica | 0 | 0 | 1 | 0.3 | 0.366666666666667 | 0.0885712440311769 |
| 86 | 6.0 | 2.7 | 5.1 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.313725490196078 | 0.45 | 0.0907272611961409 |
| 87 | 6.0 | 2.9 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.483333333333333 | 0.147223724664182 |
| 88 | 6.0 | 3.0 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.5 | 0.120628357257542 |
| 89 | 6.0 | 3.4 | 4.5 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.355555555555556 | 0.566666666666667 | 0.16163497279113 |
| 90 | 6.1 | 2.6 | 5.6 | 1.4 | Iris-virginica | 0 | 0 | 1 | 0.25 | 0.426229508196721 | 0.0597292318554775 |
| 91 | 6.1 | 2.8 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.459016393442623 | 0.202828547374176 |
| 92 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.25531914893617 | 0.459016393442623 | 0.123814688224948 |
| 93 | 6.1 | 2.9 | 4.7 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.297872340425532 | 0.475409836065574 | 0.126190742362697 |
| 94 | 6.1 | 3.0 | 4.6 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.304347826086957 | 0.491803278688525 | 0.138317919091645 |
| 95 | 6.1 | 3.0 | 4.9 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.36734693877551 | 0.491803278688525 | 0.110921074597917 |
| 96 | 6.2 | 2.2 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.354838709677419 | 0.127102859928249 |
| 97 | 6.2 | 2.8 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.451612903225806 | 0.113990321158717 |
| 98 | 6.2 | 2.9 | 4.3 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.302325581395349 | 0.467741935483871 | 0.166702290969475 |
| 99 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica | 0 | 0 | 1 | 0.425925925925926 | 0.548387096774194 | 0.0811783736332713 |
| 100 | 6.3 | 2.3 | 4.4 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.295454545454545 | 0.365079365079365 | 0.138329002279972 |

Rows: 1-100 of 150 | Columns: 11
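Recomputing a few rows by hand suggests that the `setosa` column is the logistic (sigmoid) function applied to the linear decision score: for row 1 the score is about 1.384, and 1 / (1 + e^-1.384) is about 0.7996, matching the table. This is an observation from the numbers above, not documented VerticaPy behavior. The sketch below reproduces it for rows 1, 17, and 66 using the rounded coefficients from the model summary; `setosa_probability` is a hypothetical helper.

```python
import math

# Coefficients copied from the model summary (rounded to 5 decimals).
INTERCEPT = 1.38349
COEF_PETAL_LENGTH = -0.84012
COEF_RATIO_SWL = 1.32517


def setosa_probability(petal_length_cm: float, ratio_swl: float) -> float:
    """Logistic (sigmoid) transform of the linear decision score."""
    score = (INTERCEPT
             + COEF_PETAL_LENGTH * petal_length_cm
             + COEF_RATIO_SWL * ratio_swl)
    return 1.0 / (1.0 + math.exp(-score))


# Rows 1, 17, and 66 of the table above; its `setosa` column shows
# 0.799616133475824, 0.323039614860416, and 0.0965737982202029.
print(setosa_probability(1.1, 3.0 / 4.3))
print(setosa_probability(3.3, 2.4 / 4.9))
print(setosa_probability(5.0, 2.5 / 5.7))
```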

Now let's create a second model to identify Iris virginica.

In [21]:
```python
predictors = ["PetalLengthCm", "SepalLengthCm", "SepalWidthCm",
              "PetalWidthCm", "ratio_pwl", "ratio_swl"]
response = "Species_Iris-virginica"
model = LinearSVC("svc_virginica_iris")
cross_validate(model, iris, predictors, response)
```

Out[21]:
 auc prc_auc accuracy log_loss precision recall f1_score mcc informedness markedness csi time 1-fold 0.9883333333333334 0.9809541204839196 0.96 0.124753596556893 0.9090909090909091 1.0 0.9523809523809523 0.9211323729436766 0.9333333333333333 0.9090909090909092 0.9090909090909091 1.9328792095184326 2-fold 0.9961904761904762 0.9910912698412698 0.98 0.0803143364420691 0.9375 1.0 0.967741935483871 0.9543135154205278 0.9714285714285715 0.9375 0.9375 1.7381629943847656 3-fold 1.0 0.9999999999999999 1.0 0.0850368152073147