Iris

This example uses the 'iris' dataset to predict the species of various flowers based on their physical features. You can download the Jupyter Notebook of the study here.

The dataset contains the following columns:

  • PetalLengthCm: Petal Length in cm
  • PetalWidthCm: Petal Width in cm
  • SepalLengthCm: Sepal Length in cm
  • SepalWidthCm: Sepal Width in cm
  • Species: The Flower Species (Setosa, Virginica, Versicolor)

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

Initialization

This example uses the following version of VerticaPy:

In [24]:
import verticapy as vp
vp.__version__
Out[24]:
'0.9.0'

Connect to Vertica. This example uses an existing connection called "VerticaDSN". For details on how to create a connection, see the connection tutorial.

In [1]:
vp.connect("VerticaDSN")

Let's create a Virtual DataFrame of the dataset.

In [11]:
from verticapy.datasets import load_iris
import verticapy.stats as st
iris = load_iris()
iris.head(5)
Out[11]:
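  | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species
  | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)
--+---------------+--------------+---------------+--------------+------------
1 | 4.3           | 3.0          | 1.1           | 0.1          | Iris-setosa
2 | 4.4           | 2.9          | 1.4           | 0.2          | Iris-setosa
3 | 4.4           | 3.0          | 1.3           | 0.2          | Iris-setosa
4 | 4.4           | 3.2          | 1.3           | 0.2          | Iris-setosa
5 | 4.5           | 2.3          | 1.3           | 0.3          | Iris-setosa
Rows: 1-5 | Columns: 5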

Data Exploration and Preparation

Let's explore the data by displaying descriptive statistics of all the columns.

In [12]:
iris.describe(method = "categorical", unique=True)
Out[12]:
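                | dtype        | count | top         | top_percent | unique
----------------+--------------+-------+-------------+-------------+-------
"SepalLengthCm" | numeric(5,2) | 150   | 5.0         | 6.667       | 35.0
"SepalWidthCm"  | numeric(5,2) | 150   | 3.0         | 17.333      | 23.0
"PetalLengthCm" | numeric(5,2) | 150   | 1.5         | 9.333       | 43.0
"PetalWidthCm"  | numeric(5,2) | 150   | 0.2         | 18.667      | 22.0
"Species"       | varchar(30)  | 150   | Iris-setosa | 33.333      | 3.0
Rows: 1-5 | Columns: 6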
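
To make the columns of this summary concrete, here is a minimal plain-Python sketch of how count, top, top_percent and unique could be computed for one column. The helper name categorical_summary is hypothetical and is not part of VerticaPy; the 50/50/50 species split is inferred from the 150 rows, the 3 unique values and the 33.333 top_percent shown above.

```python
from collections import Counter


def categorical_summary(values):
    """Hypothetical helper: count, most frequent value, its share in percent,
    and the number of distinct values (nulls are ignored)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    top, top_count = counts.most_common(1)[0]
    return {
        "count": len(non_null),
        "top": top,
        "top_percent": round(100 * top_count / len(non_null), 3),
        "unique": len(counts),
    }


# Assumed class balance: 50 flowers per species.
species = (["Iris-setosa"] * 50
           + ["Iris-versicolor"] * 50
           + ["Iris-virginica"] * 50)
summary = categorical_summary(species)
print(summary)
```

With tied frequencies, which value is reported as "top" depends on the tie-breaking rule; this sketch simply returns the first one encountered.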

We don't have much data here, but that's okay. Different flower species have different proportions, so we can start by computing ratios between features: petal width to petal length, and sepal width to sepal length.

We'll also apply one-hot encoding to 'Species' so that each species gets its own indicator column.

In [13]:
iris["Species"].one_hot_encode(drop_first = False)
iris["ratio_pwl"] = iris["PetalWidthCm"] / iris["PetalLengthCm"]
iris["ratio_swl"] = iris["SepalWidthCm"] / iris["SepalLengthCm"]
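
As a rough illustration of what these three lines produce, the following plain-Python sketch applies the same transformations to three rows of the dataset, one per species. It runs in memory on dictionaries, whereas VerticaPy pushes the computation to the database; the function name add_features is hypothetical.

```python
# Three rows taken from the dataset (one per species).
rows = [
    {"SepalLengthCm": 4.3, "SepalWidthCm": 3.0, "PetalLengthCm": 1.1,
     "PetalWidthCm": 0.1, "Species": "Iris-setosa"},
    {"SepalLengthCm": 4.9, "SepalWidthCm": 2.4, "PetalLengthCm": 3.3,
     "PetalWidthCm": 1.0, "Species": "Iris-versicolor"},
    {"SepalLengthCm": 4.9, "SepalWidthCm": 2.5, "PetalLengthCm": 4.5,
     "PetalWidthCm": 1.7, "Species": "Iris-virginica"},
]
categories = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def add_features(row):
    """Hypothetical helper: one indicator column per species plus two ratios."""
    out = dict(row)
    for category in categories:
        # drop_first=False keeps an indicator for every category.
        out[f"Species_{category}"] = int(row["Species"] == category)
    out["ratio_pwl"] = row["PetalWidthCm"] / row["PetalLengthCm"]
    out["ratio_swl"] = row["SepalWidthCm"] / row["SepalLengthCm"]
    return out


features = [add_features(row) for row in rows]
for f in features:
    print(f["Species"], round(f["ratio_pwl"], 3), round(f["ratio_swl"], 3))
```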

We can draw the correlation matrix (Pearson correlation coefficient) of the new features to see if there are some linear links.

In [14]:
%matplotlib inline
iris.corr()
Out[14]:
 | "SepalLengthCm" | "SepalWidthCm" | "PetalLengthCm" | "PetalWidthCm" | "Species_Iris-setosa" | "Species_Iris-versicolor" | "Species_Iris-virginica" | "ratio_pwl" | "ratio_swl"
"SepalLengthCm" | 1.0 | -0.109369249950656 | 0.871754157304886 | 0.817953633369181 | -0.717415668686111 | 0.0793955238434444 | 0.63802014484266 | 0.645854809352185 | -0.724085081370438
"SepalWidthCm" | -0.109369249950656 | 1.0 | -0.420516096401169 | -0.356544089613812 | 0.595600845226849 | -0.464699560561606 | -0.13090128466524 | -0.339854865893446 | 0.755415996230207
"PetalLengthCm" | 0.871754157304886 | -0.420516096401169 | 1.0 | 0.962757097050968 | -0.922688332883109 | 0.201586759537506 | 0.721101573345601 | 0.8129592586072 | -0.867296724293694
"PetalWidthCm" | 0.817953633369181 | -0.356544089613812 | 0.962757097050968 | 1.0 | -0.887509958782658 | 0.118375979139307 | 0.76913397964335 | 0.910838299356941 | -0.796309230951869
"Species_Iris-setosa" | -0.717415668686111 | 0.595600845226849 | -0.922688332883109 | -0.887509958782658 | 1.0 | -0.5 | -0.5 | -0.825149778706552 | 0.907112624976622
"Species_Iris-versicolor" | 0.0793955238434444 | -0.464699560561606 | 0.201586759537506 | 0.118375979139307 | -0.5 | 1.0 | -0.5 | 0.212967593449348 | -0.409714538856098
"Species_Iris-virginica" | 0.63802014484266 | -0.13090128466524 | 0.721101573345601 | 0.76913397964335 | -0.5 | -0.5 | 1.0 | 0.612182185257205 | -0.497398086120528
"ratio_pwl" | 0.645854809352185 | -0.339854865893446 | 0.8129592586072 | 0.910838299356941 | -0.825149778706552 | 0.212967593449348 | 0.612182185257205 | 1.0 | -0.689720454015874
"ratio_swl" | -0.724085081370438 | 0.755415996230207 | -0.867296724293694 | -0.796309230951869 | 0.907112624976622 | -0.409714538856098 | -0.497398086120528 | -0.689720454015874 | 1.0
Rows: 1-9 | Columns: 10
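
The -0.5 entries between the species indicator columns are what the Pearson coefficient gives for mutually exclusive indicators on three equally sized classes. The sketch below checks this in plain Python; the pearson function is a textbook implementation written for illustration (not VerticaPy's), and the 50/50/50 split is assumed from the balanced dataset.

```python
from math import sqrt


def pearson(x, y):
    """Textbook Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Indicator columns for a balanced dataset of 3 x 50 rows (assumed ordering).
setosa = [1] * 50 + [0] * 100
versicolor = [0] * 50 + [1] * 50 + [0] * 50

r = pearson(setosa, versicolor)
print(round(r, 6))
```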

The Iris setosa indicator is strongly linearly correlated with the petal length (-0.92) and the sepal ratio (0.91). A scatter plot of these two features shows a perfect separation (though the petal length alone is enough to see it).

In [15]:
iris.scatter(columns = ["PetalLengthCm", "ratio_swl"], 
             catcol = "Species")
Out[15]:
<AxesSubplot:xlabel='"PetalLengthCm"', ylabel='"ratio_swl"'>

We can see a clear linear separation between the Iris setosa and the other species, but we'll need more features to identify the differences between Iris virginica and Iris versicolor.

In [16]:
iris.scatter(columns = ["PetalLengthCm", 
                        "PetalWidthCm", 
                        "SepalLengthCm"], 
             catcol = "Species")
Out[16]:
<Axes3DSubplot:xlabel='"PetalLengthCm"', ylabel='"PetalWidthCm"'>

Our strategy is simple: we'll use two Linear Support Vector Classification (SVC) models: one to classify the Iris setosa and another to classify the Iris virginica. Any flower that neither model claims is labeled Iris versicolor.

Machine Learning

Let's build the first Linear SVC to predict if a flower is an Iris setosa.

In [17]:
from verticapy.learn.svm import LinearSVC
from verticapy.learn.model_selection import cross_validate

predictors = ["PetalLengthCm", "ratio_swl"]
response = "Species_Iris-setosa"
model = LinearSVC("svc_setosa_iris")
cross_validate(model, iris, predictors, response)

Out[17]:
 | auc | prc_auc | accuracy | log_loss | precision | recall | f1_score | mcc | informedness | markedness | csi | time
1-fold | 1.0 | 0.9999999999999999 | 1.0 | 0.0852135991098265 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.638392686843872
2-fold | 1.0 | 0.9999999999999999 | 1.0 | 0.0808943478921133 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.5792129039764404
3-fold | 0.9999999999999999 | 1.0 | 1.0 | 0.0744378576492842 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.4164810180664062
avg | 1.0 | 0.9999999999999999 | 1.0 | 0.08018193488374134 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.544695536295573
std | 6.409875621278546e-17 | 6.409875621278546e-17 | 0.0 | 0.005423080326436787 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.11491206937746823
Rows: 1-5 | Columns: 13
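
The three fold rows above come from a 3-fold cross-validation: the data is cut into three parts, and each part serves once as the test set while the other two are used for training. The following plain-Python sketch shows the general idea only; the helper k_fold_indices, the shuffling and the seed are assumptions for illustration, and how VerticaPy actually assigns rows to folds is not shown in this example.

```python
import random


def k_fold_indices(n_rows, k=3, seed=0):
    """Hypothetical helper: yield (train, test) index lists for k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(150, k=3))
for number, (train, test) in enumerate(splits, start=1):
    print(f"{number}-fold: {len(train)} training rows, {len(test)} test rows")
```

With 150 rows and 3 folds, each model is trained on 100 rows and scored on the remaining 50.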

Our model is excellent. Let's build it using the entire dataset.

In [18]:
model.fit(iris, predictors, response)
Out[18]:

=======
details
=======
  predictor  |coefficient
-------------+-----------
  Intercept  |  1.38349  
petallengthcm| -0.84012  
  ratio_swl  |  1.32517  


===========
call_string
===========
SELECT svm_classifier('public.svc_setosa_iris', '"public"."_verticapy_tmp_view_dbadmin_40328_9186882012_"', '"species_iris-setosa"', '"PetalLengthCm", "ratio_swl"'
USING PARAMETERS class_weights='1,1', C=1, max_iterations=100, intercept_mode='regularized', intercept_scaling=1, epsilon=0.0001);

===============
Additional Info
===============
       Name       |Value
------------------+-----
accepted_row_count| 150 
rejected_row_count|  0  
 iteration_count  |  7  

Let's plot the model to see the perfect separation.

In [19]:
model.plot()
Out[19]:
<AxesSubplot:xlabel='"PetalLengthCm"', ylabel='"ratio_swl"'>

We can add this probability to the vDataFrame.

In [20]:
model.predict_proba(iris, name = "setosa", pos_label=1)
Out[20]:
 | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | Species_Iris-setosa | Species_Iris-versicolor | Species_Iris-virginica | ratio_pwl | ratio_swl | setosa
 | Numeric(5,2) | Numeric(5,2) | Numeric(5,2) | Numeric(5,2) | Varchar(30) | Bool | Bool | Bool | Numeric(20,15) | Numeric(20,15) | Float
1 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.090909090909091 | 0.697674418604651 | 0.799616133475824
2 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.659090909090909 | 0.746632427676863
3 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.681818181818182 | 0.767609329878423
4 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.727272727272727 | 0.778180760876583
5 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.511111111111111 | 0.724849376339106
6 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673913043478261 | 0.734263300128653
7 | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.695652173913043 | 0.755687832710383
8 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.739130434782609 | 0.766167839545688
9 | 4.6 | 3.6 | 1.0 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.782608695652174 | 0.82926980908022
10 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.680851063829787 | 0.76738063375719
11 | 4.7 | 3.2 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.680851063829787 | 0.719411418945644
12 | 4.8 | 3.0 | 1.4 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.071428571428571 | 0.625 | 0.737991521269184
13 | 4.8 | 3.0 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.625 | 0.737991521269184
14 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.645833333333333 | 0.709949644614095
15 | 4.8 | 3.4 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.708333333333333 | 0.726703765068597
16 | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.105263157894737 | 0.708333333333333 | 0.673910864793945
17 | 4.9 | 2.4 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.489795918367347 | 0.323039614860416
18 | 4.9 | 2.5 | 4.5 | 1.7 | Iris-virginica | 0 | 0 | 1 | 0.377777777777778 | 0.510204081632653 | 0.151750696077398
19 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.612244897959184 | 0.734710097493698
20 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328
21 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328
22 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328
23 | 5.0 | 2.0 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.4 | 0.263694030318376
24 | 5.0 | 2.3 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.46 | 0.314465933883348
25 | 5.0 | 3.0 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.6 | 0.697285021121722
26 | 5.0 | 3.2 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.64 | 0.772671290753683
27 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.66 | 0.746860255947267
28 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.68 | 0.735834215605851
29 | 5.0 | 3.4 | 1.6 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.25 | 0.68 | 0.719183705655739
30 | 5.0 | 3.5 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.7 | 0.771879602615583
31 | 5.0 | 3.5 | 1.6 | 0.6 | Iris-setosa | 1 | 0 | 0 | 0.375 | 0.7 | 0.724505067474212
32 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.72 | 0.761595468468724
33 | 5.1 | 2.5 | 3.0 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.366666666666667 | 0.490196078431373 | 0.380536424820992
34 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | 1 | 0 | 0 | 0.294117647058824 | 0.647058823529412 | 0.692695552384892
35 | 5.1 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.666666666666667 | 0.732385408664398
36 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.686274509803922 | 0.753386230741964
37 | 5.1 | 3.5 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.686274509803922 | 0.753386230741964
38 | 5.1 | 3.7 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.725490196078431 | 0.747384308490649
39 | 5.1 | 3.8 | 1.5 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.745098039215686 | 0.752258455681401
40 | 5.1 | 3.8 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.745098039215686 | 0.736272141535994
41 | 5.1 | 3.8 | 1.9 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.210526315789474 | 0.745098039215686 | 0.68452518034775
42 | 5.2 | 2.7 | 3.9 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.358974358974359 | 0.519230769230769 | 0.230604211159528
43 | 5.2 | 3.4 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.653846153846154 | 0.745315390833046
44 | 5.2 | 3.5 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673076923076923 | 0.734047050003793
45 | 5.2 | 4.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.788461538461538 | 0.762811887474022
46 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.69811320754717 | 0.740473501806866
47 | 5.4 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.555555555555556 | 0.159649649102876
48 | 5.4 | 3.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.62962962962963 | 0.722656797169744
49 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.117647058823529 | 0.62962962962963 | 0.687757248197022
50 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.685185185185185 | 0.737167698365962
51 | 5.4 | 3.9 | 1.3 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.307692307692308 | 0.722222222222222 | 0.777023333634539
52 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.235294117647059 | 0.722222222222222 | 0.713482010522215
53 | 5.5 | 2.3 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.418181818181818 | 0.194219643823906
54 | 5.5 | 2.4 | 3.7 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.27027027027027 | 0.436363636363636 | 0.241093824269382
55 | 5.5 | 2.4 | 3.8 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.289473684210526 | 0.436363636363636 | 0.226058264210938
56 | 5.5 | 2.5 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.454545454545455 | 0.201872265775898
57 | 5.5 | 2.6 | 4.4 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.272727272727273 | 0.472727272727273 | 0.156225228875569
58 | 5.5 | 3.5 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.636363636363636 | 0.756691610416908
59 | 5.5 | 4.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.763636363636364 | 0.771935440609133
60 | 5.6 | 2.5 | 3.9 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.282051282051282 | 0.446428571428571 | 0.213933207860434
61 | 5.6 | 2.7 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.482142857142857 | 0.181519581637256
62 | 5.6 | 2.8 | 4.9 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.408163265306122 | 0.5 | 0.111996799091321
63 | 5.6 | 2.9 | 3.6 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.361111111111111 | 0.517857142857143 | 0.277943121171659
64 | 5.6 | 3.0 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.535714285714286 | 0.205693392743854
65 | 5.6 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.535714285714286 | 0.156153617405426
66 | 5.7 | 2.5 | 5.0 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.4 | 0.43859649122807 | 0.0965737982202029
67 | 5.7 | 2.6 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.456140350877193 | 0.278390203317582
68 | 5.7 | 2.8 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.491228070175439 | 0.196228591121515
69 | 5.7 | 2.8 | 4.5 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.288888888888889 | 0.491228070175439 | 0.148542053943545
70 | 5.7 | 2.9 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.508771929824561 | 0.186821367793599
71 | 5.7 | 3.0 | 4.2 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.526315789473684 | 0.190379016686771
72 | 5.7 | 3.8 | 1.7 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.176470588235294 | 0.666666666666667 | 0.698198801634805
73 | 5.7 | 4.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.771929824561404 | 0.758825404150609
74 | 5.8 | 2.6 | 4.0 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.3 | 0.448275862068966 | 0.200536953059694
75 | 5.8 | 2.7 | 3.9 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.307692307692308 | 0.46551724137931 | 0.218217860142616
76 | 5.8 | 2.7 | 4.1 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.24390243902439 | 0.46551724137931 | 0.190910337541694
77 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657
78 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657
79 | 5.8 | 2.8 | 5.1 | 2.4 | Iris-virginica | 0 | 0 | 1 | 0.470588235294118 | 0.482758620689655 | 0.0943726599781477
80 | 5.8 | 4.0 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.689655172413793 | 0.784021608790257
81 | 5.9 | 3.0 | 4.2 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.357142857142857 | 0.508474576271186 | 0.186761512366878
82 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.352941176470588 | 0.508474576271186 | 0.0973257207195747
83 | 5.9 | 3.2 | 4.8 | 1.8 | Iris-versicolor | 0 | 1 | 0 | 0.375 | 0.542372881355932 | 0.126712735034151
84 | 6.0 | 2.2 | 4.0 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.25 | 0.366666666666667 | 0.183758509548148
85 | 6.0 | 2.2 | 5.0 | 1.5 | Iris-virginica | 0 | 0 | 1 | 0.3 | 0.366666666666667 | 0.0885712440311769
86 | 6.0 | 2.7 | 5.1 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.313725490196078 | 0.45 | 0.0907272611961409
87 | 6.0 | 2.9 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.483333333333333 | 0.147223724664182
88 | 6.0 | 3.0 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.5 | 0.120628357257542
89 | 6.0 | 3.4 | 4.5 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.355555555555556 | 0.566666666666667 | 0.16163497279113
90 | 6.1 | 2.6 | 5.6 | 1.4 | Iris-virginica | 0 | 0 | 1 | 0.25 | 0.426229508196721 | 0.0597292318554775
91 | 6.1 | 2.8 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.459016393442623 | 0.202828547374176
92 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.25531914893617 | 0.459016393442623 | 0.123814688224948
93 | 6.1 | 2.9 | 4.7 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.297872340425532 | 0.475409836065574 | 0.126190742362697
94 | 6.1 | 3.0 | 4.6 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.304347826086957 | 0.491803278688525 | 0.138317919091645
95 | 6.1 | 3.0 | 4.9 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.36734693877551 | 0.491803278688525 | 0.110921074597917
96 | 6.2 | 2.2 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.354838709677419 | 0.127102859928249
97 | 6.2 | 2.8 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.451612903225806 | 0.113990321158717
98 | 6.2 | 2.9 | 4.3 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.302325581395349 | 0.467741935483871 | 0.166702290969475
99 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica | 0 | 0 | 1 | 0.425925925925926 | 0.548387096774194 | 0.0811783736332713
100 | 6.3 | 2.3 | 4.4 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.295454545454545 | 0.365079365079365 | 0.138329002279972
Rows: 1-100 of 150 | Columns: 11
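
To see where the 'setosa' values come from, the sketch below plugs the fitted coefficients from the model summary into a linear decision function and applies a logistic transform. That the probability is the logistic of the linear score is an assumption inferred from the displayed numbers, not a description of VerticaPy's internals; the coefficients are the rounded values printed in the summary, so results agree with the table only to about four decimals.

```python
from math import exp

# Rounded coefficients as printed in the model summary.
INTERCEPT = 1.38349
COEF_PETAL_LENGTH = -0.84012
COEF_RATIO_SWL = 1.32517


def setosa_score(petal_length_cm, ratio_swl):
    """Linear decision function: positive values lean towards Iris setosa."""
    return (INTERCEPT
            + COEF_PETAL_LENGTH * petal_length_cm
            + COEF_RATIO_SWL * ratio_swl)


def logistic(z):
    """Map a score to the (0, 1) range (assumed link, see the note above)."""
    return 1.0 / (1.0 + exp(-z))


# Row 1 (Iris-setosa): PetalLengthCm = 1.1, ratio_swl = 3.0 / 4.3
p_row_1 = logistic(setosa_score(1.1, 3.0 / 4.3))
# Row 17 (Iris-versicolor): PetalLengthCm = 3.3, ratio_swl = 2.4 / 4.9
p_row_17 = logistic(setosa_score(3.3, 2.4 / 4.9))
print(round(p_row_1, 4), round(p_row_17, 4))
```

The two results are close to the table values for rows 1 and 17 (0.7996 and 0.3230).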

Let's create a model to classify the Iris virginica.

In [21]:
predictors = ["PetalLengthCm", "SepalLengthCm", "SepalWidthCm", 
              "PetalWidthCm", "ratio_pwl", "ratio_swl"]
response = "Species_Iris-virginica"
model = LinearSVC("svc_virginica_iris")
cross_validate(model, iris, predictors, response)

Out[21]:
 | auc | prc_auc | accuracy | log_loss | precision | recall | f1_score | mcc | informedness | markedness | csi | time
1-fold | 0.9883333333333334 | 0.9809541204839196 | 0.96 | 0.124753596556893 | 0.9090909090909091 | 1.0 | 0.9523809523809523 | 0.9211323729436766 | 0.9333333333333333 | 0.9090909090909092 | 0.9090909090909091 | 1.9328792095184326
2-fold | 0.9961904761904762 | 0.9910912698412698 | 0.98 | 0.0803143364420691 | 0.9375 | 1.0 | 0.967741935483871 | 0.9543135154205278 | 0.9714285714285715 | 0.9375 | 0.9375 | 1.7381629943847656
3-fold | 1.0 | 0.9999999999999999 | 1.0 | 0.0850368152073147 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.480771780014038
avg | 0.9948412698412699 | 0.9906817967750631 | 0.98 | 0.0967015827354256 | 0.9488636363636364 | 1.0 | 0.9733742959549411 | 0.9584819627880681 | 0.9682539682539683 | 0.9488636363636364 | 0.9488636363636364 | 1.717271327972412
std | 0.005949205502193667 | 0.009529540008794923 | 0.020000000000000018 | 0.02440823756443503 | 0.04650768620382075 | 0.0 | 0.02430403335648915 | 0.039598707232254386 | 0.03344651985032615 | 0.0465076862038207 | 0.04650768620382075 | 0.2267766042769438
Rows: 1-5 | Columns: 13
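
Most of these columns can be derived from a single confusion matrix. The sketch below uses the standard definitions; the counts (20 true positives, 2 false positives, 0 false negatives, 28 true negatives) are an assumption chosen because they reproduce the 1-fold row on a 50-row test fold. The output itself does not report the confusion matrix, and binary_metrics is a hypothetical helper.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
               / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "csi": tp / (tp + fp + fn),
    }


# Assumed counts for the first fold (not shown in the output above).
fold_1 = binary_metrics(tp=20, fp=2, fn=0, tn=28)
for name, value in fold_1.items():
    print(f"{name}: {value:.4f}")
```

Read this way, the first fold flags two non-virginica flowers as virginica and misses none, which is why recall stays at 1.0 while precision drops to about 0.909.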

We have another excellent model. Let's add it to the vDataFrame.

In [22]:
model.fit(iris, predictors, response)
model.predict_proba(iris, name = "virginica", pos_label=1)
Out[22]:
 | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | Species_Iris-setosa | Species_Iris-versicolor | Species_Iris-virginica | ratio_pwl | ratio_swl | setosa | virginica
 | Numeric(5,2) | Numeric(5,2) | Numeric(5,2) | Numeric(5,2) | Varchar(30) | Bool | Bool | Bool | Numeric(20,15) | Numeric(20,15) | Float | Float
1 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.090909090909091 | 0.697674418604651 | 0.799616133475824 | 0.00122433036640019
2 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.659090909090909 | 0.746632427676863 | 0.00227047297381315
3 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.681818181818182 | 0.767609329878423 | 0.00178314462746812
4 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.727272727272727 | 0.778180760876583 | 0.00145150073932887
5 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.511111111111111 | 0.724849376339106 | 0.00403116120042452
6 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673913043478261 | 0.734263300128653 | 0.00183033491108253
7 | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.695652173913043 | 0.755687832710383 | 0.00143889895527319
8 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.739130434782609 | 0.766167839545688 | 0.0013969736269709
9 | 4.6 | 3.6 | 1.0 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.782608695652174 | 0.82926980908022 | 0.000547208473932473
10 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.680851063829787 | 0.76738063375719 | 0.00116230418864846
11 | 4.7 | 3.2 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.680851063829787 | 0.719411418945644 | 0.00176145645915221
12 | 4.8 | 3.0 | 1.4 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.071428571428571 | 0.625 | 0.737991521269184 | 0.00127412454466533
13 | 4.8 | 3.0 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.625 | 0.737991521269184 | 0.00180671060039961
14 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.645833333333333 | 0.709949644614095 | 0.00180831675393962
15 | 4.8 | 3.4 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.708333333333333 | 0.726703765068597 | 0.00133419771185144
16 | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.105263157894737 | 0.708333333333333 | 0.673910864793945 | 0.00201826540187351
17 | 4.9 | 2.4 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.489795918367347 | 0.323039614860416 | 0.132499457324138
18 | 4.9 | 2.5 | 4.5 | 1.7 | Iris-virginica | 0 | 0 | 1 | 0.377777777777778 | 0.510204081632653 | 0.151750696077398 | 0.72485501584532
19 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.612244897959184 | 0.734710097493698 | 0.00140585112450545
20 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 | 0.00122460228368006
21 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 | 0.00122460228368006
22 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.63265306122449 | 0.723459090938328 | 0.00122460228368006
23 | 5.0 | 2.0 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.4 | 0.263694030318376 | 0.218011537158091
24 | 5.0 | 2.3 | 3.3 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.303030303030303 | 0.46 | 0.314465933883348 | 0.13505447670823
25 | 5.0 | 3.0 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.6 | 0.697285021121722 | 0.00171743979982548
26 | 5.0 | 3.2 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.64 | 0.772671290753683 | 0.000806016135963101
27 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.66 | 0.746860255947267 | 0.000962511554589033
28 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.68 | 0.735834215605851 | 0.000999680008261747
29 | 5.0 | 3.4 | 1.6 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.25 | 0.68 | 0.719183705655739 | 0.00163358629729379
30 | 5.0 | 3.5 | 1.3 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.230769230769231 | 0.7 | 0.771879602615583 | 0.000814701577423369
31 | 5.0 | 3.5 | 1.6 | 0.6 | Iris-setosa | 1 | 0 | 0 | 0.375 | 0.7 | 0.724505067474212 | 0.0021015454086499
32 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.72 | 0.761595468468724 | 0.00071144094055916
33 | 5.1 | 2.5 | 3.0 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.366666666666667 | 0.490196078431373 | 0.380536424820992 | 0.0853428903860081
34 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | 1 | 0 | 0 | 0.294117647058824 | 0.647058823529412 | 0.692695552384892 | 0.00229740383015431
35 | 5.1 | 3.4 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.666666666666667 | 0.732385408664398 | 0.000926675820385833
36 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.686274509803922 | 0.753386230741964 | 0.000729620173422534
37 | 5.1 | 3.5 | 1.4 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.214285714285714 | 0.686274509803922 | 0.753386230741964 | 0.000868941345173395
38 | 5.1 | 3.7 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.725490196078431 | 0.747384308490649 | 0.000974197335850832
39 | 5.1 | 3.8 | 1.5 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.2 | 0.745098039215686 | 0.752258455681401 | 0.000739172521833558
40 | 5.1 | 3.8 | 1.6 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.125 | 0.745098039215686 | 0.736272141535994 | 0.000712150840339464
41 | 5.1 | 3.8 | 1.9 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.210526315789474 | 0.745098039215686 | 0.68452518034775 | 0.00153921993636545
42 | 5.2 | 2.7 | 3.9 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.358974358974359 | 0.519230769230769 | 0.230604211159528 | 0.299060581429912
43 | 5.2 | 3.4 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.653846153846154 | 0.745315390833046 | 0.000747474055504667
44 | 5.2 | 3.5 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.673076923076923 | 0.734047050003793 | 0.000776831226507719
45 | 5.2 | 4.1 | 1.5 | 0.1 | Iris-setosa | 1 | 0 | 0 | 0.066666666666667 | 0.788461538461538 | 0.762811887474022 | 0.000357305074739078
46 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.69811320754717 | 0.740473501806866 | 0.000589434785190531
47 | 5.4 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.555555555555556 | 0.159649649102876 | 0.427165222857912
48 | 5.4 | 3.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.62962962962963 | 0.722656797169744 | 0.00104624375733825
49 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.117647058823529 | 0.62962962962963 | 0.687757248197022 | 0.000970986043692887
50 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.133333333333333 | 0.685185185185185 | 0.737167698365962 | 0.000546193177329702
51 | 5.4 | 3.9 | 1.3 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.307692307692308 | 0.722222222222222 | 0.777023333634539 | 0.000479938697048962
52 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.235294117647059 | 0.722222222222222 | 0.713482010522215 | 0.00084126466387846
53 | 5.5 | 2.3 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.418181818181818 | 0.194219643823906 | 0.323725428209635
54 | 5.5 | 2.4 | 3.7 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.27027027027027 | 0.436363636363636 | 0.241093824269382 | 0.141598148172761
55 | 5.5 | 2.4 | 3.8 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.289473684210526 | 0.436363636363636 | 0.226058264210938 | 0.18538879318243
56 | 5.5 | 2.5 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.454545454545455 | 0.201872265775898 | 0.281828099842536
57 | 5.5 | 2.6 | 4.4 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.272727272727273 | 0.472727272727273 | 0.156225228875569 | 0.339490248152352
58 | 5.5 | 3.5 | 1.3 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.153846153846154 | 0.636363636363636 | 0.756691610416908 | 0.00046747181604766
59 | 5.5 | 4.2 | 1.4 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.142857142857143 | 0.763636363636364 | 0.771935440609133 | 0.00026803072650644
60 | 5.6 | 2.5 | 3.9 | 1.1 | Iris-versicolor | 0 | 1 | 0 | 0.282051282051282 | 0.446428571428571 | 0.213933207860434 | 0.179254887741374
61 | 5.6 | 2.7 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.482142857142857 | 0.181519581637256 | 0.281525960598863
62 | 5.6 | 2.8 | 4.9 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.408163265306122 | 0.5 | 0.111996799091321 | 0.772699535996092
63 | 5.6 | 2.9 | 3.6 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.361111111111111 | 0.517857142857143 | 0.277943121171659 | 0.12275570891685
64 | 5.6 | 3.0 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.535714285714286 | 0.205693392743854 | 0.202206975027365
65 | 5.6 | 3.0 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.535714285714286 | 0.156153617405426 | 0.389171841668785
66 | 5.7 | 2.5 | 5.0 | 2.0 | Iris-virginica | 0 | 0 | 1 | 0.4 | 0.43859649122807 | 0.0965737982202029 | 0.829063166160933
67 | 5.7 | 2.6 | 3.5 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.456140350877193 | 0.278390203317582 | 0.0803471665429827
68 | 5.7 | 2.8 | 4.1 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.317073170731707 | 0.491228070175439 | 0.196228591121515 | 0.222008866778018
69 | 5.7 | 2.8 | 4.5 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.288888888888889 | 0.491228070175439 | 0.148542053943545 | 0.331624037403293
70 | 5.7 | 2.9 | 4.2 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.30952380952381 | 0.508771929824561 | 0.186821367793599 | 0.22891483989967
71 | 5.7 | 3.0 | 4.2 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.285714285714286 | 0.526315789473684 | 0.190379016686771 | 0.182830591635799
72 | 5.7 | 3.8 | 1.7 | 0.3 | Iris-setosa | 1 | 0 | 0 | 0.176470588235294 | 0.666666666666667 | 0.698198801634805 | 0.000618518864722319
73 | 5.7 | 4.4 | 1.5 | 0.4 | Iris-setosa | 1 | 0 | 0 | 0.266666666666667 | 0.771929824561404 | 0.758825404150609 | 0.000308602083920526
74 | 5.8 | 2.6 | 4.0 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.3 | 0.448275862068966 | 0.200536953059694 | 0.188632750996593
75 | 5.8 | 2.7 | 3.9 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.307692307692308 | 0.46551724137931 | 0.218217860142616 | 0.154999178584478
76 | 5.8 | 2.7 | 4.1 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.24390243902439 | 0.46551724137931 | 0.190910337541694 | 0.14344683247468
77 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657 | 0.778138872651123
78 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica | 0 | 0 | 1 | 0.372549019607843 | 0.46551724137931 | 0.0924379577314657 | 0.778138872651123
79 | 5.8 | 2.8 | 5.1 | 2.4 | Iris-virginica | 0 | 0 | 1 | 0.470588235294118 | 0.482758620689655 | 0.0943726599781477 | 0.88897376885522
80 | 5.8 | 4.0 | 1.2 | 0.2 | Iris-setosa | 1 | 0 | 0 | 0.166666666666667 | 0.689655172413793 | 0.784021608790257 | 0.000196722322892113
81 | 5.9 | 3.0 | 4.2 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.357142857142857 | 0.508474576271186 | 0.186761512366878 | 0.248939906475404
82 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.352941176470588 | 0.508474576271186 | 0.0973257207195747 | 0.667002301728646
83 | 5.9 | 3.2 | 4.8 | 1.8 | Iris-versicolor | 0 | 1 | 0 | 0.375 | 0.542372881355932 | 0.126712735034151 | 0.520718099985326
84 | 6.0 | 2.2 | 4.0 | 1.0 | Iris-versicolor | 0 | 1 | 0 | 0.25 | 0.366666666666667 | 0.183758509548148 | 0.168632856142423
85 | 6.0 | 2.2 | 5.0 | 1.5 | Iris-virginica | 0 | 0 | 1 | 0.3 | 0.366666666666667 | 0.0885712440311769 | 0.669797669663903
86 | 6.0 | 2.7 | 5.1 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.313725490196078 | 0.45 | 0.0907272611961409 | 0.631705797918441
87 | 6.0 | 2.9 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.483333333333333 | 0.147223724664182 | 0.338352835710488
88 | 6.0 | 3.0 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.5 | 0.120628357257542 | 0.549774143394639
89 | 6.0 | 3.4 | 4.5 | 1.6 | Iris-versicolor | 0 | 1 | 0 | 0.355555555555556 | 0.566666666666667 | 0.16163497279113 | 0.273500092264149
90 | 6.1 | 2.6 | 5.6 | 1.4 | Iris-virginica | 0 | 0 | 1 | 0.25 | 0.426229508196721 | 0.0597292318554775 | 0.706086784980859
91 | 6.1 | 2.8 | 4.0 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.325 | 0.459016393442623 | 0.202828547374176 | 0.152739424556609
92 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor | 0 | 1 | 0 | 0.25531914893617 | 0.459016393442623 | 0.123814688224948 | 0.282939328372567
93 | 6.1 | 2.9 | 4.7 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.297872340425532 | 0.475409836065574 | 0.126190742362697 | 0.340998054315568
94 | 6.1 | 3.0 | 4.6 | 1.4 | Iris-versicolor | 0 | 1 | 0 | 0.304347826086957 | 0.491803278688525 | 0.138317919091645 | 0.29009447753533
95 | 6.1 | 3.0 | 4.9 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.36734693877551 | 0.491803278688525 | 0.110921074597917 | 0.564162617430953
96 | 6.2 | 2.2 | 4.5 | 1.5 | Iris-versicolor | 0 | 1 | 0 | 0.333333333333333 | 0.354838709677419 | 0.127102859928249 | 0.463142487712036
97 | 6.2 | 2.8 | 4.8 | 1.8 | Iris-virginica | 0 | 0 | 1 | 0.375 | 0.451612903225806 | 0.113990321158717 | 0.558414695932799
98 | 6.2 | 2.9 | 4.3 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.302325581395349 | 0.467741935483871 | 0.166702290969475 | 0.185951247356437
99 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica | 0 | 0 | 1 | 0.425925925925926 | 0.548387096774194 | 0.0811783736332713 | 0.802710629153816
100 | 6.3 | 2.3 | 4.4 | 1.3 | Iris-versicolor | 0 | 1 | 0 | 0.295454545454545 | 0.365079365079365 | 0.138329002279972 | 0.302766695527805
Rows: 1-100 of 150 | Columns: 12

Let's evaluate our final model (the combination of two Linear SVCs).

In [23]:
iris.case_when("prediction",
               iris["setosa"] > 0.5, 'Iris-setosa',
               iris["virginica"] > 0.5, 'Iris-virginica',
               'Iris-versicolor')
iris["score"] = (iris["Species"] == iris["prediction"])
iris["score"].avg()
Out[23]:
0.98

We have a great model with an accuracy of 98% on an entirely balanced dataset.
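
The case_when rule above can be read as a small decision cascade: check the setosa probability first, then the virginica probability, and fall back to versicolor. The plain-Python sketch below applies that rule to four rows taken from the prediction table (rows 1, 17, 18 and 83); the function name combine is hypothetical, and the accuracy it prints covers only these four rows, not the full dataset.

```python
def combine(setosa_proba, virginica_proba, threshold=0.5):
    """Mirror the case_when cascade: setosa first, then virginica, else versicolor."""
    if setosa_proba > threshold:
        return "Iris-setosa"
    if virginica_proba > threshold:
        return "Iris-virginica"
    return "Iris-versicolor"


# (true species, 'setosa' probability, 'virginica' probability)
samples = [
    ("Iris-setosa", 0.799616133475824, 0.00122433036640019),
    ("Iris-versicolor", 0.323039614860416, 0.132499457324138),
    ("Iris-virginica", 0.151750696077398, 0.72485501584532),
    ("Iris-versicolor", 0.126712735034151, 0.520718099985326),
]

predictions = [combine(s, v) for _, s, v in samples]
accuracy = sum(p == t for (t, _, _), p in zip(samples, predictions)) / len(samples)
print(predictions, accuracy)
```

The last sample is one of the few misclassified flowers: an Iris versicolor whose virginica probability is just above the 0.5 threshold.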

Conclusion

We've solved our problem in a Pandas-like way, all without ever loading data into memory!


About the Author


Badr Ouali
Head of Data Science


Badr Ouali works as a Lead Data Scientist for Vertica worldwide. He can embrace data projects end to end through a clear understanding of the “big picture” as well as attention to detail, resulting in great business outcomes, a distinctive differentiator in his role. Badr enjoys sharing knowledge and insights related to data analytics with colleagues and peers and has a soft spot for Python. He loves helping customers find the best value in their data and empowering them to solve their use cases.