VerticaPy

Python API for Vertica Data Science at Scale

Multiclass Classification

This example demonstrates various Multiclass Classifier methods using the Iris dataset.

In [1]:
from verticapy.learn.datasets import load_iris
iris = load_iris()
display(iris)
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)
----+---------------+--------------+---------------+--------------+----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor
Rows: 1-100 of 150 | Columns: 5

Let's create a random forest model to predict the flower species.

In [2]:
from verticapy.learn.ensemble import RandomForestClassifier
model = RandomForestClassifier("public.RF_iris")
model.fit("public.iris", 
          ["PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"], 
          "Species")
Out[2]:

===========
call_string
===========
SELECT rf_classifier('public.RF_iris', 'public.iris', '"species"', '"PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"' USING PARAMETERS exclude_columns='', ntree=10, mtry=2, sampling_size=0.632, max_depth=5, max_breadth=1000000000, min_leaf_size=1, min_info_gain=0, nbins=32);

=======
details
=======
  predictor  |      type      
-------------+----------------
petallengthcm|float or numeric
sepalwidthcm |float or numeric
sepallengthcm|float or numeric
petalwidthcm |float or numeric


===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    | 10  
rejected_row_count|  0  
accepted_row_count| 150 

Fitting the model sets new model attributes, which later method calls reuse so that you do not have to pass the columns and relations again.

In [3]:
model.X
Out[3]:
['"PetalLengthCm"', '"SepalWidthCm"', '"SepalLengthCm"', '"PetalWidthCm"']
In [4]:
model.y
Out[4]:
'"Species"'
In [5]:
model.input_relation
Out[5]:
'public.iris'
In [6]:
model.test_relation
Out[6]:
'public.iris'

We did not specify a test relation when fitting the model, so the model uses the training relation as the test relation. These attributes are used whenever you invoke the model's methods. For example, let's compute the accuracy of the model.

In [7]:
model.score(method = "accuracy")
Out[7]:
0.986666666666667
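
As a rough check on what this number means, accuracy is the share of rows whose predicted class equals the true class. The sketch below is plain Python and is not the VerticaPy implementation; the helper name `accuracy` and the toy labels are assumptions made for illustration. On 150 rows, a score of 0.9867 corresponds to 148 correct predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (hypothetical): 150 rows, 2 of them misclassified.
y_true = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
y_pred = list(y_true)
y_pred[50] = "Iris-virginica"    # a versicolor row predicted as virginica
y_pred[100] = "Iris-versicolor"  # a virginica row predicted as versicolor

score = accuracy(y_true, y_pred)
print(round(score, 6))  # 148 / 150
```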

The 'score' method uses the 'y' attribute and the model's predictions on 'test_relation' to compute the accuracy. You can change these attributes at any time to apply the model to different columns or relations. A model may also expose other useful attributes. For Random Forest, the 'classes_' attribute lists the response categories.

In [8]:
model.classes_
Out[8]:
['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

The underlying Vertica model also stores attributes of its own. Use the 'get_model_attribute' method to list them.

In [18]:
model.get_model_attribute()
  | attr_name          | attr_fields            | #_of_rows
  | Varchar(128)       | Long varchar(32000000) | Integer
--+--------------------+------------------------+----------
1 | tree_count         | tree_count             | 1
2 | rejected_row_count | rejected_row_count     | 1
3 | accepted_row_count | accepted_row_count     | 1
4 | call_string        | call_string            | 1
5 | details            | predictor, type        | 4
Out[18]:

Several methods help you draw model evaluation curves such as the ROC curve, the PRC curve, or the lift chart. Because this is a multiclass classification, one class must be treated as the positive class; the 'pos_label' parameter specifies it. Let's use 'Iris-setosa' as the positive class.

In [9]:
model.roc_curve(pos_label = 'Iris-setosa')
model.prc_curve(pos_label = 'Iris-setosa')
model.lift_chart(pos_label = 'Iris-setosa')
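
To make the role of the positive class concrete, here is a minimal plain-Python sketch of the one-vs-rest idea: labels are reduced to "positive class or not", and each cutoff on the class probability yields one point of a ROC curve. The helper names `binarize` and `roc_point`, the toy labels, and the toy scores are all assumptions made for illustration; this is not how VerticaPy computes the curves internally.

```python
def binarize(labels, pos_label):
    """Map multiclass labels to 1 (positive class) or 0 (any other class)."""
    return [1 if label == pos_label else 0 for label in labels]


def roc_point(y_binary, scores, cutoff):
    """Return (false positive rate, true positive rate) at a given cutoff."""
    tp = sum(1 for y, s in zip(y_binary, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(y_binary, scores) if y == 0 and s >= cutoff)
    positives = sum(y_binary)
    negatives = len(y_binary) - positives
    return fp / negatives, tp / positives


# Hypothetical labels and P(class = 'Iris-setosa') scores, for illustration only.
labels = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"]
scores = [1.0, 0.9, 0.1, 0.0]

y_binary = binarize(labels, pos_label="Iris-setosa")
curve = [roc_point(y_binary, scores, cutoff) for cutoff in (1.0, 0.5, 0.0)]
print(y_binary)
print(curve)
```

Choosing a different 'pos_label' changes only the binarization step; the rest of the computation stays the same.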

Looking at the SQL code can help you understand how Vertica works.

In [10]:
display(model.deploySQL())
PREDICT_RF_CLASSIFIER("PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm" USING PARAMETERS model_name = 'public.RF_iris', match_by_pos = 'true')
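
The returned text is a SQL expression rather than a full statement. As a sketch of how it might be used, the snippet below wraps it in a SELECT over the training table. The surrounding query, the column alias, and the variable names are assumptions for illustration, and the snippet only builds a string; it does not connect to Vertica.

```python
# Expression copied from the deploySQL() output above.
prediction_expr = (
    'PREDICT_RF_CLASSIFIER("PetalLengthCm", "SepalWidthCm", "SepalLengthCm", '
    '"PetalWidthCm" USING PARAMETERS model_name = \'public.RF_iris\', '
    "match_by_pos = 'true')"
)

# Assumption: the expression is selected as an extra column next to the true label.
query = f'SELECT "Species", {prediction_expr} AS "pred_Species" FROM public.iris;'
print(query)
```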

The classification report gives a broad view of model performance. For multiclass classification, each class is treated in turn as the positive class. If no cutoff is specified, optimized cutoffs are computed.

In [11]:
model.classification_report()
             | Iris-setosa         | Iris-versicolor    | Iris-virginica
-------------+---------------------+--------------------+-------------------
auc          | 1.0                 | 0.9998000000000002 | 0.9998000000000001
prc_auc      | 1.0                 | 0.9996039215686274 | 0.9996039215686274
accuracy     | 1.0                 | 0.9933333333333333 | 0.9933333333333333
log_loss     | 0.00398288918014474 | 0.0192465228007074 | 0.0188544198895192
precision    | 1.0                 | 0.9803921568627451 | 0.9803921568627451
recall       | 1.0                 | 1.0                | 1.0
f1_score     | 1.0                 | 0.9949748743718593 | 0.9949748743718593
mcc          | 1.0                 | 0.985184366143778  | 0.985184366143778
informedness | 1.0                 | 0.99               | 0.99
markedness   | 1.0                 | 0.9803921568627452 | 0.9803921568627452
csi          | 1.0                 | 0.9803921568627451 | 0.9803921568627451
cutoff       | 0.9                 | 0.364              | 0.438
Out[11]:
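
Several of these metrics follow directly from the one-vs-rest confusion counts of a class. The sketch below uses the standard textbook definitions in plain Python. The function name `binary_metrics` is hypothetical, and the counts (50 true positives, 1 false positive, 0 false negatives, 99 true negatives) are an assumption, chosen because they reproduce the precision, recall, and accuracy reported for Iris-versicolor; the report itself does not list them.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute a few report metrics from one-vs-rest confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)  # negative predictive value
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "csi": tp / (tp + fp + fn),
    }


# Assumed counts for the positive class (see the note above).
metrics = binary_metrics(tp=50, fp=1, fn=0, tn=99)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```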

You can also add the prediction to your vDataFrame.

In [12]:
model.predict(iris, name = "pred_Species")
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species         | pred_Species
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)     | Varchar(128)
----+---------------+--------------+---------------+--------------+-----------------+----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa     | Iris-setosa
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa     | Iris-setosa
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa     | Iris-setosa
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa     | Iris-setosa
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica  | Iris-virginica
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa     | Iris-setosa
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa     | Iris-setosa
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor | Iris-versicolor
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa     | Iris-setosa
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa     | Iris-setosa
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa     | Iris-setosa
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor | Iris-versicolor
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa     | Iris-setosa
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa     | Iris-setosa
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa     | Iris-setosa
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor | Iris-versicolor
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor | Iris-versicolor
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor | Iris-versicolor
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor | Iris-versicolor
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica  | Iris-virginica
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor | Iris-versicolor
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica  | Iris-virginica
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor | Iris-versicolor
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor | Iris-versicolor
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa     | Iris-setosa
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor | Iris-versicolor
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor | Iris-versicolor
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor | Iris-versicolor
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica  | Iris-virginica
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor | Iris-versicolor
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica  | Iris-virginica
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor | Iris-versicolor
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor | Iris-versicolor
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica  | Iris-versicolor
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor | Iris-virginica
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor | Iris-versicolor
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica  | Iris-virginica
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor | Iris-versicolor
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor | Iris-versicolor
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor | Iris-versicolor
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica  | Iris-virginica
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor | Iris-versicolor
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica  | Iris-virginica
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor | Iris-versicolor
Out[12]:
Rows: 1-100 of 150 | Columns: 6
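
Comparing the two label columns row by row shows where the model is wrong. The sketch below does this in plain Python for rows 84 to 88 of the output above, copied by hand; the variable names are illustrative, and on a real vDataFrame the comparison would run in the database instead.

```python
# (row number, Species, pred_Species) copied from rows 84-88 of the output above.
rows = [
    (84, "Iris-versicolor", "Iris-versicolor"),
    (85, "Iris-virginica", "Iris-versicolor"),
    (86, "Iris-versicolor", "Iris-virginica"),
    (87, "Iris-versicolor", "Iris-versicolor"),
    (88, "Iris-virginica", "Iris-virginica"),
]

# Keep only the rows where the predicted label differs from the true label.
misclassified = [
    (n, actual, predicted) for n, actual, predicted in rows if actual != predicted
]
for n, actual, predicted in misclassified:
    print(f"row {n}: actual={actual}, predicted={predicted}")
```

Rows 85 and 86 are the only mismatches among the 100 rows displayed, and two errors over 150 rows would match the accuracy of 0.9867 computed earlier.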

You can also add the probability of a specific class by using the 'pos_label' parameter.

In [13]:
model.predict(iris, name = "prob_versicolor", pos_label = "Iris-versicolor")
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species         | pred_Species    | prob_versicolor
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)     | Varchar(128)    | Varchar(128)
----+---------------+--------------+---------------+--------------+-----------------+-----------------+----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa     | Iris-setosa     | 0
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa     | 0
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor | 0.9
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica  | Iris-virginica  | 0.3
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa     | Iris-setosa     | 0.1
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa     | Iris-setosa     | 0.1
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica  | Iris-virginica  | 0.01
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica  | Iris-virginica  | 0.01
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica  | 0.05
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica  | 0.05
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica  | Iris-virginica  | 0.05
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica  | Iris-virginica  | 0.0714286
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor | Iris-versicolor | 0.6
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor | Iris-versicolor | 0.95
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica  | Iris-versicolor | 0.493333
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor | Iris-virginica  | 0.364762
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica  | 0.3
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor | Iris-versicolor | 1
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica  | Iris-virginica  | 0.11
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 0.91
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor | Iris-versicolor | 0.91
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica  | Iris-virginica  | 0.131429
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 0.95
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica  | 0.2
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica  | Iris-virginica  | 0
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor | Iris-versicolor | 0.95
Out[13]:
Rows: 1-100 of 150 | Columns: 7
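
A class probability becomes a yes/no decision once a cutoff is applied. The plain-Python sketch below applies the cutoff of 0.364 that the classification report lists for Iris-versicolor to a few probabilities copied from the output above. The decision rule "probability greater than or equal to the cutoff" and the helper name `is_positive` are assumptions made for illustration; the source does not state the exact rule VerticaPy uses.

```python
# (row number, Species, prob_versicolor) copied from the output above.
rows = [
    (83, "Iris-versicolor", 0.6),
    (85, "Iris-virginica", 0.493333),
    (86, "Iris-versicolor", 0.364762),
    (88, "Iris-virginica", 0.3),
    (97, "Iris-virginica", 0.2),
]

CUTOFF = 0.364  # cutoff listed for Iris-versicolor in the classification report


def is_positive(probability, cutoff=CUTOFF):
    """Assumed rule: a row is positive when its probability reaches the cutoff."""
    return probability >= cutoff


# Rows treated as Iris-versicolor at this cutoff.
flagged = [n for n, _, prob in rows if is_positive(prob)]
# Flagged rows whose true class is not Iris-versicolor.
false_positives = [
    n for n, species, prob in rows
    if is_positive(prob) and species != "Iris-versicolor"
]
print(flagged)
print(false_positives)
```

Under this assumed rule, row 86 sits just above the cutoff and counts as a correct positive, while row 85 becomes a false positive.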

The vDataFrame also has a 'score' method for model evaluation.

In [14]:
iris.score("Species", "pred_Species", method = "accuracy")
Out[14]:
0.986666666666667

Examine your predictions.

In [15]:
iris.hist(["pred_Species", "Species"])

Some multiclass classifiers can also evaluate feature importance.

In [16]:
model.features_importance()