VerticaPy

Python API for Vertica Data Science at Scale

Example: Methods in a Multiclass Classification Model

In this example, we use the 'Iris' dataset to demonstrate the methods available to multiclass classification models.

In [1]:
from verticapy.datasets import load_iris
iris = load_iris()
display(iris)
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)
----+---------------+--------------+---------------+--------------+-----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor
Rows: 1-100 of 150 | Columns: 5

Our goal is to predict the flower species. Let's create a random forest model.

In [2]:
from verticapy.learn.ensemble import RandomForestClassifier
model = RandomForestClassifier("public.RF_iris")
model.fit("public.iris", 
          ["PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"], 
          "Species")
Out[2]:

===========
call_string
===========
SELECT rf_classifier('public.RF_iris', 'public.iris', '"species"', '"PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"' USING PARAMETERS exclude_columns='', ntree=10, mtry=2, sampling_size=0.632, max_depth=5, max_breadth=1000000000, min_leaf_size=1, min_info_gain=0, nbins=32);

=======
details
=======
  predictor  |      type      
-------------+----------------
petallengthcm|float or numeric
sepalwidthcm |float or numeric
sepallengthcm|float or numeric
petalwidthcm |float or numeric


===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    | 10  
rejected_row_count|  0  
accepted_row_count| 150 

Fitting the model creates new model attributes.

In [3]:
model.X
Out[3]:
['"PetalLengthCm"', '"SepalWidthCm"', '"SepalLengthCm"', '"PetalWidthCm"']
In [4]:
model.y
Out[4]:
'"Species"'
In [5]:
model.input_relation
Out[5]:
'public.iris'
In [6]:
model.test_relation
Out[6]:
'public.iris'

Since we didn't specify a test relation when fitting the model, the model uses the training relation as the test relation.

Let's compute the accuracy of our model.

In [7]:
model.score(method = "accuracy")
Out[7]:
0.986666666666667

The 'score' method compares the 'y' attribute with the model's predictions on the 'test_relation' to compute the accuracy of the model. You can change these attributes at any time to evaluate the model on different columns or relations.
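
To make the accuracy metric concrete, here is a minimal plain-Python sketch of what it measures. It is not VerticaPy's implementation (which runs in the database); the `accuracy` helper and the toy labels are made up for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Toy labels (hypothetical, not the iris results above).
y_true = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
y_pred = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]
acc = accuracy(y_true, y_pred)
print(acc)  # 0.75 (3 of 4 labels match)
```

By the same arithmetic, the 0.9867 reported above corresponds to 148 of the 150 rows being classified correctly.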

Models have many useful attributes. For a random forest model, we can use the 'classes_' attribute for a list of the response categories.

In [8]:
model.classes_
Out[8]:
['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

You can view other attributes using the 'get_attr' method.

In [18]:
model.get_attr()
    | attr_name          | attr_fields            | #_of_rows
    | Varchar(128)       | Long varchar(32000000) | Integer
----+--------------------+------------------------+-----------
  1 | tree_count         | tree_count             | 1
  2 | rejected_row_count | rejected_row_count     | 1
  3 | accepted_row_count | accepted_row_count     | 1
  4 | call_string        | call_string            | 1
  5 | details            | predictor, type        | 4
Out[18]:

Visualizing the model with a PRC, ROC, or lift chart can be helpful. Keep in mind that this is a multiclass classification, so one class must be treated as the positive class. We identify this class with the 'pos_label' parameter. For this example, we'll mark 'Iris-setosa' as the positive class for each graph.

In [9]:
model.roc_curve(pos_label = 'Iris-setosa')
model.prc_curve(pos_label = 'Iris-setosa')
model.lift_chart(pos_label = 'Iris-setosa')
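
The idea behind 'pos_label' can be sketched in plain Python: the chosen class becomes "positive", every other class becomes "negative", and the curve is traced by sweeping a threshold over the predicted probability of the positive class. This is only an illustration under those assumptions, not VerticaPy's code; `roc_points`, the thresholds, and the scores are hypothetical, and the sketch assumes both positive and negative rows are present.

```python
def roc_points(y_true, scores, pos_label, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    is_pos = [label == pos_label for label in y_true]  # one-vs-rest binarization
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos
    points = []
    for t in thresholds:
        flagged = [s >= t for s in scores]  # rows predicted as positive at this threshold
        tp = sum(f and p for f, p in zip(flagged, is_pos))
        fp = sum(f and not p for f, p in zip(flagged, is_pos))
        points.append((fp / n_neg, tp / n_pos))
    return points

y_true = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"]
scores = [0.9, 0.6, 0.4, 0.1]  # hypothetical P(Iris-setosa) for each row
points = roc_points(y_true, scores, "Iris-setosa", [0.0, 0.5, 1.0])
print(points)  # [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

Changing `pos_label` changes which rows count as positive, which is why each class gets its own curve.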

Let's look at the SQL query for our model.

In [10]:
display(model.deploySQL())
PREDICT_RF_CLASSIFIER("PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm" USING PARAMETERS model_name = 'public.RF_iris', match_by_pos = 'true')
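
Because 'deploySQL' returns a SQL expression rather than a full statement, one way to use it is to embed it in a SELECT of your own. The sketch below only builds the query string; `build_scoring_query` and the column alias are hypothetical names for this example, and running the query would still require a database connection.

```python
# Expression copied from the deploySQL() output above.
prediction_expr = (
    'PREDICT_RF_CLASSIFIER("PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm" '
    "USING PARAMETERS model_name = 'public.RF_iris', match_by_pos = 'true')"
)

def build_scoring_query(expr, relation, alias="pred_Species"):
    """Wrap a prediction expression in a SELECT over the given relation."""
    # Plain string formatting: only use with trusted relation and alias names.
    return f'SELECT *, {expr} AS "{alias}" FROM {relation};'

query = build_scoring_query(prediction_expr, "public.iris")
print(query)
```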

You can evaluate the quality of your model with the 'classification_report' method. Again, since this is a multiclass classification, each class takes its turn as the 'positive' class.

If we don't specify a cutoff, VerticaPy automatically computes an optimal cutoff.

In [11]:
model.classification_report()
             | Iris-setosa         | Iris-versicolor    | Iris-virginica
-------------+---------------------+--------------------+--------------------
auc          | 1.0                 | 0.9998000000000002 | 0.9998000000000001
prc_auc      | 1.0                 | 0.9996039215686274 | 0.9996039215686274
accuracy     | 1.0                 | 0.9933333333333333 | 0.9933333333333333
log_loss     | 0.00398288918014474 | 0.0192465228007074 | 0.0188544198895192
precision    | 1.0                 | 0.9803921568627451 | 0.9803921568627451
recall       | 1.0                 | 1.0                | 1.0
f1_score     | 1.0                 | 0.9949748743718593 | 0.9949748743718593
mcc          | 1.0                 | 0.985184366143778  | 0.985184366143778
informedness | 1.0                 | 0.99               | 0.99
markedness   | 1.0                 | 0.9803921568627452 | 0.9803921568627452
csi          | 1.0                 | 0.9803921568627451 | 0.9803921568627451
cutoff       | 0.9                 | 0.364              | 0.438
Out[11]:
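
To see how the cutoff interacts with the per-class metrics, here is a small plain-Python sketch. It assumes a row counts as positive when its probability is at least the cutoff (whether VerticaPy uses `>=` or `>` is not stated here), and `binary_metrics` and the probabilities are hypothetical.

```python
def binary_metrics(y_true, probs, pos_label, cutoff):
    """Precision, recall and F1 for one class treated as positive."""
    tp = fp = fn = 0
    for label, p in zip(y_true, probs):
        predicted_pos = p >= cutoff  # assumption: inclusive cutoff
        actual_pos = label == pos_label
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}

y_true = ["Iris-versicolor", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
probs = [0.95, 0.40, 0.45, 0.00]  # hypothetical P(Iris-versicolor) for each row
strict = binary_metrics(y_true, probs, "Iris-versicolor", cutoff=0.5)
loose = binary_metrics(y_true, probs, "Iris-versicolor", cutoff=0.364)
print(strict)
print(loose)
```

With the stricter cutoff the toy model misses one versicolor row (lower recall); with the looser cutoff it catches both but also flags a virginica row (lower precision). A cutoff is a trade-off between the two.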

You can add this prediction to your vDataFrame.

In [12]:
model.predict(iris, name = "pred_Species")
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species         | pred_Species
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)     | Varchar(128)
----+---------------+--------------+---------------+--------------+-----------------+-----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa     | Iris-setosa
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa     | Iris-setosa
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa     | Iris-setosa
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa     | Iris-setosa
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica  | Iris-virginica
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa     | Iris-setosa
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa     | Iris-setosa
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor | Iris-versicolor
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa     | Iris-setosa
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa     | Iris-setosa
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa     | Iris-setosa
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor | Iris-versicolor
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa     | Iris-setosa
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa     | Iris-setosa
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa     | Iris-setosa
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor | Iris-versicolor
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor | Iris-versicolor
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor | Iris-versicolor
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor | Iris-versicolor
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica  | Iris-virginica
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor | Iris-versicolor
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica  | Iris-virginica
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor | Iris-versicolor
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor | Iris-versicolor
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa     | Iris-setosa
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor | Iris-versicolor
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor | Iris-versicolor
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor | Iris-versicolor
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica  | Iris-virginica
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor | Iris-versicolor
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica  | Iris-virginica
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor | Iris-versicolor
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor | Iris-versicolor
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica  | Iris-versicolor
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor | Iris-virginica
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor | Iris-versicolor
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica  | Iris-virginica
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor | Iris-versicolor
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor | Iris-versicolor
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor | Iris-versicolor
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica  | Iris-virginica
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor | Iris-versicolor
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica  | Iris-virginica
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor | Iris-versicolor
Out[12]:
Rows: 1-100 of 150 | Columns: 6

You can also add the probability of a specific class with the 'pos_label' parameter.

In [13]:
model.predict(iris, name = "prob_versicolor", pos_label = "Iris-versicolor")
    | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species         | pred_Species    | prob_versicolor
    | Numeric(5,2)  | Numeric(5,2) | Numeric(5,2)  | Numeric(5,2) | Varchar(30)     | Varchar(128)    | Varchar(128)
----+---------------+--------------+---------------+--------------+-----------------+-----------------+-----------------
  1 | 4.30          | 3.00         | 1.10          | 0.10         | Iris-setosa     | Iris-setosa     | 0
  2 | 4.40          | 2.90         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  3 | 4.40          | 3.00         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  4 | 4.40          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  5 | 4.50          | 2.30         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa     | 0
  6 | 4.60          | 3.10         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  7 | 4.60          | 3.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
  8 | 4.60          | 3.40         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
  9 | 4.60          | 3.60         | 1.00          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 10 | 4.70          | 3.20         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 11 | 4.70          | 3.20         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 12 | 4.80          | 3.00         | 1.40          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 13 | 4.80          | 3.00         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 14 | 4.80          | 3.10         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 15 | 4.80          | 3.40         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 16 | 4.80          | 3.40         | 1.90          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 17 | 4.90          | 2.40         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor | 0.9
 18 | 4.90          | 2.50         | 4.50          | 1.70         | Iris-virginica  | Iris-virginica  | 0.3
 19 | 4.90          | 3.00         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 20 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 21 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 22 | 4.90          | 3.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 23 | 5.00          | 2.00         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 24 | 5.00          | 2.30         | 3.30          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 25 | 5.00          | 3.00         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 26 | 5.00          | 3.20         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 27 | 5.00          | 3.30         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 28 | 5.00          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 29 | 5.00          | 3.40         | 1.60          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 30 | 5.00          | 3.50         | 1.30          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 31 | 5.00          | 3.50         | 1.60          | 0.60         | Iris-setosa     | Iris-setosa     | 0.1
 32 | 5.00          | 3.60         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 33 | 5.10          | 2.50         | 3.00          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 34 | 5.10          | 3.30         | 1.70          | 0.50         | Iris-setosa     | Iris-setosa     | 0.1
 35 | 5.10          | 3.40         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 36 | 5.10          | 3.50         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 37 | 5.10          | 3.50         | 1.40          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 38 | 5.10          | 3.70         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 39 | 5.10          | 3.80         | 1.50          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 40 | 5.10          | 3.80         | 1.60          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 41 | 5.10          | 3.80         | 1.90          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 42 | 5.20          | 2.70         | 3.90          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 43 | 5.20          | 3.40         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 44 | 5.20          | 3.50         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 45 | 5.20          | 4.10         | 1.50          | 0.10         | Iris-setosa     | Iris-setosa     | 0
 46 | 5.30          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 47 | 5.40          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 48 | 5.40          | 3.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 49 | 5.40          | 3.40         | 1.70          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 50 | 5.40          | 3.70         | 1.50          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 51 | 5.40          | 3.90         | 1.30          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 52 | 5.40          | 3.90         | 1.70          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 53 | 5.50          | 2.30         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 54 | 5.50          | 2.40         | 3.70          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 55 | 5.50          | 2.40         | 3.80          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 56 | 5.50          | 2.50         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 57 | 5.50          | 2.60         | 4.40          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 58 | 5.50          | 3.50         | 1.30          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 59 | 5.50          | 4.20         | 1.40          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 60 | 5.60          | 2.50         | 3.90          | 1.10         | Iris-versicolor | Iris-versicolor | 1
 61 | 5.60          | 2.70         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 62 | 5.60          | 2.80         | 4.90          | 2.00         | Iris-virginica  | Iris-virginica  | 0.01
 63 | 5.60          | 2.90         | 3.60          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 64 | 5.60          | 3.00         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 65 | 5.60          | 3.00         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 66 | 5.70          | 2.50         | 5.00          | 2.00         | Iris-virginica  | Iris-virginica  | 0.01
 67 | 5.70          | 2.60         | 3.50          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 68 | 5.70          | 2.80         | 4.10          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 69 | 5.70          | 2.80         | 4.50          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 70 | 5.70          | 2.90         | 4.20          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 71 | 5.70          | 3.00         | 4.20          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 72 | 5.70          | 3.80         | 1.70          | 0.30         | Iris-setosa     | Iris-setosa     | 0
 73 | 5.70          | 4.40         | 1.50          | 0.40         | Iris-setosa     | Iris-setosa     | 0
 74 | 5.80          | 2.60         | 4.00          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 75 | 5.80          | 2.70         | 3.90          | 1.20         | Iris-versicolor | Iris-versicolor | 1
 76 | 5.80          | 2.70         | 4.10          | 1.00         | Iris-versicolor | Iris-versicolor | 1
 77 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica  | 0.05
 78 | 5.80          | 2.70         | 5.10          | 1.90         | Iris-virginica  | Iris-virginica  | 0.05
 79 | 5.80          | 2.80         | 5.10          | 2.40         | Iris-virginica  | Iris-virginica  | 0.05
 80 | 5.80          | 4.00         | 1.20          | 0.20         | Iris-setosa     | Iris-setosa     | 0
 81 | 5.90          | 3.00         | 4.20          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 82 | 5.90          | 3.00         | 5.10          | 1.80         | Iris-virginica  | Iris-virginica  | 0.0714286
 83 | 5.90          | 3.20         | 4.80          | 1.80         | Iris-versicolor | Iris-versicolor | 0.6
 84 | 6.00          | 2.20         | 4.00          | 1.00         | Iris-versicolor | Iris-versicolor | 0.95
 85 | 6.00          | 2.20         | 5.00          | 1.50         | Iris-virginica  | Iris-versicolor | 0.493333
 86 | 6.00          | 2.70         | 5.10          | 1.60         | Iris-versicolor | Iris-virginica  | 0.364762
 87 | 6.00          | 2.90         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 1
 88 | 6.00          | 3.00         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica  | 0.3
 89 | 6.00          | 3.40         | 4.50          | 1.60         | Iris-versicolor | Iris-versicolor | 1
 90 | 6.10          | 2.60         | 5.60          | 1.40         | Iris-virginica  | Iris-virginica  | 0.11
 91 | 6.10          | 2.80         | 4.00          | 1.30         | Iris-versicolor | Iris-versicolor | 0.91
 92 | 6.10          | 2.80         | 4.70          | 1.20         | Iris-versicolor | Iris-versicolor | 0.91
 93 | 6.10          | 2.90         | 4.70          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 94 | 6.10          | 3.00         | 4.60          | 1.40         | Iris-versicolor | Iris-versicolor | 1
 95 | 6.10          | 3.00         | 4.90          | 1.80         | Iris-virginica  | Iris-virginica  | 0.131429
 96 | 6.20          | 2.20         | 4.50          | 1.50         | Iris-versicolor | Iris-versicolor | 0.95
 97 | 6.20          | 2.80         | 4.80          | 1.80         | Iris-virginica  | Iris-virginica  | 0.2
 98 | 6.20          | 2.90         | 4.30          | 1.30         | Iris-versicolor | Iris-versicolor | 1
 99 | 6.20          | 3.40         | 5.40          | 2.30         | Iris-virginica  | Iris-virginica  | 0
100 | 6.30          | 2.30         | 4.40          | 1.30         | Iris-versicolor | Iris-versicolor | 0.95
Out[13]:
Rows: 1-100 of 150 | Columns: 7
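
The relationship between a class probability and the predicted label can be sketched as follows. This assumes the predicted class is simply the one with the highest probability, which is the usual convention but is not stated in this example; `predict_class` and the probabilities are hypothetical.

```python
def predict_class(class_probs):
    """Return the label with the highest probability."""
    return max(class_probs, key=class_probs.get)

# Hypothetical per-class probabilities for a single row.
row_probs = {"Iris-setosa": 0.0, "Iris-versicolor": 0.36, "Iris-virginica": 0.64}

predicted = predict_class(row_probs)            # label with the largest probability
prob_versicolor = row_probs["Iris-versicolor"]  # what a pos_label-style lookup would return
print(predicted, prob_versicolor)  # Iris-virginica 0.36
```

Under that assumption, a row can carry a sizable versicolor probability and still be labelled as another class, as long as some other class scores higher.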

The vDataFrame has its own 'score' method to evaluate your models.

In [14]:
iris.score("Species", "pred_Species", method = "accuracy")
Out[14]:
0.986666666666667

Let's examine our prediction with a histogram.

In [15]:
iris.hist(["pred_Species", "Species"])
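
A histogram of the two columns compares how many rows fall in each class. The same comparison can be sketched in plain Python with `collections.Counter`; the lists below are toy data, not the iris table.

```python
from collections import Counter

# Toy labels (hypothetical): one virginica row is predicted as versicolor.
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
pred_species = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]

actual_counts = Counter(species)
predicted_counts = Counter(pred_species)

# Print actual vs. predicted counts per class.
for label in sorted(set(species) | set(pred_species)):
    print(label, actual_counts[label], predicted_counts[label])
```

Matching counts do not guarantee correct predictions. In the prediction table above, rows 85 and 86 are misclassified in opposite directions, so they cancel out in a per-class count.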

Some multiclass classifiers let you evaluate the importance of each feature with the 'features_importance' method.

In [16]:
model.features_importance()