VerticaPy

Python API for Vertica Data Science at Scale

Classification

Classification algorithms are ML algorithms that predict a categorical response column. When the response has more than two categories, the task is called multiclass classification. Examples include predicting a flower's species from its measurements or predicting whether Telco customers will churn.

To understand how to create a classification model, let's predict the species of flowers with the Iris dataset.

We'll start by importing the Random Forest Classifier.

In [1]:
from verticapy.learn.ensemble import RandomForestClassifier

Next, we'll create a model object. Vertica has its own model management system, so we just need to choose a model name. The name can include the model's schema; if the schema is omitted, as it is here, the model is stored in 'public' by default.

In [2]:
model = RandomForestClassifier("RF_Iris")

Let's load the Iris dataset.

In [5]:
from verticapy.datasets import load_iris
iris = load_iris()

Now that the data is loaded, we can fit the model.

In [6]:
model.fit(iris, ["PetalLengthCm", "SepalLengthCm"], "Species")
Out[6]:

===========
call_string
===========
SELECT rf_classifier('public.RF_Iris', '"public"."_verticapy_tmp_view_dbadmin_178273_7491533984_"', '"species"', '"PetalLengthCm", "SepalLengthCm"' USING PARAMETERS exclude_columns='', ntree=10, mtry=1, sampling_size=0.632, max_depth=5, max_breadth=1000000000, min_leaf_size=1, min_info_gain=0, nbins=32);

=======
details
=======
  predictor  |      type      
-------------+----------------
petallengthcm|float or numeric
sepallengthcm|float or numeric


===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    | 10  
rejected_row_count|  0  
accepted_row_count| 150 

Many metrics are available to evaluate the model. The report below computes them separately for each class.

In [7]:
model.report()
Out[7]:
             | Iris-setosa        | Iris-versicolor    | Iris-virginica
-------------+--------------------+--------------------+-------------------
auc          | 1.0                | 0.9960000000000002 | 0.9954000000000001
prc_auc      | 1.0                | 0.9920263383939087 | 0.9918601369253075
accuracy     | 1.0                | 0.98               | 0.9733333333333334
log_loss     | 0.0143433798056646 | 0.0476212667818153 | 0.0466894193311429
precision    | 1.0                | 0.9607843137254902 | 0.96
recall       | 1.0                | 0.98               | 0.96
f1_score     | 1.0                | 0.9702970297029702 | 0.96
mcc          | 1.0                | 0.9553302944424514 | 0.94
informedness | 1.0                | 0.96               | 0.94
markedness   | 1.0                | 0.9506833036244802 | 0.94
csi          | 1.0                | 0.9423076923076923 | 0.9230769230769231
cutoff       | 0.8                | 0.4402             | 0.4927
Rows: 1-12 | Columns: 4
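
To make the rows of the report concrete, the sketch below recomputes the Iris-versicolor column from one-vs-rest confusion counts. The counts (TP=49, FP=2, FN=1, TN=98) are not printed by VerticaPy; they are inferred here from the reported precision and recall together with the 150 accepted rows (50 per species). The helper name `binary_metrics` is hypothetical and is not part of VerticaPy.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute one-vs-rest metrics from confusion counts (hypothetical helper)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "csi": tp / (tp + fp + fn),
    }

# Assumed counts for Iris-versicolor vs. the rest (inferred, not VerticaPy output).
versicolor = binary_metrics(tp=49, fp=2, fn=1, tn=98)
for name, value in versicolor.items():
    print(f"{name:<12} {value:.4f}")
```

With these counts the printed values agree with the Iris-versicolor column of the report to four decimal places, which suggests the report evaluates each class against all the others.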

You can add the predictions to your dataset.

In [8]:
model.predict(iris, name = "prediction")
Out[8]:
    | Id | PetalLengthCm | PetalWidthCm | SepalLengthCm | SepalWidthCm | Species | prediction
    | Int | Numeric(6,3) | Numeric(6,3) | Numeric(6,3) | Numeric(6,3) | Varchar(30) | Varchar(128)
----+-----+-----+-----+-----+-----+-----------------+----------------
  1 |   1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa | Iris-setosa
  2 |   2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa | Iris-setosa
  3 |   3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa | Iris-setosa
  4 |   4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa | Iris-setosa
  5 |   5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa | Iris-setosa
  6 |   6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa | Iris-setosa
  7 |   7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa | Iris-setosa
  8 |   8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa | Iris-setosa
  9 |   9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa | Iris-setosa
 10 |  10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa
 11 |  11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa | Iris-setosa
 12 |  12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa | Iris-setosa
 13 |  13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa | Iris-setosa
 14 |  14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa | Iris-setosa
 15 |  15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa | Iris-setosa
 16 |  16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa | Iris-setosa
 17 |  17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa | Iris-setosa
 18 |  18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa | Iris-setosa
 19 |  19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa | Iris-setosa
 20 |  20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa | Iris-setosa
 21 |  21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa | Iris-setosa
 22 |  22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa | Iris-setosa
 23 |  23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa | Iris-setosa
 24 |  24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa | Iris-setosa
 25 |  25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa | Iris-setosa
 26 |  26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa | Iris-setosa
 27 |  27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa | Iris-setosa
 28 |  28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa | Iris-setosa
 29 |  29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa | Iris-setosa
 30 |  30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa | Iris-setosa
 31 |  31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa | Iris-setosa
 32 |  32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa | Iris-setosa
 33 |  33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa | Iris-setosa
 34 |  34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa | Iris-setosa
 35 |  35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa
 36 |  36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa | Iris-setosa
 37 |  37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa | Iris-setosa
 38 |  38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa
 39 |  39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa | Iris-setosa
 40 |  40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa | Iris-setosa
 41 |  41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa | Iris-setosa
 42 |  42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa | Iris-setosa
 43 |  43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa | Iris-setosa
 44 |  44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa | Iris-setosa
 45 |  45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa | Iris-setosa
 46 |  46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa | Iris-setosa
 47 |  47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa | Iris-setosa
 48 |  48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa | Iris-setosa
 49 |  49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa | Iris-setosa
 50 |  50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa | Iris-setosa
 51 |  51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor | Iris-versicolor
 52 |  52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor | Iris-versicolor
 53 |  53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor | Iris-versicolor
 54 |  54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor | Iris-versicolor
 55 |  55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor | Iris-versicolor
 56 |  56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor | Iris-versicolor
 57 |  57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor | Iris-versicolor
 58 |  58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor | Iris-versicolor
 59 |  59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor | Iris-versicolor
 60 |  60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor | Iris-versicolor
 61 |  61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor | Iris-versicolor
 62 |  62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor | Iris-versicolor
 63 |  63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor | Iris-versicolor
 64 |  64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor | Iris-versicolor
 65 |  65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor | Iris-versicolor
 66 |  66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor | Iris-versicolor
 67 |  67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor | Iris-versicolor
 68 |  68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor | Iris-versicolor
 69 |  69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor | Iris-versicolor
 70 |  70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor | Iris-versicolor
 71 |  71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor | Iris-versicolor
 72 |  72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor | Iris-versicolor
 73 |  73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor | Iris-virginica
 74 |  74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor | Iris-versicolor
 75 |  75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor | Iris-versicolor
 76 |  76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor | Iris-versicolor
 77 |  77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor | Iris-versicolor
 78 |  78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor | Iris-versicolor
 79 |  79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor | Iris-versicolor
 80 |  80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor | Iris-versicolor
 81 |  81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor | Iris-versicolor
 82 |  82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor | Iris-versicolor
 83 |  83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor | Iris-versicolor
 84 |  84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor | Iris-virginica
 85 |  85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor | Iris-versicolor
 86 |  86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor | Iris-versicolor
 87 |  87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor | Iris-versicolor
 88 |  88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor | Iris-versicolor
 89 |  89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor | Iris-versicolor
 90 |  90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor | Iris-versicolor
 91 |  91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor | Iris-versicolor
 92 |  92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor | Iris-versicolor
 93 |  93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor | Iris-versicolor
 94 |  94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor | Iris-versicolor
 95 |  95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor | Iris-versicolor
 96 |  96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor | Iris-versicolor
 97 |  97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor | Iris-versicolor
 98 |  98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor | Iris-versicolor
 99 |  99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor | Iris-versicolor
100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor | Iris-versicolor
Rows: 1-100 | Columns: 7

You can also add the probabilities.

In [9]:
model.predict_proba(iris, name = "prob")
Out[9]:
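    | Id | PetalLengthCm | PetalWidthCm | SepalLengthCm | SepalWidthCm | Species | prediction | prob_irissetosa | prob_irisversicolor | prob_irisvirginica
    | Int | Numeric(6,3) | Numeric(6,3) | Numeric(6,3) | Numeric(6,3) | Varchar(30) | Varchar(128) | Varchar(128) | Varchar(128) | Varchar(128)
----+-----+-----+-----+-----+-----+-----------------+-----------------+-----------+-----------+----------
  1 |   1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  2 |   2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa | Iris-setosa | 0.95 | 0.025 | 0.025
  3 |   3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  4 |   4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  5 |   5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  6 |   6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  7 |   7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  8 |   8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
  9 |   9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 10 |  10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa | 0.95 | 0.025 | 0.025
 11 |  11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 12 |  12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 13 |  13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 14 |  14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 15 |  15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa | Iris-setosa | 0.916667 | 0.0666667 | 0.0166667
 16 |  16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 17 |  17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 18 |  18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 19 |  19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa | Iris-setosa | 0.8 | 0.2 | 0
 20 |  20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 21 |  21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 22 |  22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 23 |  23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 24 |  24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 25 |  25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa | Iris-setosa | 0.925 | 0.05 | 0.025
 26 |  26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 27 |  27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 28 |  28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 29 |  29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 30 |  30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 31 |  31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 32 |  32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 33 |  33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 34 |  34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 35 |  35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa | 0.95 | 0.025 | 0.025
 36 |  36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 37 |  37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 38 |  38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | Iris-setosa | 0.95 | 0.025 | 0.025
 39 |  39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 40 |  40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 41 |  41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 42 |  42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 43 |  43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 44 |  44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 45 |  45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa | Iris-setosa | 0.9 | 0.1 | 0
 46 |  46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 47 |  47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 48 |  48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 49 |  49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 50 |  50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa | Iris-setosa | 1 | 0 | 0
 51 |  51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor | Iris-versicolor | 0.00625 | 0.7875 | 0.20625
 52 |  52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor | Iris-versicolor | 0.0309829 | 0.830983 | 0.138034
 53 |  53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor | Iris-versicolor | 0.0837302 | 0.63373 | 0.28254
 54 |  54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 55 |  55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor | Iris-versicolor | 0.0309829 | 0.830983 | 0.138034
 56 |  56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 57 |  57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor | Iris-versicolor | 0.0463599 | 0.72761 | 0.22603
 58 |  58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor | Iris-versicolor | 0.15 | 0.525 | 0.325
 59 |  59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor | Iris-versicolor | 0.0194444 | 0.919444 | 0.0611111
 60 |  60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor | Iris-versicolor | 0.1 | 0.9 | 0
 61 |  61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor | Iris-versicolor | 0.1 | 0.9 | 0
 62 |  62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor | Iris-versicolor | 0.0361111 | 0.927778 | 0.0361111
 63 |  63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor | Iris-versicolor | 0.0561111 | 0.887778 | 0.0561111
 64 |  64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor | Iris-versicolor | 0.0905662 | 0.703483 | 0.205951
 65 |  65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 66 |  66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor | Iris-versicolor | 0.0394444 | 0.839444 | 0.121111
 67 |  67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 68 |  68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor | Iris-versicolor | 0.0666667 | 0.866667 | 0.0666667
 69 |  69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor | Iris-versicolor | 0.0426496 | 0.849316 | 0.108034
 70 |  70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 71 |  71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor | Iris-versicolor | 0.0840278 | 0.781944 | 0.134028
 72 |  72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor | Iris-versicolor | 0.0426496 | 0.849316 | 0.108034
 73 |  73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor | Iris-virginica | 0.112729 | 0.262729 | 0.624542
 74 |  74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor | Iris-versicolor | 0.0905662 | 0.703483 | 0.205951
 75 |  75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor | Iris-versicolor | 0.0309829 | 0.830983 | 0.138034
 76 |  76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor | Iris-versicolor | 0.0194444 | 0.919444 | 0.0611111
 77 |  77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor | Iris-versicolor | 0.0456944 | 0.826944 | 0.127361
 78 |  78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor | Iris-versicolor | 0.10373 | 0.55373 | 0.34254
 79 |  79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor | Iris-versicolor | 0.0561111 | 0.887778 | 0.0561111
 80 |  80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 81 |  81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 82 |  82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 83 |  83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor | Iris-versicolor | 0.0666667 | 0.866667 | 0.0666667
 84 |  84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor | Iris-virginica | 0.0486111 | 0.440278 | 0.511111
 85 |  85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor | Iris-versicolor | 0.1 | 0.8 | 0.1
 86 |  86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor | Iris-versicolor | 0.0561111 | 0.887778 | 0.0561111
 87 |  87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor | Iris-versicolor | 0.0456944 | 0.826944 | 0.127361
 88 |  88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor | Iris-versicolor | 0.0401099 | 0.74011 | 0.21978
 89 |  89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 90 |  90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 91 |  91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 92 |  92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor | Iris-versicolor | 0.0426496 | 0.849316 | 0.108034
 93 |  93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor | Iris-versicolor | 0.0666667 | 0.866667 | 0.0666667
 94 |  94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor | Iris-versicolor | 0.1 | 0.9 | 0
 95 |  95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 96 |  96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 97 |  97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0
 98 |  98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor | Iris-versicolor | 0.0426496 | 0.849316 | 0.108034
 99 |  99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor | Iris-versicolor | 0.1 | 0.9 | 0
100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor | Iris-versicolor | 0 | 1 | 0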
Rows: 1-100 | Columns: 10
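
The prediction column is consistent with picking, for each row, the class with the highest probability. The sketch below checks this for rows 73 and 84, the two misclassified rows in the output above, and then applies the Iris-versicolor cutoff from the report (0.4402) to the same rows. How VerticaPy applies the cutoff internally is an assumption here (a one-vs-rest threshold on the class probability), and the variable names are illustrative.

```python
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Probabilities copied from the output above (rows 73 and 84).
rows = {
    73: [0.112729, 0.262729, 0.624542],
    84: [0.0486111, 0.440278, 0.511111],
}

# Multiclass prediction: pick the class with the highest probability.
predicted = {i: classes[p.index(max(p))] for i, p in rows.items()}

# One-vs-rest view (assumed): a row counts as versicolor when its
# versicolor probability reaches the cutoff reported for that class.
versicolor_cutoff = 0.4402
is_versicolor = {i: p[1] >= versicolor_cutoff for i, p in rows.items()}

print(predicted)
print(is_versicolor)
```

Under that assumption, row 84 counts as a versicolor hit in the one-vs-rest view even though its multiclass prediction is Iris-virginica, so per-class metrics need not match a simple count of mismatches in the prediction column.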

Our example forgoes splitting the data into training and testing sets, a step that is important in real-world work. Our main goal in this lesson is to look at the metrics used to evaluate classification models. The best-known metric is accuracy: generally speaking, the closer accuracy is to 1, the better the model. However, taking metrics at face value can lead to incorrect interpretations.

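For example, let's say our goal is to identify bank fraud. Fraudulent activity is relatively rare, so let's say that it represents less than 1% of the data. If we predicted that there is no fraud at all in the dataset, we'd end up with an accuracy above 99% while failing to detect a single fraudulent transaction. This is why ROC AUC and PRC AUC are more robust metrics: they measure how well the model ranks positive cases above negative ones rather than how often a single predicted label is right.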
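
The fraud scenario can be sketched in a few lines of plain Python. The data is hypothetical (1,000 transactions with exactly 10 frauds, so the accuracy comes out at 0.99), and the helper functions are minimal illustrations written for this example, not VerticaPy functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted as positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / sum(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1,000 transactions, 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990
# A "model" that gives every transaction the same score and never flags fraud.
scores = [0.0] * len(y_true)
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
print(roc_auc(y_true, scores))   # 0.5
```

The accuracy looks excellent, but the recall of 0 and the ROC AUC of 0.5 (the value expected from random scoring) show that this model has learned nothing about fraud.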

That said, a good model is simply one that can solve the given problem. In that regard, any model that performs better than a random one has some value.