VerticaPy
Example: Methods in a Multiclass Classification Model
In this example, we use the 'Iris' dataset to demonstrate the methods available to multiclass classification models.
from verticapy.datasets import load_iris
iris = load_iris()
display(iris)
Our goal is to predict the flower species. Let's create a random forest model.
from verticapy.learn.ensemble import RandomForestClassifier
model = RandomForestClassifier("public.RF_iris")
model.fit("public.iris",
          ["PetalLengthCm", "SepalWidthCm", "SepalLengthCm", "PetalWidthCm"],
          "Species")
Fitting the model creates new model attributes.
model.X
model.y
model.input_relation
model.test_relation
Because we didn't specify a test relation when fitting the model, the model uses the training relation as the test relation. Any score computed this way is measured on the training data, so it is likely to be optimistic.
Let's compute the accuracy of our model.
model.score(method = "accuracy")
The 'score' method compares the 'y' attribute with the model's prediction in the 'test_relation' to compute the accuracy of the model. You can change these attributes at any time to evaluate the model on a different relation or on different columns.
Models have many useful attributes. For a random forest model, the 'classes_' attribute lists the response categories.
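To make the accuracy metric concrete, here is a minimal plain-Python sketch of what the number represents: the fraction of rows whose predicted label equals the true label. It does not use VerticaPy and is not how the library computes the score internally; the `accuracy` helper and the sample labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical labels for illustration; not rows from the Iris dataset.
y_true = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
y_pred = ["Iris-setosa", "Iris-virginica", "Iris-virginica", "Iris-setosa"]

# Three of the four predictions match the true label.
sample_accuracy = accuracy(y_true, y_pred)
```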
model.classes_
You can view other attributes using the 'get_attr' method.
model.get_attr()
Visualizing the model with a PRC, ROC, or lift chart can be helpful. Keep in mind that this is a multiclass classification, so one class must be treated as the positive class and all others as negative. We identify this class with the 'pos_label' parameter. For this example, we'll mark 'Iris-setosa' as the positive class for each graph.
model.roc_curve(pos_label = 'Iris-setosa')
model.prc_curve(pos_label = 'Iris-setosa')
model.lift_chart(pos_label = 'Iris-setosa')
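To see why a positive class is needed, the sketch below shows the one-vs-rest idea behind a ROC curve: the chosen label becomes the positive class, every other label becomes negative, and each probability threshold yields one (false positive rate, true positive rate) point. This is plain Python under our own assumptions, not VerticaPy's implementation; `roc_points`, the scores, and the thresholds are hypothetical.

```python
def roc_points(y_true, scores, pos_label, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    # One-vs-rest binarization: pos_label is positive, everything else negative.
    positives = [label == pos_label for label in y_true]
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative row")

    points = []
    for thr in thresholds:
        # A row is predicted positive when its score reaches the threshold.
        tp = sum(1 for is_pos, s in zip(positives, scores) if is_pos and s >= thr)
        fp = sum(1 for is_pos, s in zip(positives, scores) if not is_pos and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical true labels and predicted probabilities of 'Iris-setosa'.
y_true = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"]
setosa_scores = [0.9, 0.6, 0.7, 0.1]

points = roc_points(y_true, setosa_scores, "Iris-setosa", [0.0, 0.5, 0.8, 1.0])
```

Choosing a different 'pos_label' changes which rows count as positive, so each class produces its own curve.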
Let's look at the SQL query for our model.
display(model.deploySQL())
You can evaluate the quality of your model with the 'report' method. Again, since this is a multiclass classification, each class is treated in turn as the 'positive' class, with all other classes as negative.
If we don't specify a cutoff, VerticaPy automatically computes an optimal cutoff.
model.report()
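The "each class takes its turn as the positive class" idea can be sketched in plain Python. The example below computes precision and recall per class from hard predictions. It is only an illustration of the one-vs-rest scheme, not the set of metrics or the cutoff logic that 'report' uses; `one_vs_rest_report` and the sample labels are hypothetical.

```python
def one_vs_rest_report(y_true, y_pred):
    """Return {label: (precision, recall)}, treating each label as positive in turn."""
    report = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Fall back to 0.0 when a denominator is empty.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical labels for illustration; not rows from the Iris dataset.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

report = one_vs_rest_report(y_true, y_pred)
```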
You can add the model's predictions to your vDataFrame as a new column.
model.predict(iris, name = "pred_Species")
You can also add the probability of a specific class with the 'pos_label' parameter.
model.predict(iris, name = "prob_versicolor", pos_label = "Iris-versicolor")
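The relationship between per-class probabilities and a single predicted class can be sketched as follows. A common convention, assumed here rather than taken from VerticaPy's documentation, is to predict the class with the highest probability. The `most_likely_class` helper and the probabilities are hypothetical.

```python
def most_likely_class(probabilities):
    """Return the label with the highest probability in a {label: probability} dict."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    return max(probabilities, key=probabilities.get)


# Hypothetical per-class probabilities for a single row.
row_probabilities = {
    "Iris-setosa": 0.05,
    "Iris-versicolor": 0.70,
    "Iris-virginica": 0.25,
}

predicted = most_likely_class(row_probabilities)
```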
The vDataFrame has its own 'score' method to evaluate your models.
iris.score("Species", "pred_Species", method = "accuracy")
Let's examine our prediction with a histogram.
iris.hist(["pred_Species", "Species"])
Some multiclass classifiers let you evaluate the importance of each feature with the 'features_importance' method.
model.features_importance()
Let's draw a tree diagram.
model.plot_tree()