
Machine Learning - Classification Curve

General

In this classification example, we build a logistic regression model that estimates the probability that a passenger survived the Titanic voyage from two predictors: “age” and “fare”. The goal is to show how to plot ROC curves, Precision-Recall curves, and gain curves from that model. We also show how to create these curves from just two columns: one holding the predicted probabilities and one holding the true response.

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s import the titanic dataset from verticapy.datasets.

from verticapy.datasets import load_titanic

data = load_titanic()

Let’s create a logistic regression model using the entire dataset.

# Importing the Vertica ML module
import verticapy.machine_learning.vertica as vml

# Importing the Metrics module
import verticapy.machine_learning.metrics as vmt

# Defining the Model
model = vml.LogisticRegression()

# Defining Predictors and Response.
X = ["age", "fare"]
y = "survived"

# Fitting the model
model.fit(data, X, y)

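# Adding the probabilities to the vDataFrame. The two-column example at the
# end of this page reads the positive-class probability from
# "survived_proba_1", so the name given here acts as a prefix to which the
# class label is appended.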
model.predict_proba(data, X, name = "survived_proba", inplace = True)

# Displaying the vDataFrame
display(data)

VerticaPy supports several plotting libraries, such as Matplotlib, Highcharts, and Plotly. Each library has its own features and capabilities, so we can choose the one best suited to a given visualization.


Note

To select a plotting library, use the set_option function. VerticaPy can switch between plotting libraries at any time. If a graphic is not supported by the chosen library, or is not implemented for that library in VerticaPy, a warning is raised and VerticaPy falls back to another library that can draw the graphic.


We can switch to the Plotly library.

vp.set_option("plotting_lib", "plotly")

In VerticaPy, several classification curves can be generated directly from the model. If you use this method, specify a test set so that the curve reflects performance on data the model has not seen; otherwise, the curve is computed on the training set. Alternatively, you can create these curves from a probability column and a response column.

model.roc_curve()
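For intuition about what an ROC curve shows, here is a minimal sketch in plain Python, independent of VerticaPy. It ranks rows by predicted probability, lowers the decision threshold one distinct score at a time, and records the false positive rate and true positive rate at each step. The data and the helper names `roc_points` and `auc` are hypothetical and chosen only for illustration; this is not how VerticaPy computes the curve internally.

```python
# Hypothetical toy data, not taken from the Titanic dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]


def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs, one per distinct score, from (0, 0) to (1, 1).

    Assumes y_true contains at least one 1 and at least one 0.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank rows from the highest to the lowest predicted probability.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last row of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(y_true, y_score)
roc_auc = auc(points)
print(points)
print(roc_auc)
```

A curve that hugs the top-left corner, with an area close to 1, indicates that positives tend to receive higher probabilities than negatives; an area near 0.5 is what random scores would produce.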

Creating visualizations using two columns

vmt.roc_curve(y_true = "survived", y_score = "survived_proba_1", input_relation = data)
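The same two columns are enough for the other curves mentioned in the introduction. As a rough illustration, the plain-Python sketch below computes precision and recall at a single threshold and the points of a cumulative gain curve in its usual sense (share of positives captured versus share of rows targeted, with rows ranked by probability). The data and the helper names `precision_recall_at` and `cumulative_gain` are hypothetical, and VerticaPy's own plots may differ in detail.

```python
# Hypothetical toy data: a response column and a probability column.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]


def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when rows with score >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for p, t in zip(predicted, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, y_true) if p == 0 and t == 1)
    # Convention chosen for this sketch: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cumulative_gain(y_true, y_score):
    """Return (share of rows targeted, share of positives captured) pairs.

    Assumes y_true contains at least one positive.
    """
    # Keep only the labels, ordered from the highest to the lowest score.
    ranked = [t for _, t in sorted(zip(y_score, y_true), key=lambda pair: -pair[0])]
    total_positives = sum(ranked)
    captured = 0
    gains = []
    for i, label in enumerate(ranked, start=1):
        captured += label
        gains.append((i / len(ranked), captured / total_positives))
    return gains


precision, recall = precision_recall_at(y_true, y_score, 0.5)
gains = cumulative_gain(y_true, y_score)
print(precision, recall)
print(gains)
```

Sweeping the threshold over all distinct scores and collecting the resulting (recall, precision) pairs yields the points of a Precision-Recall curve.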