
Machine Learning - Regression Plots
General
In this example, we present several regression plots, including linear regression, tree-based algorithms, and residual plots. These plots are purely illustrative and are based on generated data. To make the data more realistic, we add some noise, which yields an approximately linear relationship.
Let’s begin by importing VerticaPy.
import verticapy as vp
Let’s also import numpy to create a random dataset.
import numpy as np
Let’s generate a dataset in which x follows a normal distribution and y is x plus uniform noise.
N = 100 # Number of Records
x = np.random.normal(5, 1, N) # Normal Distribution
e = np.random.random(N) # Noise
data = vp.vDataFrame({
"x": x,
"y": x + e,
})
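Because np.random.random draws from the uniform distribution on [0, 1), the noise has a mean of 0.5 rather than 0, so a line fitted to this data should have a slope near 1 and an intercept near 0.5. The following standard-library sketch checks this on a larger sample. It is independent of VerticaPy, and the helper name fit_line, the sample size, and the seed are assumptions made for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
xs = [rng.gauss(5, 1) for _ in range(10_000)]  # mirrors np.random.normal(5, 1, N)
ys = [x + rng.random() for x in xs]            # uniform noise on [0, 1)

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With only 100 records, as in the example above, the estimates will scatter more widely around these values.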
Let’s create both a linear regression model and a random forest regressor model using the complete dataset. Following that, we can compute the residual noise associated with each model’s predictions.
# Importing the Vertica ML module
import verticapy.machine_learning.vertica as vml
# Defining the Models
model_lr = vml.LinearRegression()
model_rf = vml.RandomForestRegressor()
# Fitting the models
model_lr.fit(data, "x", "y")
model_rf.fit(data, "x", "y")
# Adding the predictions of y to the vDataFrame
model_lr.predict(data, "x", name = "x_lr", inplace = True)
model_rf.predict(data, "x", name = "x_rf", inplace = True)
# Computing the respective residuals (observed y minus predicted y)
data["noise_lr"] = data["y"] - data["x_lr"]
data["noise_rf"] = data["y"] - data["x_rf"]
# Displaying the vDataFrame
display(data)
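A residual is the observed response minus the model’s prediction, and comparing residuals is a simple way to compare two models. The plain-Python sketch below shows the arithmetic on a handful of values. It does not use VerticaPy, and the observed values, both prediction lists, and the helper names residuals and rmse are hypothetical and exist only for illustration.

```python
import math


def residuals(observed, predicted):
    """Element-wise residuals: observed value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]


def rmse(res):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(r * r for r in res) / len(res))


# Hypothetical observed responses and predictions from two models
y = [4.1, 5.2, 5.4, 6.2, 6.5]
pred_a = [4.3, 4.9, 5.5, 5.8, 6.6]
pred_b = [4.6, 4.6, 5.6, 5.6, 6.9]

res_a = residuals(y, pred_a)
res_b = residuals(y, pred_b)

print("RMSE model A:", round(rmse(res_a), 4))
print("RMSE model B:", round(rmse(res_b), 4))
```

A lower RMSE means the predictions sit closer to the observed values on average; a residual plot adds the information of where the errors occur.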
VerticaPy supports several plotting libraries, including Matplotlib, Highcharts, and Plotly. Each library offers its own features and capabilities, so we can choose the one best suited to a given visualization.

Note
To select a plotting library, use the set_option function. VerticaPy can switch between plotting libraries at any time. If a graphic is not supported by the chosen library, or is not supported for that library within VerticaPy, a warning is issued and VerticaPy falls back to another library that can create the graphic.
We can switch to the Plotly library.
vp.set_option("plotting_lib", "plotly")
model_lr.plot()
Residual Plot
data.scatter(["y", "noise_lr"])