
Validation Curve

General

VerticaPy’s Validation Curve feature lets you tune a machine learning model and check it for overfitting at the same time. It measures how model performance changes as a single hyperparameter varies, which helps data analysts and scientists find the parameter values that give the best results. For each value, it plots a metric such as accuracy, error, or F1 score on both the training data and the held-out test data from cross-validation. Comparing the two curves reveals overfitting: when the training score keeps improving while the test score stalls or drops, the model is memorizing the training data instead of generalizing. Validation curves therefore simplify hyperparameter selection while guarding against overfitting, producing models that are both accurate and robust in real-world applications.
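To make the procedure concrete, here is a minimal sketch in plain Python. It is not VerticaPy’s implementation: it swaps the random forest for a toy k-nearest-neighbors classifier on one-dimensional data, uses accuracy as the metric, and builds folds by simple interleaving. The function names (`knn_predict`, `accuracy`, `k_fold_splits`, `validation_curve`) are hypothetical and chosen for illustration only.

```python
import random
from statistics import mean


def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of its k nearest training points."""
    # Sort training points by distance to x and keep the k closest.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0  # ties (even k) go to class 0


def accuracy(train, points, k):
    """Share of `points` whose label the k-NN model (fitted on `train`) predicts correctly."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in points)


def k_fold_splits(data, n_folds):
    """Yield (train, test) pairs; fold i holds every n_folds-th point starting at i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test


def validation_curve(data, param_range, n_folds=3):
    """For each k, return (k, mean train accuracy, mean test accuracy) across folds."""
    curve = []
    for k in param_range:
        train_scores, test_scores = [], []
        for train, test in k_fold_splits(data, n_folds):
            train_scores.append(accuracy(train, train, k))  # score on data the model saw
            test_scores.append(accuracy(train, test, k))    # score on held-out data
        curve.append((k, mean(train_scores), mean(test_scores)))
    return curve


# Toy 1-D data: the label is 1 when x > 0, with about 20% of labels flipped as noise.
random.seed(0)
data = []
for _ in range(90):
    x = random.uniform(-1, 1)
    y = int(x > 0)
    if random.random() < 0.2:
        y = 1 - y
    data.append((x, y))

for k, train_acc, test_acc in validation_curve(data, [1, 5, 15]):
    print(f"k={k:2d}  train={train_acc:.2f}  test={test_acc:.2f}")
```

Here a small k is the most flexible setting (with k = 1 every training point predicts its own label, so training accuracy is perfect), so it plays the role that a large `max_depth` plays for a random forest.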

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s generate a synthetic dataset with two clusters and a random binary label.

import random

import numpy as np

N = 500  # Number of records
k = 10  # Offset between the two clusters

# Normal distributions, N / 2 values each
x = np.random.normal(5, 1, round(N / 2))
y = np.random.normal(3, 1, round(N / 2))
z = np.random.normal(3, 1, round(N / 2))

# Creating a vDataFrame with two clusters: the second is the first shifted by k
data = vp.vDataFrame({
    "x": np.concatenate([x, x + k]),
    "y": np.concatenate([y, y + k]),
    "z": np.concatenate([z, z + k]),
    "c": [random.randint(0, 1) for _ in range(N)],  # random binary label
})
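Note that `c` is drawn independently of `x`, `y`, and `z`: the features form two well-separated clusters, but the label is a coin flip. No model can predict it better than chance on held-out data (an AUC of about 0.5), so any gap between training and test scores reflects overfitting rather than real signal. The sketch below rebuilds the same layout with the standard library instead of NumPy and VerticaPy and checks the share of positive labels in each cluster; `make_clusters`, `positive_rate`, and the extra `cluster` field are hypothetical helpers for illustration.

```python
import random


def make_clusters(n, k, seed=None):
    """Mirror the example's layout: two copies of one Gaussian cloud, offset by k, with random labels."""
    rng = random.Random(seed)
    half = round(n / 2)
    # One cloud of points: x ~ N(5, 1), y ~ N(3, 1), z ~ N(3, 1), as in the example.
    base = [(rng.gauss(5, 1), rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(half)]
    rows = []
    for cluster, shift in ((0, 0), (1, k)):
        for x, y, z in base:
            rows.append({
                "x": x + shift, "y": y + shift, "z": z + shift,
                "c": rng.randint(0, 1),  # coin flip: ignores x, y and z entirely
                "cluster": cluster,      # bookkeeping field, not in the original data
            })
    return rows


def positive_rate(rows, cluster):
    """Share of rows with c == 1 inside one cluster."""
    labels = [row["c"] for row in rows if row["cluster"] == cluster]
    return sum(labels) / len(labels)


rows = make_clusters(500, 10, seed=0)
print(f"cluster 0: {positive_rate(rows, 0):.2f}, cluster 1: {positive_rate(rows, 1):.2f}")
```

Both rates hover around 0.5, which is why cluster membership, the only structure in the features, carries no information about `c`.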

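Next, let’s define a Random Forest Classifier. The model does not need to be fitted beforehand: the validation curve trains it on cross-validation folds of the complete dataset for each hyperparameter value.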

# Importing the Vertica ML module
import verticapy.machine_learning.vertica as vml

# Importing the model selection module
import verticapy.machine_learning.model_selection as vms

# Defining the Model
model = vml.RandomForestClassifier()

VerticaPy can draw its charts with several plotting libraries, including Matplotlib, Highcharts, and Plotly. Each library offers its own features and capabilities, so you can choose the one that best fits your visualization needs.

[Figure: plotting_libs.png, the plotting libraries supported by VerticaPy]

Note

To select a plotting library, use the set_option function; you can switch between libraries at any time. If the chosen library cannot draw a particular chart, or VerticaPy does not implement that chart for it, VerticaPy issues a warning and switches to an alternative library that can create the chart.
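The fallback behavior described in this note can be pictured with the sketch below. It is not VerticaPy’s code: the `SUPPORTED` table, the chart names, and `choose_library` are hypothetical, and both the order in which alternatives are tried and the error raised when no library can draw the chart are assumptions.

```python
import warnings

# Hypothetical support table: which chart types each library can draw.
SUPPORTED = {
    "plotly": {"range", "bar", "scatter"},
    "highcharts": {"range", "bar"},
    "matplotlib": {"range", "bar", "scatter", "contour"},
}


def choose_library(requested, chart):
    """Return the library to use for `chart`, warning and falling back if needed."""
    if chart in SUPPORTED.get(requested, set()):
        return requested
    # Assumption: alternatives are tried in table order.
    for lib, charts in SUPPORTED.items():
        if chart in charts:
            warnings.warn(f"{chart!r} is not available in {requested!r}; using {lib!r} instead.")
            return lib
    raise ValueError(f"No plotting library supports {chart!r}.")


print(choose_library("highcharts", "contour"))  # warns, then prints "matplotlib"
```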

The examples below generate the same validation curve with each plotting library.

We can switch to using the plotly module.

vp.set_option("plotting_lib", "plotly")

Let’s generate the Validation Curve.

fig = vms.validation_curve(
    model,
    param_name = "max_depth",
    param_range = [1, 2, 3],
    input_relation = data,
    X = ["x", "y", "z"],
    y = "c",
    cv = 3,
    metric = "auc",
    show = True,
)
fig

We can switch to using the highcharts module.

vp.set_option("plotting_lib", "highcharts")

Let’s generate the Validation Curve.

fig = vms.validation_curve(
    model,
    param_name = "max_depth",
    param_range = [1, 2, 3],
    input_relation = data,
    X = ["x", "y", "z"],
    y = "c",
    cv = 3,
    metric = "auc",
    show = True,
)
fig

We can switch to using the matplotlib module.

vp.set_option("plotting_lib", "matplotlib")

Let’s generate the Validation Curve.

vms.validation_curve(
    model,
    param_name = "max_depth",
    param_range = [1, 2, 3],
    input_relation = data,
    X = ["x", "y", "z"],
    y = "c",
    cv = 3,
    metric = "auc",
    show = True,
)

Out[2]: <Axes: xlabel='max_depth', ylabel='auc'>
[Figure: plotting_matplotlib_validation_curve.png, validation curve of auc versus max_depth drawn with Matplotlib]
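All three examples score each `max_depth` value with `metric = "auc"`. The plain-Python sketch below, independent of VerticaPy, shows what that number measures: the ROC AUC equals the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative one, with ties counting as one half. `roc_auc` is a hypothetical helper written for illustration; it compares every pair, which is fine for small inputs but quadratic in general.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores (O(P * N))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 is what a model that ranks rows at random achieves, which is the level to expect on held-out folds for a label like `c` that is independent of the features.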

Chart Customization

VerticaPy lets you customize most visual aspects of a plot, including colors, text labels, and plot size, along with many other attributes, so that your charts match your design preferences and analytical requirements. Whether you want more attractive visualizations or need to convey a specific insight precisely, these options let you build the graphic you need.

Note

Because validation curves are essentially range plots, they accept the same customization options as range plots.
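A range plot draws a central line with a shaded band around it. For a validation curve, one natural reading is a line per dataset (train and test) at the mean score across cross-validation folds, with a band of a few standard deviations on either side. The sketch below computes those bounds in plain Python; `range_bounds`, the one-standard-deviation default, the use of the population standard deviation, and the fold scores are all illustrative assumptions, not values or code taken from VerticaPy.

```python
from statistics import mean, pstdev


def range_bounds(fold_scores, n_std=1.0):
    """Return (lower, center, upper) for a list of per-fold scores."""
    center = mean(fold_scores)
    spread = n_std * pstdev(fold_scores)  # population standard deviation across folds
    return center - spread, center, center + spread


# Made-up test-fold AUCs for each max_depth value (cv = 3 gives three folds).
test_scores = {1: [0.48, 0.52, 0.50], 2: [0.45, 0.55, 0.50]}
for depth, scores in test_scores.items():
    low, mid, high = range_bounds(scores)
    print(f"max_depth={depth}: {low:.3f} <= {mid:.3f} <= {high:.3f}")
```

A wider band means the score varies more from fold to fold, so a point estimate at that parameter value deserves less trust.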