verticapy.machine_learning.memmodel.cluster.NearestCentroid

class verticapy.machine_learning.memmodel.cluster.NearestCentroid(clusters: list | ndarray, classes: list | ndarray, p: int = 2)

InMemoryModel implementation of the NearestCentroid algorithm, which assigns each input to the class of its closest centroid.

Parameters

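clusters: ArrayLike: List of the model’s cluster centers (centroids), one per class.
classes: ArrayLike: Names of the classes, in the same order as clusters.
p: int, optional: The order p of the Minkowski p-distance used to compare inputs with the centroids. Default: 2 (Euclidean distance).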
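To illustrate what p controls, the sketch below computes the distance between a point and a centroid in plain Python. It assumes the standard Minkowski definition (sum of |x_i - c_i|^p, raised to the power 1/p); the helper name minkowski_distance is hypothetical and not part of VerticaPy.

```python
def minkowski_distance(x, c, p=2):
    """Hypothetical helper: Minkowski p-distance between point x and centroid c."""
    # Raise each absolute coordinate difference to the power p, sum them,
    # then take the p-th root of the total.
    return sum(abs(xi - ci) ** p for xi, ci in zip(x, c)) ** (1 / p)

point, centroid = [2, 3], [1, 2]
print(minkowski_distance(point, centroid, p=1))  # Manhattan distance: 2.0
print(minkowski_distance(point, centroid, p=2))  # Euclidean distance: about 1.41421
```

With the default p = 2, this reduces to the Euclidean distance.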

Attributes

Attributes have the same names as the input parameters, followed by an underscore (‘_’): clusters_, classes_ and p_.

Examples

Initialization

Import the required module.

from verticapy.machine_learning.memmodel.cluster import NearestCentroid

A NearestCentroid model is defined by its cluster centroids, its classes, and the p value. In this example, we will use the following:

clusters = [[0.5, 0.6], [1, 2], [100, 200]]

p = 2

classes = ['class_a', 'class_b', 'class_c']

Let’s create a NearestCentroid model.

model_nc = NearestCentroid(clusters, classes, p)

Create a dataset.

data = [[2, 3]]

Making In-Memory Predictions

Use the predict() method to make predictions.

model_nc.predict(data)[0]
Out[7]: 'class_b'
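Conceptually, the prediction assigns each row to the class of its closest centroid. The minimal sketch below reproduces that logic in plain Python for p = 2, using math.dist for the Euclidean distance. The function name nearest_centroid_predict is hypothetical, and its tie-breaking (the first closest centroid wins) is an assumption of this sketch that may differ from VerticaPy.

```python
import math

def nearest_centroid_predict(X, clusters, classes):
    """Hypothetical sketch: return the class of the closest centroid for each row."""
    predictions = []
    for row in X:
        # Euclidean (p = 2) distance from this row to every centroid.
        distances = [math.dist(row, c) for c in clusters]
        # list.index returns the first position of the minimum on ties.
        best = distances.index(min(distances))
        predictions.append(classes[best])
    return predictions

clusters = [[0.5, 0.6], [1, 2], [100, 200]]
classes = ['class_a', 'class_b', 'class_c']
print(nearest_centroid_predict([[2, 3], [0.4, 0.5], [90, 180]], clusters, classes))
# ['class_b', 'class_a', 'class_c']
```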

Use the predict_proba() method to compute the predicted probability of each class.

model_nc.predict_proba(data)
Out[8]: array([[0.33177263, 0.66395985, 0.00426752]])
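The probabilities above are consistent with inverse-distance weighting: each class receives 1 / d_i, normalized so that each row sums to 1, which is also the structure of the predict_proba_sql() output on this page. The plain-Python sketch below assumes p = 2 and uses a hypothetical function name. Its zero-distance branch mirrors the SQL's THEN 1.0 case; assigning 0.0 to every other class in that situation is an assumption of this sketch.

```python
import math

def inverse_distance_proba(row, clusters):
    """Hypothetical sketch: class probabilities proportional to 1 / distance."""
    distances = [math.dist(row, c) for c in clusters]
    if 0 in distances:
        # The row lies exactly on a centroid: that centroid gets 1.0.
        # (Assumption: every other centroid gets 0.0.)
        return [1.0 if d == 0 else 0.0 for d in distances]
    inverse = [1 / d for d in distances]
    total = sum(inverse)
    # Normalize the inverse distances so the probabilities sum to 1.
    return [w / total for w in inverse]

clusters = [[0.5, 0.6], [1, 2], [100, 200]]
print(inverse_distance_proba([2, 3], clusters))
# approximately [0.33177263, 0.66395985, 0.00426752]
```

Closer centroids receive larger weights, so the nearest class always gets the highest probability.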

Use the transform() method to compute the distance to each cluster centroid.

model_nc.transform(data)
Out[9]: array([[  2.83019434,   1.41421356, 220.02954347]])
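transform() returns, for each input row, its distance to every centroid. A plain-Python sketch of that row-by-centroid matrix follows, assuming the Minkowski p-distance; the name transform_sketch is hypothetical.

```python
def transform_sketch(X, clusters, p=2):
    """Hypothetical sketch: one row of centroid distances per input row."""
    return [
        # Minkowski p-distance from `row` to each centroid, in cluster order.
        [sum(abs(x - c) ** p for x, c in zip(row, centroid)) ** (1 / p)
         for centroid in clusters]
        for row in X
    ]

clusters = [[0.5, 0.6], [1, 2], [100, 200]]
print(transform_sketch([[2, 3]], clusters))
# approximately [[2.83019434, 1.41421356, 220.02954347]]
```

Each output row lines up with clusters, so the position of its smallest value is the index of the nearest centroid.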

Deploy SQL Code

Let’s use the following column names:

cnames = ['col1', 'col2']

Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.

model_nc.predict_sql(cnames)
Out[11]: "CASE WHEN col1 IS NULL OR col2 IS NULL THEN NULL WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) AND POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) THEN 'class_c' WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) THEN 'class_b' ELSE 'class_a' END"
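The expression above follows a regular pattern: a row with any NULL input returns NULL, each class from the last down to the second is returned if its distance does not exceed that of any earlier class, and the first class is the ELSE fallback. Because the comparisons use <=, a tie between two centroids goes to the later class in the list. The helpers below are hypothetical (not VerticaPy functions); they rebuild the same string from the model's parameters, which can help when reading or auditing the generated SQL.

```python
def distance_sql(cnames, centroid, p=2):
    """Hypothetical helper: SQL for the p-distance between the columns and one centroid."""
    terms = " + ".join(f"POWER({col} - {float(c)}, {p})" for col, c in zip(cnames, centroid))
    return f"POWER({terms}, 1 / {p})"

def predict_sql_sketch(cnames, clusters, classes, p=2):
    """Hypothetical helper: CASE expression returning the closest centroid's class."""
    dists = [distance_sql(cnames, c, p) for c in clusters]
    null_check = " OR ".join(f"{col} IS NULL" for col in cnames)
    sql = f"CASE WHEN {null_check} THEN NULL"
    # Test classes from last to second: class i is returned if its distance
    # does not exceed the distance of any earlier class.
    for i in range(len(clusters) - 1, 0, -1):
        cond = " AND ".join(f"{dists[i]} <= {dists[j]}" for j in range(i))
        sql += f" WHEN {cond} THEN '{classes[i]}'"
    # The first class is the fallback.
    return sql + f" ELSE '{classes[0]}' END"

clusters = [[0.5, 0.6], [1, 2], [100, 200]]
classes = ['class_a', 'class_b', 'class_c']
sql = predict_sql_sketch(['col1', 'col2'], clusters, classes)
print(sql)
```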

Use the predict_proba_sql() method to get the SQL code that computes the predicted probability of each class.

model_nc.predict_proba_sql(cnames)
Out[12]: 
['(CASE WHEN POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)',
 '(CASE WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)',
 '(CASE WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)']
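Each expression above divides 1 / d_i by the sum of 1 / d_j over all centroids, and returns 1.0 directly when d_i is 0. The hypothetical helpers below (not VerticaPy functions) rebuild the same list of strings, which makes the shared denominator easier to see.

```python
def distance_sql(cnames, centroid, p=2):
    """Hypothetical helper: SQL for the p-distance between the columns and one centroid."""
    terms = " + ".join(f"POWER({col} - {float(c)}, {p})" for col, c in zip(cnames, centroid))
    return f"POWER({terms}, 1 / {p})"

def predict_proba_sql_sketch(cnames, clusters, p=2):
    """Hypothetical helper: one inverse-distance probability expression per class."""
    dists = [distance_sql(cnames, c, p) for c in clusters]
    # Denominator shared by every class: the sum of the inverse distances.
    total = " + ".join(f"1 / ({d})" for d in dists)
    # A zero distance short-circuits to a probability of 1.0 for that class.
    return [f"(CASE WHEN {d} = 0 THEN 1.0 ELSE 1 / ({d}) / ({total}) END)" for d in dists]

clusters = [[0.5, 0.6], [1, 2], [100, 200]]
probas = predict_proba_sql_sketch(['col1', 'col2'], clusters)
for expr in probas:
    print(expr)
```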

Use the transform_sql() method to get the SQL code that computes the distance to each cluster centroid.

model_nc.transform_sql(cnames)
Out[13]: 
['POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)',
 'POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)',
 'POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)']
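To deploy these expressions, they can be placed in the SELECT list of a query over a table whose columns match cnames. The sketch below copies the expressions printed above verbatim; the table name my_table and the dist_<class> aliases are hypothetical placeholders.

```python
# Expressions as returned by model_nc.transform_sql(['col1', 'col2']) above.
distance_exprs = [
    'POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)',
    'POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)',
    'POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)',
]
classes = ['class_a', 'class_b', 'class_c']

# One aliased output column per centroid; my_table is a hypothetical table name.
select_list = ",\n       ".join(
    f"{expr} AS dist_{cls}" for expr, cls in zip(distance_exprs, classes)
)
query = f"SELECT col1, col2,\n       {select_list}\nFROM my_table;"
print(query)
```

The same approach applies to the single CASE expression from predict_sql() or the list from predict_proba_sql().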

Hint

This object can be pickled and used in any in-memory environment, just like scikit-learn models.

__init__(clusters: list | ndarray, classes: list | ndarray, p: int = 2) → None

Methods

`__init__`(clusters, classes[, p])
`get_attributes`()	Returns the model attributes.
`predict`(X)	Predicts clusters using the input matrix.
`predict_proba`(X)	Predicts the probability of each input to belong to the model clusters.
`predict_proba_sql`(X)	Returns the SQL code needed to deploy the model probabilities.
`predict_sql`(X)	Returns the SQL code needed to deploy the model using its attributes.
`set_attributes`(**kwargs)	Sets the model attributes.
`transform`(X)	Transforms and returns the distance to each cluster.
`transform_sql`(X)	Transforms and returns the SQL distance to each cluster.

Attributes

object_type

Identifies the type of the in-memory model; each concrete model class, such as NearestCentroid, defines its own value.