verticapy.machine_learning.memmodel.cluster.NearestCentroid
- class verticapy.machine_learning.memmodel.cluster.NearestCentroid(clusters: list | ndarray, classes: list | ndarray, p: int = 2)
InMemoryModel implementation of the NearestCentroid algorithm.
Parameters
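- clusters: ArrayLike
List of the model’s cluster centers (one centroid per class).
- classes: ArrayLike
Names of the classes, in the same order as clusters.
- p: int, optional
The p of the p-distance used to compare inputs with the centroids (default 2, the Euclidean distance).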
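The p-distance (Minkowski distance) between an input row x = (x_1, ..., x_n) and the centroid c_k of cluster k is written below. This assumes the textbook definition, which agrees with the p = 2 SQL that transform_sql() produces in the examples. With p = 2 it reduces to the Euclidean distance. For example, x = (2, 3) and c_k = (1, 2) give √2 ≈ 1.41421356.

```latex
d_p(x, c_k) = \left( \sum_{j=1}^{n} \lvert x_j - c_{k,j} \rvert^{p} \right)^{1/p}
```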
Attributes
The attributes are the input parameters with a trailing underscore (‘_’), for example clusters_.
Examples
Initialization
Import the required module.
from verticapy.machine_learning.memmodel.cluster import NearestCentroid
A NearestCentroid model is defined by its cluster centroids (clusters), its class names (classes) and the p value. In this example, we will use the following:
clusters = [[0.5, 0.6], [1, 2], [100, 200]]
p = 2
classes = ['class_a', 'class_b', 'class_c']
Let’s create a NearestCentroid model.
model_nc = NearestCentroid(clusters, classes, p)
Create a dataset.
data = [[2, 3]]
Making In-Memory Predictions
Use the predict() method to make predictions.
model_nc.predict(data)[0]
Out[7]: 'class_b'
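The prediction rule can be sketched in plain NumPy. The sketch computes the p-distance from each row to every centroid and returns the class of the closest one. It is a hypothetical re-implementation for illustration (the name nearest_centroid_predict is made up), not VerticaPy's internal code. With the example data it reproduces the 'class_b' prediction and the distances that transform() returns.

```python
import numpy as np


def nearest_centroid_predict(X, clusters, classes, p=2):
    """Hypothetical sketch: label each row with the class of its closest centroid."""
    X = np.asarray(X, dtype=float)         # shape (n_rows, n_features)
    C = np.asarray(clusters, dtype=float)  # shape (n_clusters, n_features)
    # Pairwise p-distances, shape (n_rows, n_clusters): (sum_j |x_j - c_j|^p)^(1/p)
    dist = np.sum(np.abs(X[:, None, :] - C[None, :, :]) ** p, axis=2) ** (1.0 / p)
    # Index of the closest centroid per row (np.argmin keeps the first index on ties)
    idx = np.argmin(dist, axis=1)
    return np.asarray(classes)[idx], dist


labels, dist = nearest_centroid_predict(
    [[2, 3]],
    [[0.5, 0.6], [1, 2], [100, 200]],
    ["class_a", "class_b", "class_c"],
    p=2,
)
print(labels[0])  # class_b
```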
Use the predict_proba() method to compute the probability of each class (one per cluster).
model_nc.predict_proba(data)
Out[8]: array([[0.33177263, 0.66395985, 0.00426752]])
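These probabilities match an inverse-distance normalization, p_k = (1 / d_k) / Σ_j (1 / d_j). predict_proba_sql() spells out the same formula further down. The sketch below assumes that rule. The helper inverse_distance_proba is hypothetical, and its zero-distance branch mirrors the "= 0 THEN 1.0" case of the generated SQL.

```python
import numpy as np


def inverse_distance_proba(distances):
    """Hypothetical sketch: turn per-cluster distances into probabilities via 1 / d."""
    d = np.asarray(distances, dtype=float)  # shape (n_rows, n_clusters)
    proba = np.zeros_like(d)
    zero = d == 0
    rows_with_zero = zero.any(axis=1)
    # A row lying exactly on a centroid gets probability 1.0 for that centroid
    proba[zero] = 1.0
    # Other rows: normalize the inverse distances so each row sums to 1
    inv = 1.0 / d[~rows_with_zero]
    proba[~rows_with_zero] = inv / inv.sum(axis=1, keepdims=True)
    return proba


# Distances taken from the transform() output of the example
proba = inverse_distance_proba([[2.83019434, 1.41421356, 220.02954347]])
print(proba.round(8))
```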
Use the transform() method to compute the distance to each cluster.
model_nc.transform(data)
Out[9]: array([[ 2.83019434, 1.41421356, 220.02954347]])
Deploy SQL Code
Let’s use the following column names:
cnames = ['col1', 'col2']
Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes.
model_nc.predict_sql(cnames)
Out[11]: "CASE WHEN col1 IS NULL OR col2 IS NULL THEN NULL WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) AND POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) THEN 'class_c' WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) THEN 'class_b' ELSE 'class_a' END"
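The CASE expression above returns NULL when any input column is NULL. It then tests the classes from last to second: a class wins when its distance is <= the distance to every earlier centroid. Otherwise it falls back to the first class. The hypothetical helper below rebuilds that layout from a list of distance expressions. It is reconstructed from the printed SQL, not taken from VerticaPy's source.

```python
def build_predict_case(dist_exprs, classes, cnames):
    """Hypothetical helper rebuilding the CASE layout seen in the predict_sql() output."""
    # Any NULL input column yields a NULL prediction
    null_check = " OR ".join(f"{c} IS NULL" for c in cnames)
    whens = []
    # Walk classes from last to second; class i wins if its distance is <= every earlier one
    for i in range(len(classes) - 1, 0, -1):
        cond = " AND ".join(f"{dist_exprs[i]} <= {dist_exprs[j]}" for j in range(i))
        whens.append(f"WHEN {cond} THEN '{classes[i]}'")
    # The first class is the fallback
    return f"CASE WHEN {null_check} THEN NULL {' '.join(whens)} ELSE '{classes[0]}' END"


print(build_predict_case(["d0", "d1", "d2"], ["class_a", "class_b", "class_c"], ["col1", "col2"]))
```

Because the comparisons use <=, a tie between two centroids is resolved in favor of the later class in this layout.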
Use the predict_proba_sql() method to get the SQL code needed to deploy the model that computes predicted probabilities.
model_nc.predict_proba_sql(cnames)
Out[12]: ['(CASE WHEN POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)', '(CASE WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)', '(CASE WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)']
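Every probability expression above follows one pattern. It yields 1.0 when the row sits exactly on the centroid. Otherwise it yields the inverse distance to that centroid divided by the sum of inverse distances to all centroids. The hypothetical helper below reproduces that pattern from a list of distance expressions. It is inferred from the printed output, not taken from VerticaPy's implementation.

```python
def build_proba_sql(dist_exprs):
    """Hypothetical helper rebuilding the per-class expressions seen in predict_proba_sql()."""
    # Shared denominator: sum of the inverse distances to every centroid
    denom = " + ".join(f"1 / ({d})" for d in dist_exprs)
    # One expression per class; a zero distance short-circuits to probability 1.0
    return [
        f"(CASE WHEN {d} = 0 THEN 1.0 ELSE 1 / ({d}) / ({denom}) END)"
        for d in dist_exprs
    ]


for expr in build_proba_sql(["d0", "d1", "d2"]):
    print(expr)
```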
Use the transform_sql() method to get the SQL code needed to deploy the model that computes the distance to each cluster.
model_nc.transform_sql(cnames)
Out[13]: ['POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)', 'POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)', 'POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)']
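Each distance expression has the same shape: one POWER(col - center, p) term per column, summed, then raised to 1 / p. The hypothetical helper below rebuilds one expression from a centroid and the column names. It only claims to match the p = 2 output printed above, since the SQL that VerticaPy emits for other values of p is not shown here.

```python
def build_distance_sql(centroid, cnames, p=2):
    """Hypothetical helper rebuilding the distance expression seen in transform_sql()."""
    # One POWER(col - center, p) term per column, joined with " + "
    terms = " + ".join(f"POWER({col} - {float(c)}, {p})" for col, c in zip(cnames, centroid))
    # Take the p-th root of the sum
    return f"POWER({terms}, 1 / {p})"


print(build_distance_sql([1, 2], ["col1", "col2"]))
# POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)
```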
Hint
This object can be pickled and used in any in-memory environment, just like scikit-learn models.
- __init__(clusters: list | ndarray, classes: list | ndarray, p: int = 2) -> None
Methods
- __init__(clusters, classes[, p])
- get_attributes(): Returns the model attributes.
- predict(X): Predicts the class of each row of the input matrix.
- predict_proba(X): Predicts the probability of each input to belong to each of the model clusters.
- predict_proba_sql(X): Returns the SQL code needed to deploy the model probabilities.
- predict_sql(X): Returns the SQL code needed to deploy the model using its attributes.
- set_attributes(**kwargs): Sets the model attributes.
- transform(X): Transforms and returns the distance to each cluster.
- transform_sql(X): Transforms and returns the SQL distance to each cluster.
Attributes
Must be overridden in child class