verticapy.machine_learning.memmodel.cluster.KMeans
- class verticapy.machine_learning.memmodel.cluster.KMeans(clusters: list | ndarray, p: int = 2)
InMemoryModel implementation of KMeans.
Parameters
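- clusters: ArrayLike
List of the model’s cluster centers (centroids).
- p: int, optional
The p of the p-distance (Minkowski distance) used to compare each row with the cluster centers. The default, p = 2, is the Euclidean distance.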
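For reference, the p-distance between an input row x and a centroid c over n features is the Minkowski distance below. p = 1 gives the Manhattan distance and p = 2 the Euclidean distance; the POWER(..., 1 / 2) expressions in the generated SQL further down this page are consistent with this formula for p = 2.

```latex
d_p(x, c) = \left( \sum_{i=1}^{n} \lvert x_i - c_i \rvert^{p} \right)^{1/p}
```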
Note
In-memory models (memmodel) are defined entirely by their attributes. For example, the cluster centroids (clusters) and the p value define a KMeans model.
Attributes
Attributes are identical to the input parameters, followed by an underscore (‘_’).
Examples
Initialization
Import the required module.
from verticapy.machine_learning.memmodel.cluster import KMeans
A KMeans model is defined by its cluster centroids and the p value. In this example, we will use the following:
clusters = [[0.5, 0.6], [1, 2], [100, 200]]
p = 2
Let’s create a KMeans model.
model_km = KMeans(clusters, p)
Create a dataset.
data = [[2, 3]]
Making In-Memory Predictions
Use the predict() method to make predictions:
model_km.predict(data)[0]
Out[6]: 1
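Conceptually, prediction assigns each row to the centroid with the smallest p-distance: for [2, 3], the nearest centroid is [1, 2], whose id is 1. The NumPy sketch below assumes that nearest-centroid rule; predict_clusters is a hypothetical helper written for illustration, not part of verticapy.

```python
import numpy as np


def predict_clusters(X, clusters, p=2):
    """Hypothetical helper: index of the nearest centroid for each row of X."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(clusters, dtype=float)
    # Pairwise p-distances via broadcasting: shape (n_rows, n_clusters).
    dist = (np.abs(X[:, None, :] - C[None, :, :]) ** p).sum(axis=2) ** (1 / p)
    # The cluster id is the centroid's position in `clusters`.
    return dist.argmin(axis=1)


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
print(predict_clusters([[2, 3]], clusters))
```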
Note
KMeans assigns a cluster id to each cluster, following the order of the clusters list. In this example, the cluster with centroid [0.5, 0.6] has id = 0, the cluster with centroid [1, 2] has id = 1, and so on. The predict() method returns the id of the predicted cluster.
Use the predict_proba() method to compute the predicted probability of each cluster:
model_km.predict_proba(data)
Out[7]: array([[0.33177263, 0.66395985, 0.00426752]])
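The probabilities above are consistent with inverse-distance weighting: each cluster k gets the weight 1 / d_k, and the weights are normalized to sum to 1 (the predict_proba_sql() output later on this page has the same structure). The NumPy sketch below reproduces the numbers under that assumption. predict_proba_sketch is a hypothetical name, and giving probability 0 to the other clusters when a row lies exactly on a centroid is an assumption of this sketch.

```python
import numpy as np


def predict_proba_sketch(X, clusters, p=2):
    """Hypothetical helper: normalized inverse-distance weights per cluster."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(clusters, dtype=float)
    # Pairwise p-distances, shape (n_rows, n_clusters).
    dist = (np.abs(X[:, None, :] - C[None, :, :]) ** p).sum(axis=2) ** (1 / p)
    proba = np.zeros_like(dist)
    on_centroid = dist == 0
    exact = on_centroid.any(axis=1)
    # Rows lying exactly on a centroid: all probability goes to the matching
    # cluster(s) and the others get 0 (an assumption of this sketch).
    hits = on_centroid[exact].astype(float)
    proba[exact] = hits / hits.sum(axis=1, keepdims=True)
    # All other rows: weight each cluster by 1 / distance, then normalize.
    inv = 1.0 / dist[~exact]
    proba[~exact] = inv / inv.sum(axis=1, keepdims=True)
    return proba


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
print(predict_proba_sketch([[2, 3], [1, 2]], clusters))
```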
Use the transform() method to compute the distance from each row to each cluster center:
model_km.transform(data)
Out[8]: array([[ 2.83019434, 1.41421356, 220.02954347]])
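Each value is the p-distance from [2, 3] to one centroid; for example, the distance to [1, 2] is sqrt((2 - 1)^2 + (3 - 2)^2) = sqrt(2) ≈ 1.41421356. A minimal NumPy sketch, assuming transform() returns plain p-distances (the transform_sql() output later on this page has that form); transform_sketch is a hypothetical name.

```python
import numpy as np


def transform_sketch(X, clusters, p=2):
    """Hypothetical helper: p-distance from every row to every centroid."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(clusters, dtype=float)
    # Broadcast (n, 1, d) against (1, k, d), then reduce over the features.
    return (np.abs(X[:, None, :] - C[None, :, :]) ** p).sum(axis=2) ** (1 / p)


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
print(transform_sketch([[2, 3]], clusters))       # Euclidean distances (p = 2)
print(transform_sketch([[2, 3]], clusters, p=1))  # Manhattan distances (p = 1)
```

Changing p changes the distances and can therefore change which centroid is nearest.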
Deploy SQL Code
Let’s use the following column names:
cnames = ['col1', 'col2']
Use the predict_sql() method to get the SQL code needed to deploy the model using its attributes:
model_km.predict_sql(cnames)
Out[10]: 'CASE WHEN col1 IS NULL OR col2 IS NULL THEN NULL WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) AND POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) <= POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) THEN 2 WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) <= POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) THEN 1 ELSE 0 END'
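Reading the generated SQL: it returns NULL if any input column is NULL, then tests the ids from last to first, where id k wins if its distance is no larger than the distance to every lower id, and it falls back to 0. Because of the <= comparisons, ties go to the higher id. The Python sketch below rebuilds a string of that shape. distance_sql and predict_sql_sketch are hypothetical helpers, not verticapy functions, and wrapping the differences in ABS() for odd p is an assumption (the example only shows p = 2).

```python
def distance_sql(centroid, cnames, p=2):
    """Hypothetical helper: SQL p-distance between a row and one centroid."""
    terms = []
    for col, value in zip(cnames, centroid):
        diff = f"{col} - {float(value)}"
        if p % 2:  # assumption: odd p needs ABS(); even p makes the sign irrelevant
            diff = f"ABS({diff})"
        terms.append(f"POWER({diff}, {p})")
    return f"POWER({' + '.join(terms)}, 1 / {p})"


def predict_sql_sketch(clusters, cnames, p=2):
    """Hypothetical helper: CASE expression returning the nearest centroid id."""
    dist = [distance_sql(c, cnames, p) for c in clusters]
    null_check = " OR ".join(f"{col} IS NULL" for col in cnames)
    parts = [f"CASE WHEN {null_check} THEN NULL"]
    # Test ids from last to first; id k wins if it is no farther than every lower id.
    for k in range(len(clusters) - 1, 0, -1):
        cond = " AND ".join(f"{dist[k]} <= {dist[j]}" for j in range(k))
        parts.append(f"WHEN {cond} THEN {k}")
    parts.append("ELSE 0 END")
    return " ".join(parts)


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
cnames = ["col1", "col2"]
print(predict_sql_sketch(clusters, cnames))
```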
Use the predict_proba_sql() method to get the SQL code needed to deploy the model’s predicted probabilities:
model_km.predict_proba_sql(cnames)
Out[11]: ['(CASE WHEN POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)', '(CASE WHEN POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)', '(CASE WHEN POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2) = 0 THEN 1.0 ELSE 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)) / (1 / (POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)) + 1 / (POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2))) END)']
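Each returned expression computes 1 / d_k divided by the sum of 1 / d_j over all clusters, with a CASE branch that returns 1.0 when the row lies exactly on that centroid. A hypothetical Python sketch that builds expressions of the same shape (distance_sql and predict_proba_sql_sketch are illustrative names; ABS() for odd p is an assumption):

```python
def distance_sql(centroid, cnames, p=2):
    """Hypothetical helper: SQL p-distance between a row and one centroid."""
    terms = []
    for col, value in zip(cnames, centroid):
        diff = f"{col} - {float(value)}"
        if p % 2:  # assumption: odd p needs ABS(); even p makes the sign irrelevant
            diff = f"ABS({diff})"
        terms.append(f"POWER({diff}, {p})")
    return f"POWER({' + '.join(terms)}, 1 / {p})"


def predict_proba_sql_sketch(clusters, cnames, p=2):
    """Hypothetical helper: one normalized inverse-distance expression per cluster."""
    dist = [distance_sql(c, cnames, p) for c in clusters]
    # Shared denominator: sum of 1 / distance over all clusters.
    denom = " + ".join(f"1 / ({d})" for d in dist)
    # A row lying exactly on centroid k gets 1.0 for cluster k.
    return [f"(CASE WHEN {d} = 0 THEN 1.0 ELSE 1 / ({d}) / ({denom}) END)" for d in dist]


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
cnames = ["col1", "col2"]
for expr in predict_proba_sql_sketch(clusters, cnames):
    print(expr)
```

Only the matching cluster’s zero distance is special-cased: for the other clusters, the ELSE branch would then contain 1 / (0), so how such rows behave depends on the database’s division-by-zero handling.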
Use the transform_sql() method to get the SQL code that computes the distance from each row to each cluster center:
model_km.transform_sql(cnames)
Out[12]: ['POWER(POWER(col1 - 0.5, 2) + POWER(col2 - 0.6, 2), 1 / 2)', 'POWER(POWER(col1 - 1.0, 2) + POWER(col2 - 2.0, 2), 1 / 2)', 'POWER(POWER(col1 - 100.0, 2) + POWER(col2 - 200.0, 2), 1 / 2)']
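These are the p-distance expressions, one per centroid and in the same order as the clusters list. A hypothetical sketch that produces the same strings from the cluster list and the column names; transform_sql_sketch is an illustrative name, and ABS() for odd p is an assumption since only p = 2 appears above.

```python
def transform_sql_sketch(clusters, cnames, p=2):
    """Hypothetical helper: one SQL p-distance expression per centroid."""
    exprs = []
    for centroid in clusters:
        terms = []
        for col, value in zip(cnames, centroid):
            diff = f"{col} - {float(value)}"
            if p % 2:  # assumption: odd p needs ABS(); even p makes the sign irrelevant
                diff = f"ABS({diff})"
            terms.append(f"POWER({diff}, {p})")
        # Sum the per-column terms, then take the 1/p root.
        exprs.append(f"POWER({' + '.join(terms)}, 1 / {p})")
    return exprs


clusters = [[0.5, 0.6], [1, 2], [100, 200]]
cnames = ["col1", "col2"]
print(transform_sql_sketch(clusters, cnames))
```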
Hint
This object can be pickled and used in any in-memory environment, just like scikit-learn models.
- __init__(clusters: list | ndarray, p: int = 2) -> None
Methods
- __init__(clusters[, p])
- get_attributes(): Returns the model attributes.
- predict(X): Predicts clusters using the input matrix.
- predict_proba(X): Predicts the probability that each input belongs to each of the model clusters.
- predict_proba_sql(X): Returns the SQL code needed to deploy the model probabilities.
- predict_sql(X): Returns the SQL code needed to deploy the model using its attributes.
- set_attributes(**kwargs): Sets the model attributes.
- transform(X): Transforms and returns the distance to each cluster.
- transform_sql(X): Transforms and returns the SQL distance to each cluster.
Attributes
Must be overridden in child class