KMeans

KMeans(name: str,
       n_cluster: int = 8,
       init: str = "kmeanspp",
       max_iter: int = 300,
       tol: float = 1e-4)

Creates a KMeans object that uses the Vertica KMEANS function on the data. K-means clustering is a method of vector quantization, originally from signal processing, that partitions n observations into k clusters. Each observation belongs to the cluster with the nearest mean (the cluster center, or centroid), which serves as the prototype of the cluster. The result is a partition of the data space into Voronoi cells.
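
The nearest-mean rule described above can be sketched in plain Python. This is only an illustration of the idea, not the code Vertica runs; the helper names nearest_center and assign and the sample data are made up for the example.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def assign(points, centers):
    """Label every point with the index of its nearest center."""
    return [nearest_center(p, centers) for p in points]

# Hypothetical data: two tight groups and one center near each group.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]

labels = assign(points, centers)
print(labels)  # [0, 0, 1, 1]
```

All points that receive the same label lie in the same Voronoi cell of the centers.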

Parameters

Name       Type        Optional  Description
name       str         no        Name of the model to be stored in the database.
n_cluster  int         yes       Number of clusters.
init       str / list  yes       The method used to find the initial cluster centers.
                                   • kmeanspp : Uses the KMeans++ method to initialize the centers.
                                   • random : Chooses the initial centers at random.
                                 It can also be a list of the initial cluster centers to use.
max_iter   int         yes       The maximum number of iterations the algorithm performs.
tol        float       yes       Determines whether the algorithm has converged. The algorithm is considered converged once no center has moved more than a distance of 'tol' from the previous iteration.
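
To show how init (given as a list of centers), max_iter, and tol interact, here is a minimal sketch of the standard Lloyd iteration in plain Python. It assumes Euclidean distance and is not the Vertica implementation; the function name kmeans and the sample points are hypothetical.

```python
import math

def kmeans(points, init, max_iter=300, tol=1e-4):
    """Lloyd-style iterations starting from an explicit list of centers."""
    centers = [tuple(c) for c in init]
    for iteration in range(1, max_iter + 1):
        # Assignment step: group the points by their nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        # Convergence test: the largest distance any center moved.
        shift = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, iteration

# Hypothetical data: two pairs of points and one starting center near each pair.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, n_iter = kmeans(points, init=[(1.0, 1.0), (9.0, 1.0)])
print(centers, n_iter)
```

The loop stops as soon as no center moves farther than tol, or after max_iter passes, whichever comes first. A smaller tol therefore trades extra iterations for more precisely placed centers.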

Attributes

After the object is created, all parameters become attributes. Additional attributes are created when the model is fitted:

Name              Type         Description
cluster_centers_  tablesample  Cluster centers computed by the algorithm.
metrics_          tablesample  Metrics for evaluating the model.
input_relation    str          Training relation.
X                 list         List of the predictors.
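
K-means models are commonly evaluated with sums of squares: total, within-cluster, and between-cluster. Whether metrics_ reports exactly these quantities is an assumption here; the sketch below only shows how they are usually computed. The function cluster_metrics and the dictionary keys are hypothetical names.

```python
def mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_metrics(points, labels, centers):
    """Sum-of-squares metrics; assumes each center is the mean of its cluster."""
    overall = mean(points)
    total_ss = sum(sq_dist(p, overall) for p in points)
    within_ss = sum(sq_dist(p, centers[k]) for p, k in zip(points, labels))
    # When the centers are the cluster means, total = within + between.
    return {
        "total_ss": total_ss,
        "within_ss": within_ss,
        "between_ss": total_ss - within_ss,
    }

# Hypothetical fitted result: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 1.0)]

metrics = cluster_metrics(points, labels, centers)
print(metrics)
```

A ratio of between_ss to total_ss close to 1 indicates compact, well-separated clusters.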

Methods

Name Description
deploySQL Returns the SQL code needed to deploy the model.
drop Drops the model from the Vertica database.
fit Trains the model.
get_attr Returns the model attribute.
get_params Returns the model parameters.
plot Draws the k-means clusters.
plot_voronoi Draws the Voronoi Graph of the model.
predict Predicts using the input relation.
set_params Sets the parameters of the model.
to_memmodel Converts a specified Vertica model to a memModel model.
to_python Returns the Python code needed to deploy the model without using built-in Vertica functions.
to_sql Returns the SQL code needed to deploy the model without using Vertica built-in functions.

Example

from verticapy.learn.cluster import KMeans
model = KMeans(name = "public.kmeans_iris",
               n_cluster = 8,
               init = "kmeanspp",
               max_iter = 300,
               tol = 1e-4)
display(model)
<KMeans>