KMEANS
Executes the k-means algorithm on an input table or view. The result is a model with a list of cluster centers.
Important: Before using a machine learning function, be aware that all ongoing transactions might be committed.
Syntax
KMEANS ( 'model_name', 'input_relation', 'input_columns', num_clusters
    [ USING PARAMETERS [exclude_columns='col1, col2, ... coln',]
                       [max_iterations=value,]
                       [epsilon=value,]
                       [init_method=method,]
                       [initial_centers_table=table_name,]
                       [output_view='output_view',]
                       [key_columns='key_columns'] ] )
Arguments
model_name |
The name of the k-means model. Model names are case insensitive. |
input_relation |
The table or view that contains the input data for k-means. |
input_columns |
The columns of |
num_clusters |
The number of clusters to create. This argument represents the k in k-means. Must be an INT greater than zero. |
Parameters
exclude_columns='col1, col2, ... coln' |
(Optional) The columns from |
max_iterations=value |
(Optional) The maximum number of iterations the algorithm performs. If you set this value to a number lower than the number of iterations needed for convergence, the algorithm may not converge. Default Value: 10 |
epsilon=value |
(Optional) Determines whether the algorithm has converged. If, after an iteration, no component of any cluster center changes more than the value of epsilon, the algorithm has converged. Default Value: 1e-4 |
init_method=method |
(Optional) The method used to find the initial cluster centers. You cannot use this parameter if the
Valid Values:
Default Value: kmeanspp |
initial_centers_table=table_name |
(Optional) The table with the initial cluster centers to use. Supply this value if you know the initial centers you want to use and do not want Vertica to find the initial cluster centers for you. You cannot use this parameter if the |
output_view='output_view' |
(Optional) The name of the view in which to save the assignment of each point to its cluster. |
key_columns='key_columns' |
(Optional) A comma-separated list of column names from the |
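The interaction between max_iterations and epsilon described above can be illustrated with a call that tightens the convergence threshold and raises the iteration cap so the algorithm has room to reach it. This is a minimal sketch, not a recommended configuration: the model name myKmeansModelTight is hypothetical, and the iris1 table with its Species and id columns is borrowed from the example on this page.

```sql
-- Sketch only: 'myKmeansModelTight' is a hypothetical model name.
-- 'iris1', 'Species', and 'id' are taken from the example on this page.
SELECT KMEANS('myKmeansModelTight', 'iris1', '*', 5
              USING PARAMETERS max_iterations=50,   -- more passes than the default of 10
                               epsilon=0.00001,     -- stricter than the default of 1e-4
                               exclude_columns='Species, id');
```

A smaller epsilon generally requires more iterations before no center component moves by more than the threshold, which is why the two values are raised and lowered together here. Whether 50 iterations is enough depends on the data.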
Privileges
To use KMEANS, you must either be a superuser or have CREATE privileges for the schema of the output view and SELECT privileges for the input table or view. There are no privileges needed on the function itself.
See GRANT (Schema) and GRANT (Table).
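As a rough illustration of the privileges listed above, a superuser might grant them as follows. The user name analyst and the schema name public are hypothetical, and iris1 stands in for whatever input table or view is used.

```sql
-- Hypothetical names: user 'analyst', schema 'public', input table 'iris1'.
GRANT CREATE ON SCHEMA public TO analyst;  -- needed for the schema of the output view
GRANT SELECT ON TABLE iris1 TO analyst;    -- needed to read the input relation
```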
Examples
The following example shows how you can use the KMEANS function and view the results of the model in the output_view.
=> SELECT KMEANS('myKmeansModel', 'iris1', '*', 5
       USING PARAMETERS max_iterations=20, output_view='myKmeansView',
       key_columns='id', exclude_columns='Species, id');
           KMEANS
----------------------------
 Finished in 12 iterations
(1 row)
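Once the call has finished, the cluster assignments can be inspected by querying the view named in output_view. The query below is a sketch that assumes the view was created as myKmeansView and that it exposes the id key column supplied through key_columns; the remaining column names are not assumed, so it selects every column.

```sql
-- Assumes the KMEANS call above completed and created myKmeansView,
-- and that the 'id' key column is present in the view.
SELECT * FROM myKmeansView ORDER BY id LIMIT 10;
```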