KMEANS

Executes the k-means algorithm on an input table or view. The result is a model with a list of cluster centers.

Important: Before using a machine learning function, be aware that any ongoing transactions might be committed.

Syntax

KMEANS ( 'model_name', 'input_relation', 'input_columns', num_clusters
	 [ USING PARAMETERS [exclude_columns='col1, col2, ... coln',]
	                    [max_iterations=value,]
	                    [epsilon=value,]
	                    [init_method=method,]
	                    [initial_centers_table=table_name,]
	                    [output_view='output_view',]
	                    [key_columns='key_columns'] ])

Arguments

model_name

The name of the k-means model.

Model names are case insensitive.

input_relation

The table or view that contains the input data for k-means.

input_columns

The columns of input_relation to be used for clustering. The input_columns argument supports the use of wildcard (*) characters in place of column names.
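
For example, the following sketch names the clustering columns explicitly instead of using the wildcard. The model name irisByPetal and the column names Petal_Length and Petal_Width are assumptions about the iris1 table that appears in the Examples section; substitute the columns of your own relation.

```sql
-- Hypothetical model and column names; only the argument order
-- (model_name, input_relation, input_columns, num_clusters) comes from the syntax above.
SELECT KMEANS('irisByPetal', 'iris1', 'Petal_Length, Petal_Width', 3);
```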

num_clusters

The number of clusters to create. This argument represents the k in k-means. The value must be an integer greater than zero.

Parameters

exclude_columns='col1, col2, ... coln'

(Optional) The columns from input_relation that you want to exclude from clustering.

max_iterations=value

(Optional) The maximum number of iterations the algorithm performs. If you set this value to a number lower than the number of iterations needed for convergence, the algorithm may not converge.

Default Value: 10

epsilon=value

(Optional) The convergence threshold. The algorithm has converged if, after an iteration, no component of any cluster center changes by more than the value of epsilon.

Default Value: 1e-4
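
As an illustration of how max_iterations and epsilon work together, the following sketch raises the iteration limit and tightens the convergence threshold. The model name irisTightModel and the chosen values are hypothetical; the iris1 table and the excluded columns are borrowed from the Examples section.

```sql
-- Hypothetical model name and parameter values.
-- A smaller epsilon demands smaller center movements before the algorithm stops,
-- so the iteration limit is raised to give it room to converge.
SELECT KMEANS('irisTightModel', 'iris1', '*', 3
              USING PARAMETERS max_iterations=50,
                               epsilon=0.00001,
                               exclude_columns='Species, id');
```

A tighter epsilon generally needs more iterations, so lowering it without raising max_iterations increases the chance that the algorithm stops before it converges.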

init_method=method

(Optional) The method used to find the initial cluster centers. You cannot use this parameter together with initial_centers_table. If you provide values for both init_method and initial_centers_table, Vertica returns an error.

Valid Values:

  • random
  • kmeanspp (the k-means++ algorithm)

Default Value: kmeanspp
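
A minimal sketch of overriding the default initialization method follows. The model name irisRandomInit is hypothetical, and quoting the method name as a string is an assumption based on how the other string-valued parameters are written in the syntax.

```sql
-- Hypothetical model name; init_method is passed as a quoted string (assumption).
-- initial_centers_table is deliberately omitted, because the two parameters
-- cannot be combined.
SELECT KMEANS('irisRandomInit', 'iris1', '*', 3
              USING PARAMETERS init_method='random',
                               exclude_columns='Species, id');
```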

initial_centers_table=table_name

(Optional) The table with the initial cluster centers to use. Supply this value if you know the initial centers you want to use and do not want Vertica to find the initial cluster centers for you.

You cannot use this parameter together with init_method. If you provide values for both init_method and initial_centers_table, Vertica returns an error.
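
The following sketch shows one way to supply your own starting centers. It assumes that the centers table holds one row per cluster and one column for each clustering column, with matching column names. The table name iris_initial_centers, the model name, the column names, and the center values are all hypothetical.

```sql
-- Hypothetical table of starting centers: one row per cluster (assumption),
-- one column per clustering column (assumption).
CREATE TABLE iris_initial_centers (
    Petal_Length FLOAT,
    Petal_Width  FLOAT
);

INSERT INTO iris_initial_centers VALUES (1.5, 0.3);
INSERT INTO iris_initial_centers VALUES (4.3, 1.3);
INSERT INTO iris_initial_centers VALUES (5.6, 2.0);
COMMIT;

-- num_clusters (3) matches the number of rows in the centers table.
-- init_method is not set, because it cannot be combined with initial_centers_table.
SELECT KMEANS('irisFixedCenters', 'iris1', 'Petal_Length, Petal_Width', 3
              USING PARAMETERS initial_centers_table='iris_initial_centers');
```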

output_view='output_view'

(Optional) The name of the view in which to save the assignment of each point to its cluster.

key_columns='key_columns'

(Optional) A comma-separated list of column names from input_relation that identify each row of output in output_view.

Privileges

To use KMEANS, you must either be a superuser or have CREATE privileges for the schema of the output view and SELECT privileges for the input table or view. There are no privileges needed on the function itself.

See GRANT (Schema) and GRANT (Table).
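
As a sketch of the privileges described above, a superuser might grant a non-superuser the rights needed to run KMEANS as follows. The user name analyst and the schema name public are hypothetical, and your environment may require additional grants that are not covered here.

```sql
-- Hypothetical user (analyst) and schema (public).
-- CREATE on the schema that will hold the output view:
GRANT CREATE ON SCHEMA public TO analyst;

-- SELECT on the input table or view:
GRANT SELECT ON iris1 TO analyst;
```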

Examples

The following example shows how to use the KMEANS function to build a model and save the cluster assignments in an output view.

=> SELECT KMEANS('myKmeansModel', 'iris1', '*', 5
                  USING PARAMETERS max_iterations=20, output_view='myKmeansView', key_columns='id',
                  exclude_columns='Species, id');
           KMEANS
----------------------------
 Finished in 12 iterations
(1 row)
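
To inspect the cluster assignments saved by the example above, you can query the output view like any other view. This is a minimal sketch: it relies only on the view name myKmeansView and the key column id from the example. The other columns of the view are not listed in this topic, so the query selects all of them.

```sql
-- Show the first few saved assignments, ordered by the key column from key_columns='id'.
SELECT *
FROM myKmeansView
ORDER BY id
LIMIT 10;
```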

See Also