Clustering Algorithms

Clustering is a widely used machine learning technique that groups the items in a data set so that items in the same cluster are similar to one another. The goal of clustering is to form clusters whose members are highly similar to each other and distinct from the members of other clusters. Like classification, clustering segments the data. Unlike classification, however, clustering does not start from predefined categories: the groups are discovered from the data itself. Clustering can be used to detect anomalies in data and to find natural groupings. For example, you can use clustering to analyze a region and determine which areas of that region are most likely to be hit by an earthquake.

In Vertica, clustering is computed based on distance: each data point is assigned to the cluster whose mean (center) is nearest to it.
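To make the nearest-mean idea concrete, here is a minimal Python sketch of the two steps involved: assigning each point to its closest mean, then recomputing each mean from its assigned points. It is a generic illustration, not Vertica's implementation. Euclidean distance is an assumed choice of metric, and the function names `assign_to_nearest_mean` and `recompute_means` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def assign_to_nearest_mean(points: List[Point], means: List[Point]) -> List[int]:
    """Return, for each point, the index of the closest mean.

    Assumes Euclidean distance (math.dist, Python 3.8+). On a tie,
    the mean with the lower index wins.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, m) for m in means]
        labels.append(distances.index(min(distances)))
    return labels


def recompute_means(
    points: List[Point], labels: List[int], old_means: List[Point]
) -> List[Tuple[float, ...]]:
    """Recompute each cluster mean as the average of its assigned points.

    A cluster that received no points keeps its previous mean.
    """
    new_means = []
    for idx, old in enumerate(old_means):
        members = [p for p, lab in zip(points, labels) if lab == idx]
        if not members:
            new_means.append(tuple(old))  # empty cluster: keep the old mean
            continue
        dim = len(members[0])
        new_means.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return new_means


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    means = [(1.0, 1.5), (8.5, 8.0)]
    labels = assign_to_nearest_mean(points, means)
    print(labels)  # [0, 0, 1, 1]
    print(recompute_means(points, labels, means))  # [(1.25, 1.5), (8.5, 8.25)]
```

Repeating these two steps until the assignments stop changing is the core loop of the k-means family of algorithms.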

Vertica supports one algorithm for clustering: