Clustering
Clustering algorithms are used to segment data or to find anomalies. Generally speaking, clustering algorithms are sensitive to unnormalized data, so it’s important to properly prepare your data beforehand.
For example, in the ‘titanic’ dataset, the features ‘fare’ and ‘age’ do not share a common range of values: ‘fare’ can be much larger than ‘age’. Because algorithms such as k-means rely on distances between points, the feature with the larger values dominates the result, and clustering this kind of unscaled data would create misleading clusters.
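To make the scaling issue concrete, here is a minimal sketch in plain Python, not VerticaPy code. The passenger values are made up for illustration and are not taken from the titanic dataset, and the helper names `min_max_scale` and `fare_share` are hypothetical. It measures how much of the squared Euclidean distance between two passengers comes from ‘fare’ before and after min-max scaling.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def fare_share(p, q):
    """Fraction of the squared Euclidean distance between p and q due to fare.

    Each point is an (age, fare) pair.
    """
    age_part = (p[0] - q[0]) ** 2
    fare_part = (p[1] - q[1]) ** 2
    return fare_part / (age_part + fare_part)


# Hypothetical passengers (illustrative values only).
ages = [2.0, 22.0, 38.0, 80.0]
fares = [7.25, 30.0, 71.0, 512.0]

raw = list(zip(ages, fares))
scaled = list(zip(min_max_scale(ages), min_max_scale(fares)))

# Compare the youngest/cheapest passenger with the oldest/most expensive one.
print("fare share, raw values:   ", fare_share(raw[0], raw[3]))
print("fare share, scaled values:", fare_share(scaled[0], scaled[3]))
```

With the raw values, almost all of the distance comes from ‘fare’; after scaling, both features contribute equally for this pair of passengers.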
To create a clustering model, we’ll start by importing the k-means algorithm.
[1]:
from verticapy.learn.cluster import KMeans
import verticapy as vp
vp.set_option("plotting_lib","highcharts") # Set the desired plotting library
Next, we’ll create a model object. Since Vertica has its own model management system, we just need to choose a model name and the number of clusters. The model’s name can include the schema; if no schema is given, the public schema is used.
[2]:
vp.drop("KMeans_sm") # Drop any existing model with the same name
model = KMeans("KMeans_sm", n_cluster=3) # k-means model with 3 clusters
Let’s use the iris dataset to fit our model.
[3]:
from verticapy.datasets import load_iris
iris = load_iris()
We can now fit the model on two of the features and plot the resulting clusters.
[4]:
model.fit(iris, ["PetalLengthCm", "SepalLengthCm"])
model.plot()
While there aren’t any real metrics for evaluating unsupervised models, metrics used during computation can help us to understand the quality of the model. For example, a good k-means model uses few clusters while keeping its k-means score, ‘Between-Cluster SS / Total SS’, close to 1.
[5]:
model.get_vertica_attributes("metrics")
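As a rough illustration of the ‘Between-Cluster SS / Total SS’ score, the sketch below computes it in plain Python for a toy dataset. The points, labels, and helper names are assumptions made for this example; it does not read the metrics stored by Vertica, and it assumes each cluster center is the mean of its members.

```python
from collections import defaultdict


def mean_point(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def between_over_total(points, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    overall = mean_point(points)
    # Total SS: spread of all points around the overall mean.
    total_ss = sum(sq_dist(p, overall) for p in points)

    # Group the points by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    # Between SS: spread of the cluster means around the overall mean,
    # weighted by the number of points in each cluster.
    between_ss = sum(
        len(members) * sq_dist(mean_point(members), overall)
        for members in groups.values()
    )
    return between_ss / total_ss


# Toy data: two tight, well-separated groups (illustrative values only).
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]

ratio = between_over_total(points, labels)
print(ratio)
```

For this toy data the ratio is 32/33, close to 1, because nearly all of the variation lies between the two groups rather than inside them. Putting every point in a single cluster would give a ratio of 0.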
You can add the predicted cluster to your vDataFrame as a new column.
[6]:
model.predict(iris, name = "cluster")
[6]:
Row | Id (Integer) | PetalLengthCm (Numeric(8)) | PetalWidthCm (Numeric(8)) | SepalLengthCm (Numeric(8)) | SepalWidthCm (Numeric(8)) | Species (Varchar(30)) | cluster (Integer) |
1 | 1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa | 2 |
2 | 2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa | 2 |
3 | 3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa | 2 |
4 | 4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa | 2 |
5 | 5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa | 2 |
6 | 6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa | 2 |
7 | 7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa | 2 |
8 | 8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa | 2 |
9 | 9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa | 2 |
10 | 10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 2 |
11 | 11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa | 2 |
12 | 12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa | 2 |
13 | 13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa | 2 |
14 | 14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa | 2 |
15 | 15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa | 2 |
16 | 16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa | 2 |
17 | 17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa | 2 |
18 | 18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa | 2 |
19 | 19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa | 2 |
20 | 20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa | 2 |
21 | 21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa | 2 |
22 | 22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa | 2 |
23 | 23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa | 2 |
24 | 24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa | 2 |
25 | 25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa | 2 |
26 | 26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa | 2 |
27 | 27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa | 2 |
28 | 28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa | 2 |
29 | 29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa | 2 |
30 | 30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa | 2 |
31 | 31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa | 2 |
32 | 32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa | 2 |
33 | 33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa | 2 |
34 | 34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa | 2 |
35 | 35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 2 |
36 | 36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa | 2 |
37 | 37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa | 2 |
38 | 38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 2 |
39 | 39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa | 2 |
40 | 40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa | 2 |
41 | 41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa | 2 |
42 | 42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa | 2 |
43 | 43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa | 2 |
44 | 44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa | 2 |
45 | 45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa | 2 |
46 | 46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa | 2 |
47 | 47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa | 2 |
48 | 48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa | 2 |
49 | 49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa | 2 |
50 | 50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa | 2 |
51 | 51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor | 0 |
52 | 52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor | 1 |
53 | 53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor | 0 |
54 | 54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor | 1 |
55 | 55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor | 1 |
56 | 56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor | 1 |
57 | 57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor | 1 |
58 | 58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor | 1 |
59 | 59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor | 1 |
60 | 60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor | 1 |
61 | 61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor | 1 |
62 | 62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor | 1 |
63 | 63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor | 1 |
64 | 64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor | 1 |
65 | 65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor | 1 |
66 | 66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor | 1 |
67 | 67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor | 1 |
68 | 68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor | 1 |
69 | 69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor | 1 |
70 | 70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor | 1 |
71 | 71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor | 1 |
72 | 72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor | 1 |
73 | 73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor | 1 |
74 | 74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor | 1 |
75 | 75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor | 1 |
76 | 76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor | 1 |
77 | 77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor | 0 |
78 | 78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor | 0 |
79 | 79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor | 1 |
80 | 80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor | 1 |
81 | 81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor | 1 |
82 | 82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor | 1 |
83 | 83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor | 1 |
84 | 84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor | 1 |
85 | 85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor | 1 |
86 | 86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor | 1 |
87 | 87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor | 1 |
88 | 88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor | 1 |
89 | 89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor | 1 |
90 | 90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor | 1 |
91 | 91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor | 1 |
92 | 92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor | 1 |
93 | 93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor | 1 |
94 | 94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor | 1 |
95 | 95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor | 1 |
96 | 96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor | 1 |
97 | 97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor | 1 |
98 | 98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor | 1 |
99 | 99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor | 2 |
100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor | 1 |
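Conceptually, a k-means prediction assigns each row to the cluster whose center is nearest. The sketch below shows that idea in plain Python; it is not how VerticaPy or Vertica implements `predict`. The centroid coordinates are hypothetical values chosen only so the example runs, since the fitted model’s actual centers are not shown in this lesson. The three sample points are (PetalLengthCm, SepalLengthCm) pairs taken from rows 1, 51, and 54 of the table above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_cluster(point, centroids):
    """Return the label of the centroid closest to the given point."""
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))


# Hypothetical centroids as (PetalLengthCm, SepalLengthCm); NOT the fitted model's.
centroids = {
    0: (5.7, 6.8),
    1: (4.3, 5.9),
    2: (1.5, 5.0),
}

# (PetalLengthCm, SepalLengthCm) for rows 1, 51, and 54 of the table above.
samples = [(1.4, 5.1), (4.7, 7.0), (4.0, 5.5)]

assigned = [nearest_cluster(p, centroids) for p in samples]
print(assigned)
```

Because cluster labels are arbitrary, a different fit of the same model could number the clusters differently while describing the same groups.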
This concludes this lesson on clustering models in VerticaPy. We’ll look at time series models in the next lesson.