Introduction to Machine Learning#
One of the last stages of the data science life cycle is data modeling. Machine learning algorithms are statistical techniques that build mathematical models from training data. These algorithms come in two types:
Supervised : these algorithms are used when we want to predict a response column.
Unsupervised : these algorithms are used when we want to detect anomalies or when we want to segment the data. No response column is needed.
Supervised Learning#
Supervised learning techniques map an input to an output based on an example dataset. There are two main types:
Regression : The response is numerical (Linear Regression, SVM Regression, Random Forest Regression…)
Classification : The response is categorical (Gradient Boosting, Naive Bayes, Logistic Regression…)
For example, predicting the total charges of a Telco customer from their tenure is a regression problem. The following code fits and plots a linear regression of ‘TotalCharges’ as a function of ‘tenure’ in the Telco Churn dataset.
[1]:
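import verticapy as vp
from verticapy.learn.linear_model import LinearRegression
# Render plots with Highcharts.
vp.set_option("plotting_lib", "highcharts")
# Load the Telco Churn data.
churn = vp.read_csv("data/churn.csv")
# Drop any existing object named "LR_churn" so the example can be rerun.
vp.drop("LR_churn")
model = LinearRegression("LR_churn")
# Fit TotalCharges as a function of tenure, then plot the fitted line.
model.fit(churn, ["tenure"], "TotalCharges")
model.plot()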
[1]:
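To make the fitted line less of a black box, here is a minimal pure-Python sketch of ordinary least squares with a single predictor. It is only an illustration of the technique, not VerticaPy's implementation; the function name `fit_simple_linear_regression` and the sample values are made up for this example and are not taken from the Telco dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (tenure, total charges) pairs, for illustration only.
tenure = [1, 2, 3, 4, 5]
total_charges = [70.0, 130.0, 190.0, 250.0, 310.0]

slope, intercept = fit_simple_linear_regression(tenure, total_charges)
# Predict the response for an unseen tenure value.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

The prediction step is what makes this a regression: the model returns a number on a continuous scale rather than a category.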
In contrast, when we have to predict a categorical column, we’re dealing with classification.
In the following example, we use Linear Support Vector Classification (SVC) to predict whether a flower is an Iris setosa based on its petal and sepal lengths.
[2]:
from verticapy.datasets import load_iris
iris = load_iris()
iris.one_hot_encode()
/opt/venv/lib/python3.10/site-packages/verticapy/core/vdataframe/_encoding.py:123: Warning: The vDataColumn '"Id"' was ignored because of its high cardinality.
Increase the parameter 'max_cardinality' to solve this issue or use directly the vDataColumn get_dummies method.
warnings.warn(warning_message, Warning)
/opt/venv/lib/python3.10/site-packages/verticapy/core/vdataframe/_encoding.py:123: Warning: The vDataColumn '"PetalLengthCm"' was ignored because of its high cardinality.
Increase the parameter 'max_cardinality' to solve this issue or use directly the vDataColumn get_dummies method.
warnings.warn(warning_message, Warning)
/opt/venv/lib/python3.10/site-packages/verticapy/core/vdataframe/_encoding.py:123: Warning: The vDataColumn '"PetalWidthCm"' was ignored because of its high cardinality.
Increase the parameter 'max_cardinality' to solve this issue or use directly the vDataColumn get_dummies method.
warnings.warn(warning_message, Warning)
/opt/venv/lib/python3.10/site-packages/verticapy/core/vdataframe/_encoding.py:123: Warning: The vDataColumn '"SepalLengthCm"' was ignored because of its high cardinality.
Increase the parameter 'max_cardinality' to solve this issue or use directly the vDataColumn get_dummies method.
warnings.warn(warning_message, Warning)
/opt/venv/lib/python3.10/site-packages/verticapy/core/vdataframe/_encoding.py:123: Warning: The vDataColumn '"SepalWidthCm"' was ignored because of its high cardinality.
Increase the parameter 'max_cardinality' to solve this issue or use directly the vDataColumn get_dummies method.
warnings.warn(warning_message, Warning)
[2]:
Row | Id (Integer) | PetalLengthCm (Numeric(8)) | PetalWidthCm (Numeric(8)) | SepalLengthCm (Numeric(8)) | SepalWidthCm (Numeric(8)) | Species (Varchar(30)) | Species_Iris-setosa (Integer) | Species_Iris-versicolor (Integer) |
1 | 1 | 1.4 | 0.2 | 5.1 | 3.5 | Iris-setosa | 1 | 0 |
2 | 2 | 1.4 | 0.2 | 4.9 | 3.0 | Iris-setosa | 1 | 0 |
3 | 3 | 1.3 | 0.2 | 4.7 | 3.2 | Iris-setosa | 1 | 0 |
4 | 4 | 1.5 | 0.2 | 4.6 | 3.1 | Iris-setosa | 1 | 0 |
5 | 5 | 1.4 | 0.2 | 5.0 | 3.6 | Iris-setosa | 1 | 0 |
6 | 6 | 1.7 | 0.4 | 5.4 | 3.9 | Iris-setosa | 1 | 0 |
7 | 7 | 1.4 | 0.3 | 4.6 | 3.4 | Iris-setosa | 1 | 0 |
8 | 8 | 1.5 | 0.2 | 5.0 | 3.4 | Iris-setosa | 1 | 0 |
9 | 9 | 1.4 | 0.2 | 4.4 | 2.9 | Iris-setosa | 1 | 0 |
10 | 10 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 1 | 0 |
11 | 11 | 1.5 | 0.2 | 5.4 | 3.7 | Iris-setosa | 1 | 0 |
12 | 12 | 1.6 | 0.2 | 4.8 | 3.4 | Iris-setosa | 1 | 0 |
13 | 13 | 1.4 | 0.1 | 4.8 | 3.0 | Iris-setosa | 1 | 0 |
14 | 14 | 1.1 | 0.1 | 4.3 | 3.0 | Iris-setosa | 1 | 0 |
15 | 15 | 1.2 | 0.2 | 5.8 | 4.0 | Iris-setosa | 1 | 0 |
16 | 16 | 1.5 | 0.4 | 5.7 | 4.4 | Iris-setosa | 1 | 0 |
17 | 17 | 1.3 | 0.4 | 5.4 | 3.9 | Iris-setosa | 1 | 0 |
18 | 18 | 1.4 | 0.3 | 5.1 | 3.5 | Iris-setosa | 1 | 0 |
19 | 19 | 1.7 | 0.3 | 5.7 | 3.8 | Iris-setosa | 1 | 0 |
20 | 20 | 1.5 | 0.3 | 5.1 | 3.8 | Iris-setosa | 1 | 0 |
21 | 21 | 1.7 | 0.2 | 5.4 | 3.4 | Iris-setosa | 1 | 0 |
22 | 22 | 1.5 | 0.4 | 5.1 | 3.7 | Iris-setosa | 1 | 0 |
23 | 23 | 1.0 | 0.2 | 4.6 | 3.6 | Iris-setosa | 1 | 0 |
24 | 24 | 1.7 | 0.5 | 5.1 | 3.3 | Iris-setosa | 1 | 0 |
25 | 25 | 1.9 | 0.2 | 4.8 | 3.4 | Iris-setosa | 1 | 0 |
26 | 26 | 1.6 | 0.2 | 5.0 | 3.0 | Iris-setosa | 1 | 0 |
27 | 27 | 1.6 | 0.4 | 5.0 | 3.4 | Iris-setosa | 1 | 0 |
28 | 28 | 1.5 | 0.2 | 5.2 | 3.5 | Iris-setosa | 1 | 0 |
29 | 29 | 1.4 | 0.2 | 5.2 | 3.4 | Iris-setosa | 1 | 0 |
30 | 30 | 1.6 | 0.2 | 4.7 | 3.2 | Iris-setosa | 1 | 0 |
31 | 31 | 1.6 | 0.2 | 4.8 | 3.1 | Iris-setosa | 1 | 0 |
32 | 32 | 1.5 | 0.4 | 5.4 | 3.4 | Iris-setosa | 1 | 0 |
33 | 33 | 1.5 | 0.1 | 5.2 | 4.1 | Iris-setosa | 1 | 0 |
34 | 34 | 1.4 | 0.2 | 5.5 | 4.2 | Iris-setosa | 1 | 0 |
35 | 35 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 1 | 0 |
36 | 36 | 1.2 | 0.2 | 5.0 | 3.2 | Iris-setosa | 1 | 0 |
37 | 37 | 1.3 | 0.2 | 5.5 | 3.5 | Iris-setosa | 1 | 0 |
38 | 38 | 1.5 | 0.1 | 4.9 | 3.1 | Iris-setosa | 1 | 0 |
39 | 39 | 1.3 | 0.2 | 4.4 | 3.0 | Iris-setosa | 1 | 0 |
40 | 40 | 1.5 | 0.2 | 5.1 | 3.4 | Iris-setosa | 1 | 0 |
41 | 41 | 1.3 | 0.3 | 5.0 | 3.5 | Iris-setosa | 1 | 0 |
42 | 42 | 1.3 | 0.3 | 4.5 | 2.3 | Iris-setosa | 1 | 0 |
43 | 43 | 1.3 | 0.2 | 4.4 | 3.2 | Iris-setosa | 1 | 0 |
44 | 44 | 1.6 | 0.6 | 5.0 | 3.5 | Iris-setosa | 1 | 0 |
45 | 45 | 1.9 | 0.4 | 5.1 | 3.8 | Iris-setosa | 1 | 0 |
46 | 46 | 1.4 | 0.3 | 4.8 | 3.0 | Iris-setosa | 1 | 0 |
47 | 47 | 1.6 | 0.2 | 5.1 | 3.8 | Iris-setosa | 1 | 0 |
48 | 48 | 1.4 | 0.2 | 4.6 | 3.2 | Iris-setosa | 1 | 0 |
49 | 49 | 1.5 | 0.2 | 5.3 | 3.7 | Iris-setosa | 1 | 0 |
50 | 50 | 1.4 | 0.2 | 5.0 | 3.3 | Iris-setosa | 1 | 0 |
51 | 51 | 4.7 | 1.4 | 7.0 | 3.2 | Iris-versicolor | 0 | 1 |
52 | 52 | 4.5 | 1.5 | 6.4 | 3.2 | Iris-versicolor | 0 | 1 |
53 | 53 | 4.9 | 1.5 | 6.9 | 3.1 | Iris-versicolor | 0 | 1 |
54 | 54 | 4.0 | 1.3 | 5.5 | 2.3 | Iris-versicolor | 0 | 1 |
55 | 55 | 4.6 | 1.5 | 6.5 | 2.8 | Iris-versicolor | 0 | 1 |
56 | 56 | 4.5 | 1.3 | 5.7 | 2.8 | Iris-versicolor | 0 | 1 |
57 | 57 | 4.7 | 1.6 | 6.3 | 3.3 | Iris-versicolor | 0 | 1 |
58 | 58 | 3.3 | 1.0 | 4.9 | 2.4 | Iris-versicolor | 0 | 1 |
59 | 59 | 4.6 | 1.3 | 6.6 | 2.9 | Iris-versicolor | 0 | 1 |
60 | 60 | 3.9 | 1.4 | 5.2 | 2.7 | Iris-versicolor | 0 | 1 |
61 | 61 | 3.5 | 1.0 | 5.0 | 2.0 | Iris-versicolor | 0 | 1 |
62 | 62 | 4.2 | 1.5 | 5.9 | 3.0 | Iris-versicolor | 0 | 1 |
63 | 63 | 4.0 | 1.0 | 6.0 | 2.2 | Iris-versicolor | 0 | 1 |
64 | 64 | 4.7 | 1.4 | 6.1 | 2.9 | Iris-versicolor | 0 | 1 |
65 | 65 | 3.6 | 1.3 | 5.6 | 2.9 | Iris-versicolor | 0 | 1 |
66 | 66 | 4.4 | 1.4 | 6.7 | 3.1 | Iris-versicolor | 0 | 1 |
67 | 67 | 4.5 | 1.5 | 5.6 | 3.0 | Iris-versicolor | 0 | 1 |
68 | 68 | 4.1 | 1.0 | 5.8 | 2.7 | Iris-versicolor | 0 | 1 |
69 | 69 | 4.5 | 1.5 | 6.2 | 2.2 | Iris-versicolor | 0 | 1 |
70 | 70 | 3.9 | 1.1 | 5.6 | 2.5 | Iris-versicolor | 0 | 1 |
71 | 71 | 4.8 | 1.8 | 5.9 | 3.2 | Iris-versicolor | 0 | 1 |
72 | 72 | 4.0 | 1.3 | 6.1 | 2.8 | Iris-versicolor | 0 | 1 |
73 | 73 | 4.9 | 1.5 | 6.3 | 2.5 | Iris-versicolor | 0 | 1 |
74 | 74 | 4.7 | 1.2 | 6.1 | 2.8 | Iris-versicolor | 0 | 1 |
75 | 75 | 4.3 | 1.3 | 6.4 | 2.9 | Iris-versicolor | 0 | 1 |
76 | 76 | 4.4 | 1.4 | 6.6 | 3.0 | Iris-versicolor | 0 | 1 |
77 | 77 | 4.8 | 1.4 | 6.8 | 2.8 | Iris-versicolor | 0 | 1 |
78 | 78 | 5.0 | 1.7 | 6.7 | 3.0 | Iris-versicolor | 0 | 1 |
79 | 79 | 4.5 | 1.5 | 6.0 | 2.9 | Iris-versicolor | 0 | 1 |
80 | 80 | 3.5 | 1.0 | 5.7 | 2.6 | Iris-versicolor | 0 | 1 |
81 | 81 | 3.8 | 1.1 | 5.5 | 2.4 | Iris-versicolor | 0 | 1 |
82 | 82 | 3.7 | 1.0 | 5.5 | 2.4 | Iris-versicolor | 0 | 1 |
83 | 83 | 3.9 | 1.2 | 5.8 | 2.7 | Iris-versicolor | 0 | 1 |
84 | 84 | 5.1 | 1.6 | 6.0 | 2.7 | Iris-versicolor | 0 | 1 |
85 | 85 | 4.5 | 1.5 | 5.4 | 3.0 | Iris-versicolor | 0 | 1 |
86 | 86 | 4.5 | 1.6 | 6.0 | 3.4 | Iris-versicolor | 0 | 1 |
87 | 87 | 4.7 | 1.5 | 6.7 | 3.1 | Iris-versicolor | 0 | 1 |
88 | 88 | 4.4 | 1.3 | 6.3 | 2.3 | Iris-versicolor | 0 | 1 |
89 | 89 | 4.1 | 1.3 | 5.6 | 3.0 | Iris-versicolor | 0 | 1 |
90 | 90 | 4.0 | 1.3 | 5.5 | 2.5 | Iris-versicolor | 0 | 1 |
91 | 91 | 4.4 | 1.2 | 5.5 | 2.6 | Iris-versicolor | 0 | 1 |
92 | 92 | 4.6 | 1.4 | 6.1 | 3.0 | Iris-versicolor | 0 | 1 |
93 | 93 | 4.0 | 1.2 | 5.8 | 2.6 | Iris-versicolor | 0 | 1 |
94 | 94 | 3.3 | 1.0 | 5.0 | 2.3 | Iris-versicolor | 0 | 1 |
95 | 95 | 4.2 | 1.3 | 5.6 | 2.7 | Iris-versicolor | 0 | 1 |
96 | 96 | 4.2 | 1.2 | 5.7 | 3.0 | Iris-versicolor | 0 | 1 |
97 | 97 | 4.2 | 1.3 | 5.7 | 2.9 | Iris-versicolor | 0 | 1 |
98 | 98 | 4.3 | 1.3 | 6.2 | 2.9 | Iris-versicolor | 0 | 1 |
99 | 99 | 3.0 | 1.1 | 5.1 | 2.5 | Iris-versicolor | 0 | 1 |
100 | 100 | 4.1 | 1.3 | 5.7 | 2.8 | Iris-versicolor | 0 | 1 |
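The output above has two indicator columns for the species, one fewer than the number of categories. The following pure-Python sketch shows that idea under the assumption that the last category in sorted order is the one dropped, which is consistent with the columns shown; the helper `one_hot_encode` here is a hypothetical stand-alone function, not the vDataFrame method.

```python
def one_hot_encode(values, prefix, drop_last=True):
    """Turn a list of categories into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    # Dropping one category avoids a redundant column: a row of all zeros
    # already identifies the dropped category.
    kept = categories[:-1] if drop_last else categories
    return [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]


species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
encoded = one_hot_encode(species, "Species")
for label, row in zip(species, encoded):
    print(label, row)
```

A row where both indicators are 0 stands for the dropped category, so no information is lost.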
[3]:
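from verticapy.learn.svm import LinearSVC
# Make sure no object named "svc_setosa_iris" remains from a previous run.
vp.drop("svc_setosa_iris")
model = LinearSVC("svc_setosa_iris")
model.drop()
# Binary response: 1 if the flower is an Iris setosa, 0 otherwise.
model.fit(iris, ["PetalLengthCm", "SepalLengthCm"], "Species_Iris-setosa")
model.plot()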
[3]:
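A linear classifier of this kind separates the two classes with a straight line in the feature plane. As a rough sketch of the prediction step only, the snippet below applies a linear decision rule with hand-picked weights; `WEIGHTS` and `BIAS` are assumptions chosen for illustration and are not the coefficients fitted by the model above.

```python
def linear_decision(features, weights, bias):
    """Return 1 if the point lies on the positive side of the line, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else 0


# Hypothetical parameters: classify as setosa when PetalLengthCm <= 2.5.
WEIGHTS = (-1.0, 0.0)  # (PetalLengthCm, SepalLengthCm)
BIAS = 2.5

setosa_like = (1.4, 5.1)      # petal and sepal lengths of row 1 in the table
versicolor_like = (4.7, 7.0)  # petal and sepal lengths of row 51 in the table

pred_setosa = linear_decision(setosa_like, WEIGHTS, BIAS)
pred_versicolor = linear_decision(versicolor_like, WEIGHTS, BIAS)
print(pred_setosa, pred_versicolor)
```

Training a support vector classifier amounts to choosing the weights and bias from the data; the sketch skips that step entirely.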
When the response has more than two categories, we speak of ‘Multiclass Classification’ rather than binary classification.
Unsupervised Learning#
These algorithms are used to segment the data (k-means, DBSCAN, etc.) or to detect anomalies (Local Outlier Factor, Z-Score techniques…). In particular, they’re useful for finding patterns in data without labels. For example, let’s use a k-means algorithm to create clusters on the Iris dataset. Ideally, each cluster will correspond to a flower species.
[4]:
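from verticapy.learn.cluster import KMeans
# Drop any existing object named "KMeans_iris" so the example can be rerun.
vp.drop("KMeans_iris")
# Three clusters, one per expected species.
model = KMeans("KMeans_iris", n_cluster = 3)
# No response column: clustering only uses the predictors.
model.fit(iris, ["PetalLengthCm", "SepalLengthCm"])
model.plot()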
[4]:
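For intuition about what k-means does, here is a minimal pure-Python sketch of Lloyd's algorithm, which alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. It is an illustration under simplifying assumptions (hand-picked initial centroids, a handful of illustrative 2-D points) and does not reflect how the model above is initialized or trained.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: repeat assignment and update until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the previous centroid if a cluster ends up empty.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Illustrative points forming three well-separated groups.
points = [(1.4, 5.1), (1.3, 4.7), (4.5, 6.4), (4.7, 7.0), (8.0, 1.0), (8.2, 1.2)]
initial = [(1.4, 5.1), (4.5, 6.4), (8.0, 1.0)]  # hand-picked starting centroids

labels, centroids = kmeans(points, initial)
print(labels)
print(centroids)
```

Note that no labels are passed in: the grouping comes only from distances between points, which is why the resulting clusters need not match the true species exactly on real data.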
In this section, we went over a few of the many ML algorithms available in VerticaPy. In the next lesson, we’ll cover creating a regression model.