Telco Churn
This example uses the Telco Churn dataset to predict which Telco customers are likely to churn, that is, to stop using Telco's services. You can download the Jupyter Notebook of the study here.
The dataset includes information about:
- Churn - customers that left within the last month
- Services - services of each customer (phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies)
- Customer account information - how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Customer demographics - gender, age range, and if they have partners and dependents
We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.
Initialization
This example uses the following version of VerticaPy:
import verticapy as vp
vp.__version__
Connect to Vertica. This example uses an existing connection called "VerticaDSN." For details on how to create a connection, see the connection tutorial.
vp.connect("VerticaDSN")
Let's create a Virtual DataFrame of the dataset. The dataset is available here.
churn = vp.read_csv('data/churn.csv')
display(churn)
Data Exploration and Preparation
Let's examine our data.
churn.describe(method = "categorical", unique = True)
Several variables are categorical, and since they all have low cardinalities, we can compute their dummies. We can also convert all booleans to numeric.
for column in ["DeviceProtection",
               "MultipleLines",
               "PaperlessBilling",
               "Churn",
               "TechSupport",
               "Partner",
               "StreamingTV",
               "OnlineBackup",
               "Dependents",
               "OnlineSecurity",
               "PhoneService",
               "StreamingMovies"]:
    churn[column].decode("Yes", 1, 0)
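To make the mapping explicit, the loop above turns each Yes/No column into a 0/1 indicator. The following plain-Python sketch mimics that behavior on a few hypothetical rows. It does not use VerticaPy: the helper name `decode_yes` and the sample rows are assumptions made for illustration, and it assumes the decoding follows SQL `DECODE` semantics, where any value other than "Yes" (for example "No phone service") falls through to the default of 0.

```python
def decode_yes(value):
    """Return 1 when value is the string "Yes", otherwise 0."""
    return 1 if value == "Yes" else 0


# Hypothetical sample rows (illustrative only, not taken from the dataset).
rows = [
    {"PhoneService": "No", "MultipleLines": "No phone service"},
    {"PhoneService": "Yes", "MultipleLines": "No"},
    {"PhoneService": "Yes", "MultipleLines": "Yes"},
]

# Apply the same Yes -> 1, anything else -> 0 mapping to every column.
encoded = [{col: decode_yes(val) for col, val in row.items()} for row in rows]

for row in encoded:
    print(row)
```

One consequence of this mapping is that "No" and "No phone service" become indistinguishable after decoding, which is acceptable here because the `PhoneService` column already carries that information.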
churn.one_hot_encode().drop(["customerID",
                             "gender",
                             "Contract",
                             "PaymentMethod",
                             "InternetService"])