Service providers across industries use customer attrition analysis because retaining an existing customer costs far less than acquiring a new one. In this project, I apply machine learning algorithms to predict a telecom company’s customer churn from several factors, including tenure, gender and payment method, among others.
Exploratory Data Analysis
The dataset used in this project was sourced from Kaggle, and the column descriptions are as follows:
Name | Description |
---|---|
State | U.S. state code (string) |
Account length | Account tenure in days (integer) |
Area code | U.S. area code (integer) |
International plan | Does the customer have an international subscription plan (string) |
Voice mail plan | Does the customer have a voice mail subscription plan (string) |
No. vmail messages | Number of voicemail messages on plan (integer) |
Total day minutes | Total number of minutes used during the day (double) |
Total day calls | Total number of calls made during the day (integer) |
Total day charge | Total charge accrued during the day (double) |
Total eve minutes | Total number of minutes used in the evening (double) |
Total eve calls | Total number of calls made in the evening (integer) |
Total eve charge | Total charge accrued in the evening (double) |
Total night minutes | Total number of minutes used at night (double) |
Total night calls | Total number of calls made at night (integer) |
Total night charge | Total charge accrued at night (double) |
Total intl minutes | Total number of international minutes used (double) |
Total intl calls | Total number of international calls made (integer) |
Total intl charge | Total charge accrued from international transactions (double) |
CS calls | Number of customer service calls made by the customer (integer) |
Churn | Whether the customer churned (bool) |
Following a data quality check and feature engineering, I explored pertinent relationships in the dataset, for instance the relationship between churn and the day and night charges.
Insight: In general, customers who churned had higher day charges than customers who were retained, but the distribution of night charges was similar for both groups.
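A comparison like this one can be sketched with a pandas group-by. The snippet below is only an illustration: the six rows are invented toy values, not records from the Kaggle dataset, and `mean_charges_by_churn` is a hypothetical helper name. Only the column names come from the table above.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from the dataset.
toy = pd.DataFrame({
    "Total day charge": [45.1, 27.5, 52.3, 30.2, 48.9, 25.4],
    "Total night charge": [9.1, 8.7, 9.4, 9.0, 8.8, 9.2],
    "Churn": [True, False, True, False, True, False],
})

def mean_charges_by_churn(df):
    """Average day and night charges for churned vs. retained customers."""
    return df.groupby("Churn")[["Total day charge", "Total night charge"]].mean()

summary = mean_charges_by_churn(toy)
print(summary)
```

On the real data, the same call would show whether the gap in day charges between the two groups is larger than the gap in night charges; a box plot or histogram per group would show the full distributions rather than just the means.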
Predicting churn using classification algorithms
Once I gained a deeper understanding of the dataset, I trained and evaluated five classification algorithms to predict customer loss:
- Dummy Classifier
- Logistic Regression
- Support Vector Machine Classifier
- Random Forest Classifier
- Naïve Bayes Classifier
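The comparison of these five models might look like the following minimal sketch. It assumes scikit-learn (the write-up does not name the library) and uses synthetic, imbalanced data from `make_classification` as a stand-in for the telecom dataset. The specific choices here (most-frequent dummy strategy, Gaussian Naïve Bayes, feature scaling, default hyperparameters) are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): about 15% positive class, like a churn label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features for the models that are sensitive to feature magnitude.
models = {
    "Dummy": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The dummy classifier serves as a baseline: on imbalanced churn data it already reaches the majority-class share in accuracy, so the other models are only useful to the extent that they beat it.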
To visualize the performance of the five models trained in this project, I used ROC curves. The Random Forest and Naïve Bayes Classifiers outperformed the other algorithms in predicting customer churn, with accuracies of 90% and 82%, respectively.
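As a rough sketch of how a ROC curve is obtained for one model, the snippet below again assumes scikit-learn and synthetic stand-in data rather than the project's dataset. It computes the false-positive and true-positive rates from the predicted churn probabilities and summarizes the curve with its area (AUC).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), imbalanced like a churn label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# ROC curves need a continuous score, so use the probability of the positive class.
churn_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, churn_scores)
roc_auc = auc(fpr, tpr)
print(f"ROC AUC: {roc_auc:.3f}")
```

Plotting `fpr` against `tpr` for each model on the same axes gives the kind of visual comparison described above. Unlike accuracy, the curve does not depend on a single decision threshold, which can matter when churners are a small minority.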
Lastly, I optimized the Random Forest Classifier by tuning its hyperparameters, which increased its prediction accuracy to 99%. According to this model, the most important factors that influence customer attrition at this company are the charges paid, the number of calls placed to customer service, subscription to an international plan, and daytime usage (minutes, calls).
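One common way to tune a random forest and read off its feature importances is a cross-validated grid search, sketched below. This is an assumption about the approach, not the project's actual procedure: the library (scikit-learn), the parameter grid, the synthetic data and the placeholder feature names are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with placeholder feature names (assumption).
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; a real search would usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# Score the refit best model on data the search never saw.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.3f}")

# Rank features by the forest's impurity-based importance.
importances = search.best_estimator_.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Keeping a test split outside the search means the reported accuracy reflects unseen customers rather than the data used to pick the hyperparameters.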