Customer Attrition Prediction for a Telecom Company

Cynthia Fonderson | Nov 15, 2023

Service providers across different industries use customer attrition analysis because the cost of retaining an existing customer is far less than the cost of acquiring a new one. In this project, I apply machine learning algorithms to predict a telecom company’s customer churn based on several factors, including account tenure, gender and payment method, among others.

Exploratory Data Analysis

The dataset used in this project was obtained from Kaggle, and the column descriptions are as follows:

| Name | Description |
| --- | --- |
| State | U.S. state code (string) |
| Account length | Account tenure in days (integer) |
| Area code | U.S. area code (integer) |
| International plan | Does the customer have an international subscription plan (string) |
| Voice mail plan | Does the customer have a voice mail subscription plan (string) |
| No. vmail messages | Number of voicemail messages on plan (integer) |
| Total day minutes | Total number of minutes used during the day (double) |
| Total day calls | Total number of calls made during the day (integer) |
| Total day charge | Total charge accrued during the day (double) |
| Total eve minutes | Total number of minutes used in the evening (double) |
| Total eve calls | Total number of calls made in the evening (integer) |
| Total eve charge | Total charge accrued in the evening (double) |
| Total night minutes | Total number of minutes used at night (double) |
| Total night calls | Total number of calls made at night (integer) |
| Total night charge | Total charge accrued at night (double) |
| Total intl minutes | Total number of international minutes used (double) |
| Total intl calls | Total number of international calls made (integer) |
| Total intl charge | Total charge accrued from international transactions (double) |
| CS calls | Number of customer service calls from the user |
| Churn | Customer retention metric (bool) |

Following data quality checks and feature engineering, I explored pertinent relationships in the dataset, for instance the relationship between churn and the day and night charges.

[Figures: day charges, night charges]

Insight: In general, customers who churned had higher daytime charges than customers who were retained, but the charge distributions of the two groups were similar at night.
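A comparison like this can be approximated with a grouped summary. The snippet below is a minimal sketch, assuming the data is loaded into a pandas DataFrame whose column names match the table above; the six rows are invented toy values for illustration, not records from the Kaggle dataset.

```python
import pandas as pd

# Hypothetical toy rows (made up for illustration); the real dataset has
# one row per customer with the columns described in the table above.
df = pd.DataFrame({
    "Total day charge":   [45.1, 27.5, 50.9, 30.6, 41.4, 28.3],
    "Total night charge": [11.0,  9.1,  9.6,  8.9,  9.0, 10.4],
    "Churn":              [True, False, True, False, True, False],
})

charge_cols = ["Total day charge", "Total night charge"]

# Mean and median of each charge, split by churn status.
summary = df.groupby("Churn")[charge_cols].agg(["mean", "median"])
print(summary.round(2))
```

On real data, a box plot or histogram per group would show the full distributions rather than just their centers.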

Predicting churn using classification algorithms

Once I gained a deeper understanding of the dataset, I trained and evaluated five classification algorithms to predict customer loss:

  • Dummy Classifier
  • Logistic Regression
  • Support Vector Machine Classifier
  • Random Forest Classifier
  • Naïve Bayes Classifier
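The five classifiers listed above could be trained and compared along the following lines. This is only a sketch: it assumes scikit-learn, uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the churn data, and picks default-ish settings (such as `GaussianNB` for Naïve Bayes and feature scaling for the linear and SVM models) that the original project may not have used.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn data (assumption, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    # Baseline that always predicts the majority class.
    "Dummy": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracy[name]:.3f}")
```

The dummy baseline is useful on imbalanced data like this: any model worth keeping should beat the accuracy obtained by always predicting "no churn".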

To visualize the performance of the five models trained in this project, I used ROC curves. The Random Forest and Naïve Bayes Classifiers outperformed the other algorithms in predicting customer churn, with accuracies of 90% and 82%, respectively.

[Figure: ROC curves]
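The points of a ROC curve and the area under it can be computed as sketched below. The example assumes scikit-learn and synthetic data, and it compares only two of the five models to stay short; the resulting `fpr` and `tpr` arrays are what a plotting library would draw.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Dummy": DummyClassifier(strategy="most_frequent"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

curves, auc = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive (churn) class.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    curves[name] = (fpr, tpr)
    auc[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC = {auc[name]:.3f}")
```

A constant-prediction baseline sits on the diagonal of the plot with an AUC of 0.5, so the further a model's curve bows toward the top-left corner, the better it separates churners from retained customers.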

Lastly, I optimized the Random Forest Classifier by tuning its hyperparameters, which increased its prediction accuracy to 99%. According to this model, the most important factors that influence customer attrition at this company are the charges paid, the number of calls placed to customer service, subscription to an international plan and daytime usage (minutes, calls).
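Hyperparameter tuning and feature ranking of this kind might look like the sketch below. It assumes scikit-learn's `GridSearchCV` and a synthetic dataset; the parameter grid is hypothetical, since the values searched in the original project are not stated, and the importances are the forest's impurity-based scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the churn data (assumption, not the real dataset).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical grid; the values the author actually searched are not stated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}

# Cross-validated search over every combination in the grid.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", round(best.score(X_test, y_test), 3))

# Impurity-based feature importances, highest first.
ranked = sorted(enumerate(best.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for idx, importance in ranked:
    print(f"feature_{idx}: {importance:.3f}")
```

With a real DataFrame, the feature indices would be replaced by column names such as `Total day charge` or `CS calls` to make the ranking readable.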

Full Project