How to develop a churn prediction model for a telecom company?

I work at a telecom company that is interested in developing a churn prediction model. I would like to know which steps I should follow to develop such a model. Any help with this problem is highly appreciated. Thanks in advance.

Replies to This Discussion

We did this for a financial company a while ago. It depends highly on the context you want to bring along.

I would say that the current CRM database is a solid starting point. Most likely, the number of customer care calls, the number of complaint e-mails, etc. are good indicators of churn. Simple counts will most likely not be sufficient, though: you will also need to analyze the content of the e-mails, audio from conversations with customer care, web behavior and perhaps even perform social network analysis.

You are right, the most important place to dig is the customer care system or, better said, the CRM database. What I want to know is the ordered steps for designing the prediction model and, of course, which model is best suited to analyzing telecom data.

After reading a lot of whitepapers and articles on scholar.google.com about churn prediction in telecom, I came to the conclusions below. I would like you gurus to confirm what I have concluded and, if you think I am wrong, to comment and guide me to the correct solution. Here are my findings:

Step 1: Find as many attributes in the telecom data as you can, and build a dataset from them. The data are:

Customer Demographic data such as:

Zipcode

Income

Occupation

Age

Gender

Living Address

Occupation Address

Purchase History:

Number of service purchases

Value of purchases

Date of last purchase

Payment type

Products

Product/Service/Campaign type

Product diversity

Customer Relation Data:

Number of Questions about the services from e.g. IVR

Number of Visits to retail shops or online website (e.g selfcare website)

Number of Complaints solved

Number of total complaints

Service Usage:

Number of calls

Number of Outgoing calls

Number of Incoming calls

Number of Roaming Calls

Number of International Calls

Number of SMS

Total minutes of calls

Number of VAS activations

Number of VAS deactivations

Number of campaigns joined

Billing Data:

Total amount of bill

Total amount billed for voice calls

Total amount billed for VAS services (MMS/GPRS/etc.)

Total amount billed for SMS

Total number of times barred (one-way barring)

Total number of times fully barred (two-way barring)

Average number of days after the bill due date that payment is made

and any other data (I would be glad to hear of any other attributes you think could help).

Once we have these attributes, we should extract the data for a fixed period of time. For example, we could use a three-month training set (Jan 2013, Feb 2013 and Mar 2013) and identify the customers who left the company during that period (am I right?). With this dataset of churned and unchurned customers for Jan, Feb and Mar 2013, we can go on to step 2 for further processing, and finally build a model that can predict customer churn in April 2013. (Am I right? I want to know whether I am on the right track.)
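The windowing described above could be sketched roughly as follows. This is only a minimal illustration: the customer IDs, churn dates and the helper `label_for_window` are made up, and it assumes each customer record carries a churn date (or nothing if the customer is still with the company).

```python
from datetime import date

# Hypothetical customer records: (customer_id, churn_date or None if still active).
customers = [
    ("C001", date(2013, 2, 14)),
    ("C002", None),
    ("C003", date(2013, 4, 3)),   # leaves after the window, so unchurned inside it
    ("C004", date(2013, 1, 30)),
]

# Training window from the post: Jan, Feb and Mar 2013.
WINDOW_START = date(2013, 1, 1)
WINDOW_END = date(2013, 3, 31)


def label_for_window(churn_date, start, end):
    """Return 1 if the customer left inside [start, end], otherwise 0."""
    return int(churn_date is not None and start <= churn_date <= end)


labels = {cid: label_for_window(d, WINDOW_START, WINDOW_END) for cid, d in customers}
print(labels)  # {'C001': 1, 'C002': 0, 'C003': 0, 'C004': 1}
```

If the goal is to predict April from earlier months, the attributes would normally be computed only from data dated before the period in which churn is observed, so that the model does not see information from the future.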

I think in this step (step 2) I should form as many hypotheses as I can from the dataset about why customers churned. For example, I might observe that "out of all churned customers in this dataset, 80% had filled in the online complaint form before leaving", and then check that percentage (80%) against the unchurned customers over the same period (Jan, Feb and Mar) to see whether it also holds for them. (Am I right?)
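One possible way to make that comparison concrete is a two-proportion z-test on the share of customers who filed a complaint in each group. The sketch below uses only the standard library; the counts are invented for illustration and `two_proportion_z` is a hypothetical helper name, not something from the thread.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the null hypothesis that both groups share one proportion."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Made-up counts: 80 of 100 churned customers filed a complaint,
# compared with 150 of 1000 unchurned customers.
z = two_proportion_z(80, 100, 150, 1000)

# Two-sided p-value under the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

A large absolute z (and a small p-value) suggests the complaint rate really differs between the two groups, which makes the attribute a reasonable candidate feature. It does not by itself show that complaints cause churn.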

After finding some assumptions, hypotheses or rules (is that the right word?), we are ready to build our prediction model. (Am I right?)

Since you have a whole gamut of data available, it is just a matter of extracting information from it. You first need to prioritize which information you want. You can extract the basics first and derive more models later:

a) Churn propensity of customers based on their AON (age on network) and ARPU (average revenue per user): trace the churn pattern over a historical dataset, plot it as a line graph and mark the grey areas.

b) The mode by which customers are churning out of the network: involuntary or voluntary. Within the grey areas identified above, you need to determine the mode for further drill-down.
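As a rough illustration of point a), churn rates can be tabulated per AON or ARPU band before anything is plotted. The rows, band boundaries and function names below are all assumptions chosen for the example, not values from any real network.

```python
from collections import defaultdict

# Hypothetical historical rows: AON in months, ARPU in currency units, churn flag.
rows = [
    {"aon": 2, "arpu": 12.0, "churned": 1},
    {"aon": 3, "arpu": 8.0, "churned": 1},
    {"aon": 5, "arpu": 20.0, "churned": 0},
    {"aon": 14, "arpu": 25.0, "churned": 0},
    {"aon": 18, "arpu": 9.0, "churned": 1},
    {"aon": 30, "arpu": 40.0, "churned": 0},
    {"aon": 36, "arpu": 35.0, "churned": 0},
    {"aon": 40, "arpu": 15.0, "churned": 0},
]


def aon_band(row):
    """Assumed tenure bands in months."""
    if row["aon"] < 6:
        return "0-5"
    if row["aon"] < 24:
        return "6-23"
    return "24+"


def arpu_band(row):
    """Assumed revenue split at 15 currency units."""
    return "low" if row["arpu"] < 15 else "high"


def churn_rate_by(rows, band_fn):
    """Share of churned customers within each band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [churned, total]
    for row in rows:
        band = band_fn(row)
        counts[band][0] += row["churned"]
        counts[band][1] += 1
    return {band: churned / total for band, (churned, total) in counts.items()}


by_aon = churn_rate_by(rows, aon_band)
by_arpu = churn_rate_by(rows, arpu_band)
print(by_aon)
print(by_arpu)
```

Bands with unusually high churn rates are the candidates for the "grey areas" mentioned above. The same function could be reused with a band function that returns the churn mode (voluntary or involuntary) if that field is available.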

The hypothesis above is correct, but follow a structured analysis by asking yourself questions.

I'm a fifth-year student currently writing a thesis entitled "Applying data mining techniques among broadband subscribers". Right now I'm having difficulty acquiring a dataset from a company for my study, so as an alternative I have decided to run a survey instead. My questions are:

Is a survey enough for me to construct a predictive model for customer churn management?

One of my study problems is to find the factors that affect customer churn. Will I be able to answer it using only survey data?

The set of fields for the analysis seems reasonable. However, in our experience with churn analysis in the telecom industry and customer retention in general, you have to capture not only total or average values but also use a temporal abstraction approach, where you look at service usage and billing over the last N months before churn, or before the current date if there is no churn. The steps are well described in, e.g., the Handbook of Statistical Analysis and Data Mining Applications by Robert Nisbet, John Elder IV and Gary Miner.
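A minimal sketch of what such temporal features might look like, assuming one value per month ordered from oldest to newest. The particular summaries (mean, least-squares slope, last month relative to the mean) and the name `temporal_features` are illustrative choices, not the method from the handbook.

```python
def temporal_features(monthly_values, n=3):
    """Summarise the last n monthly values of one usage or billing series."""
    recent = monthly_values[-n:]
    k = len(recent)
    mean = sum(recent) / k
    # Least-squares slope of value against month index 0..k-1.
    x_mean = (k - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(k))
    slope = (
        sum((x - x_mean) * (y - mean) for x, y in enumerate(recent)) / denom
        if denom
        else 0.0
    )
    return {
        "mean": mean,
        "slope": slope,
        "last_over_mean": recent[-1] / mean if mean else 0.0,
    }


# Made-up total call minutes for the five months before churn (or before today).
minutes = [300, 280, 200, 120, 40]
features = temporal_features(minutes, n=3)
print(features)
```

For this series the slope over the last three months is -80 minutes per month, a sharp decline that a plain total or average would hide.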

With respect to method selection, I would recommend trying the Stochastic Gradient Boosting approach, which usually gives robust and accurate results in such applications. If you are looking for better interpretability, then classification trees and logistic regression might help, although in the latter case you will have to check the underlying assumptions (e.g. multicollinearity, heteroscedasticity, etc.).
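For anyone who wants to try this, here is a hedged sketch using scikit-learn, where setting `subsample` below 1.0 on `GradientBoostingClassifier` gives the stochastic variant. The data is entirely synthetic: the three features (complaints, usage slope, days paid late) and the coefficients that generate churn are assumptions made up so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real customer table (all values are invented).
rng = np.random.default_rng(0)
n = 2000
complaints = rng.poisson(1.0, n)
usage_slope = rng.normal(0.0, 1.0, n)
late_days = rng.exponential(5.0, n)

# Assumed relationship used only to generate labels for the demo.
logit = -2.0 + 0.8 * complaints - 1.0 * usage_slope + 0.05 * late_days
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([complaints, usage_slope, late_days])
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, random_state=0, stratify=churned
)

# subsample < 1.0 fits each tree on a random fraction of the training rows.
model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=3, subsample=0.8, random_state=0
)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"hold-out AUC: {auc:.3f}")
```

On real data the hyperparameters above would need tuning, and because churners are usually a small minority, a ranking metric such as AUC tends to be more informative than plain accuracy.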

I appreciate this answer. I was once considered for such an effort: the client's vendor/contractor had a ton of analyses of telecom ad effectiveness (time of day, etc.) summarized across the major telecoms, and the idea was to develop a predictive model relating those factors to churn, i.e. which telecom is losing customers to which. According to the recruiter, they paid a hefty sum for somebody to develop the churn prediction models, but I am not sure where they ended up. The developed product would be used to market the methodology.

At the telecom where I work, I had access to three cycles of billing information. Each cycle is 2 months long: cycle 1, cycle 2 and cycle 3, with cycle 1 being the oldest.

First, I pulled all active customers from cycle 1 by querying for customers with feature 1: call duration > 0 OR feature 2: SMS amount > 0 OR feature 3: VAS amount > 0.

Then, in cycle 2 and cycle 3, I labeled customers who had all three features greater than zero (with AND, not OR) as Active and the others as Churned. My question is: based on which feature should I label the records?
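One possible labeling convention, sketched below, differs from the AND rule described above: it applies the same "any activity" test (OR) to every cycle, so a customer who only makes calls but never uses SMS or VAS is not marked as churned. The field names and the rule itself are assumptions for illustration, not an established standard.

```python
def is_active(usage):
    """A cycle counts as active if any of the three features is above zero."""
    return (
        usage["call_duration"] > 0
        or usage["sms_amount"] > 0
        or usage["vas_amount"] > 0
    )


def label_customer(cycle1, cycle2, cycle3):
    """Return None if the customer was not active in cycle 1,
    otherwise 'Active' or 'Churned' based on cycles 2 and 3."""
    if not is_active(cycle1):
        return None  # not part of the base population
    return "Active" if is_active(cycle2) or is_active(cycle3) else "Churned"


# Hypothetical usage records for one cycle each.
calls_only = {"call_duration": 120, "sms_amount": 0, "vas_amount": 0}
silent = {"call_duration": 0, "sms_amount": 0, "vas_amount": 0}

print(label_customer(calls_only, calls_only, silent))  # Active
print(label_customer(calls_only, silent, silent))      # Churned
print(label_customer(silent, calls_only, calls_only))  # None
```

Whether a customer must be silent in both later cycles or only in the last one to count as churned is a business decision; the sketch requires silence in both.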