“Target the right customers and make them happier!”
Machine learning provides exciting and actionable insights for your customer retention plan.
As a fellow in the Insight Data Science program, I consulted for an undisclosed startup company for three weeks. The company provides a business analytics platform for customer retention and success to over a hundred Software as a Service (SaaS) companies. The startup aims to help the customer success managers (CSMs) of those SaaS companies, who manage customer relationships so that customers stay satisfied with the service and renew their subscriptions rather than churn.
The startup company wanted to help CSMs expand their customer base by increasing their work efficiency with their analytics platform based on user histories. More specifically, the startup company wanted to implement a feature in their analytics platform for the CSMs to decide whom to call today within the limited time:
“If I call this customer today, how much will she be more satisfied with my company’s service product?”
“If I call this customer now, how much can she be more interested in subscription renewal?”
Answering these questions is not trivial; you may have to take different actions for different customers even though they are equally satisfied with the service. Consider two customers. One customer who is unhappy with the service may have already decided not to renew the subscription, and in this case a CSM does not need to spend time calling her. The other customer, who is equally unhappy, may have only just begun to consider another product, so there is still a chance for renewal and the CSM should contact this customer.
For this non-trivial problem, I provided a metric based on probability for renewal, by quantifying how much the probability for renewal “increases” when the CSMs contact the customers. This metric will be implemented in the current analytics platform of the startup company that I consulted for, as one of the important metrics, after in-house evaluation.
Creating the Data Pipeline: Preprocessing Customer Data and Feature Engineering
I received an 8 GB MySQL file with over 100 data tables, containing the names of the SaaS companies and their customers’ subscription purchase histories. To make the data easier for data scientists to work with, I joined the tables so that each SaaS company ID was linked to its corresponding customer IDs. I then pulled out the customers’ subscription information, including:
- customer’s company ID
- entire subscription period
- last subscription period
- whether the customer churned or not
- numbers of email communications
- sentiment scores of the email contents, which I computed using NLTK’s pre-trained VADER sentiment analyzer
I used all this information as input to a predictive model built with a machine learning algorithm, Random Forest (RF) [1, 2]. The output of the Random Forest model, or more precisely the Random Forest “classifier”, is a prediction of whether each customer will churn or not.
I faced a common challenge in data science and analytics: an “imbalanced dataset”. Customer churn rarely happens. Thus, the patterns that you find in the customers who churned may also appear among the much larger and more varied group of customers who did not churn, or they may be difficult to pin down simply because you do not have enough churn cases (a small sample size and thus a large statistical error).
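To make the modeling step concrete, here is a minimal sketch of training a Random Forest classifier with scikit-learn. The library choice, the synthetic data, the feature order, and the hyperparameters are all my assumptions for illustration; this is not the actual pipeline or data used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the features listed above (NOT the real data).
X = np.column_stack([
    rng.integers(1, 37, n),      # entire subscription period, in months
    rng.integers(1, 13, n),      # last subscription period, in months
    rng.poisson(3.0, n),         # number of emails per week
    rng.uniform(-1.0, 1.0, n),   # mean sentiment score of the emails
])

# Synthetic label: churn is rare and more likely when sentiment is low.
p_churn = 0.03 + 0.25 * (X[:, 3] < -0.5)
y = (rng.random(n) < p_churn).astype(int)   # 1 = churned, 0 = renewed

# Stratify so the rare churn class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

churn_pred = clf.predict(X_test)                # hard churn / no-churn labels
churn_proba = clf.predict_proba(X_test)[:, 1]   # probability of class 1 (churn)
```

The hard labels answer "will this customer churn?", while the probabilities from `predict_proba` are what a renewal-probability metric can be built on.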
Building and perfecting a Customer Churn Model: Random Forest with Imbalanced Data
To address these statistical difficulties, I decided to systematically select companies that show similar churn patterns. I looked into the feature importance scores that the RF classifier produces, in particular those of the company identities. Here, I used each company identity as a distinct feature. The importance scores quantify how much each feature affects the classifier’s churn/non-churn decision.
Interestingly, I found that the more churn cases a company has, the higher the feature importance score of that company is (Figure 3). Thus, to have as many churn cases as possible and large enough importance scores, I kept only the companies with more than 10 churn cases and analyzed them.
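The selection rule above (keep companies with more than 10 churn cases) can be sketched in a few lines. The record format, the function name `select_companies`, and the toy counts below are hypothetical and only illustrate the filter.

```python
from collections import Counter

def select_companies(records, min_churns=10):
    """Return the IDs of companies with MORE than `min_churns` churned customers.

    `records` is an iterable of (company_id, churned) pairs, one per customer.
    """
    churn_counts = Counter(
        company_id for company_id, churned in records if churned
    )
    return {cid for cid, count in churn_counts.items() if count > min_churns}

# Toy records: company A has 12 churns, B has 3, C has 11.
records = (
    [("A", True)] * 12 + [("A", False)] * 80
    + [("B", True)] * 3 + [("B", False)] * 40
    + [("C", True)] * 11 + [("C", False)] * 5
)

selected = select_companies(records)
# Keep only customers of the selected companies before re-training.
filtered = [r for r in records if r[0] in selected]
```

With these toy counts, companies A and C pass the threshold and B is dropped.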
With the selected companies, I re-trained the Random Forest classifier and the churn prediction improved dramatically. The area under the precision-recall curve increased from 0.82 to 0.90. The recall (the number of churn cases that the algorithm found divided by the total number of actual churn cases) improved from 63% to 86%. This means that churn cases can be predicted with high sensitivity, so a CSM can take action in time to prevent those customers from churning.
This result can be applied to more general settings: when you have too many companies or clients and need to characterize their types of business or product lines, feature importance scores offer a way to screen and select the companies that show similar patterns!
This high recall enabled me to provide a metric that helps customer success managers (CSMs) make actionable plans. CSMs may often ask themselves the following questions:
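As a reminder of what recall measures, here is a small worked sketch. The labels and the counts of 86 and 14 are made-up numbers chosen to reproduce an 86% recall; they are not the actual counts from the project.

```python
def recall_from_labels(y_true, y_pred):
    """Recall = churn cases found / all actual churn cases (label 1 = churned)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)

# Hypothetical example: 100 customers actually churned,
# and the classifier flagged 86 of them and missed 14.
y_true = [1] * 100
y_pred = [1] * 86 + [0] * 14

churn_recall = recall_from_labels(y_true, y_pred)
```

Note that recall ignores false alarms, which is why it is reported together with the area under the precision-recall curve.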
“Whom should I email or call now?”
“I have only 20 minutes now. Who is the most important person to call?”
To answer these questions, I looked into another useful quantity that the RF classifier provides: the probability of subscription renewal. If the probability is close to 1, the customer is very likely to renew the current subscription. In this case, the CSM does not need to call the customer and can instead focus her effort on other customers. But if the probability is 0.5, the customer has a 50% chance of renewing the current subscription. In this case, contacting the customer may help to retain him. But how many emails or phone calls does it take to help retain him? To answer this question, I computed the renewal probability over a range of reasonable communication frequencies.
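One way to compute such a curve is to copy a customer's feature row, vary only the communication frequency, and ask the classifier for the renewal probability at each value. The sketch below assumes scikit-learn, a toy two-feature model, and synthetic data; the helper name `renewal_curve` and all numbers are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def renewal_curve(clf, customer, freq_index, frequencies):
    """Renewal probability for one customer at each candidate contact frequency."""
    rows = np.tile(np.asarray(customer, dtype=float), (len(frequencies), 1))
    rows[:, freq_index] = frequencies          # vary only the contact frequency
    renew_col = list(clf.classes_).index(0)    # label 0 = renewed, 1 = churned
    return clf.predict_proba(rows)[:, renew_col]

# Toy training data (NOT the real data): [emails per week, mean sentiment].
rng = np.random.default_rng(0)
n = 1000
emails = rng.integers(0, 11, n)
sentiment = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([emails, sentiment])

# Synthetic rule: renewal improves with contact up to about 3 emails a week.
p_renew = np.clip(0.3 + 0.08 * np.minimum(emails, 3) + 0.3 * sentiment, 0.02, 0.98)
y = (rng.random(n) >= p_renew).astype(int)     # 1 = churned, 0 = renewed

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

frequencies = np.arange(0, 11)                 # 0 to 10 emails per week
curve = renewal_curve(clf, customer=[1, -0.2], freq_index=0, frequencies=frequencies)
```

Plotting `curve` against `frequencies` gives one line per customer, which is the kind of graph discussed next. Keep in mind that this is a model-based "what if" estimate, not evidence that contacting a customer causes renewal.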
In this graph, I considered two customers, Olivia and Emma, with whom a company has been communicating frequently and infrequently, respectively. Emma had email conversations once or twice a week, while Olivia had over 10 a week! Emma’s probability of renewing her subscription is 0.2, i.e., she is unhappy with the service. Olivia’s is 0.5, so she is moderately happy. The CSM should not have spent this much time on Olivia, because Olivia would have stayed at the same level of satisfaction even if the CSM had communicated with her only 3 times a week! Instead, if the CSM had focused on Emma and communicated with her 3 times a week, the CSM could have more than doubled Emma’s probability of renewal!
I would like to note that the trained RF classifier was applied to the SaaS companies served by the startup company that I consulted for. The importance scores were used to select companies showing similar patterns of customer churn (Figure 5, Left). The same analysis can be applied to a single SaaS company and its customer groups. As shown in Figure 5, the SaaS companies in my analysis play the role of customer groups, and the customers of each SaaS company (Figure 5, Left) play the role of the customers belonging to a customer group (Figure 5, Right). In this way, you can collectively study the churn patterns of multiple groups of customers.
In this blog post, I showed how machine learning approaches can be useful for optimizing the work of customer success managers. I provided a systematic customer segmentation method based on feature importance scores to enhance churn prediction. Based on the selected customer segments showing high feature importance scores, I provided a metric to manage the customers belonging to those segments. Now, you can employ data analytics in your customer retention strategies and
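The metric boils down to ranking customers by how much their renewal probability increases if the CSM contacts them. Here is a minimal sketch of that ranking; the function name `rank_by_uplift` is hypothetical, and Emma's "after contact" value of 0.45 is a made-up number that is merely consistent with "more than doubled".

```python
def rank_by_uplift(customers):
    """Sort customers by the gain in renewal probability from contacting them.

    `customers` is a list of (name, p_now, p_if_contacted) tuples.
    """
    scored = [
        (name, p_contacted - p_now) for name, p_now, p_contacted in customers
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

customers = [
    ("Olivia", 0.50, 0.50),  # stays at the same level, as described above
    ("Emma", 0.20, 0.45),    # 0.45 is a hypothetical illustrative value
]

ranking = rank_by_uplift(customers)
```

Emma comes first in the ranking even though her current renewal probability is lower, because the call is expected to change more for her.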
“Target the right customers and make them happier!”
Kyung Hyuk Kim
Insight Data Science Fellow
* The picture on the top was obtained from allbusiness.com.