Marketing Analytics: Conducting a Customer Segmentation With RFM Analysis in Python

7 min read · Jul 10, 2021

RFM analysis is a customer segmentation technique used to prioritise and target the right customers. It uses three key data points — recency, frequency, and monetary value — to create a scoring system that segments customers into groups based on their value to a company.

This article covers how you can apply the RFM model to your own business so that you can better understand your customers and tailor your marketing accordingly, for example to increase conversion rates on your website. Let’s start by further examining the parameters that make up the RFM score.

The Parameters of RFM Analysis

The RFM score is the aggregate of three parameters: recency, frequency, and monetary value. For clarity, we’ll define each parameter as it relates to RFM.

Recency is calculated based on how long it’s been since a customer’s last purchase. It’s the most effective way of determining who’s going to buy from you today. Essentially, a customer who’s made a recent purchase with you is considered more valuable than a customer who hasn’t ordered in a while.
Frequency is the number of purchases a customer has made in a specific time frame. It can be a good indicator of how often a customer interacts with your brand. That can give you an idea of how often you should try to reach them through marketing.
Monetary value is simply the amount a customer spends on your product or service within a given time frame. While it’s not as good as the other parameters at predicting when a customer will buy from you again, it can give you an idea of how much they might spend if they do.

Now we are going to analyse a dataset in Python and calculate the RFM score for each customer based on their historical transaction data.

Customer Segmentation with RFM in 6 Steps

Defining the Business Problem
Understanding the Data
Data Preparation
Calculating RFM Metrics
Calculating RFM Scores
Naming & Analysing RFM Segments

1. Defining the Business Problem

Our hypothetical business problem for this case study is relatively straightforward. Imagine you are working in the marketing department of a large retail company based in the United Kingdom. Your company is planning to make a large investment in a countrywide marketing campaign in order to boost its future sales. However, it is important that the company first has a strong understanding of its different customer groups. That way we can run a targeted marketing campaign and make our marketing strategies as effective as possible.

You as the marketing analyst are required to study the historical customer sales data and report back to your manager with insights on different customer groups and recommendations on which customers the company should target most aggressively.

2. Understanding the Data

It is a good idea to first explore the data and get a better understanding of what we are dealing with before starting our customer segmentation. Let’s first import the data using pandas.
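A minimal sketch of the import step. The article's data file is not shown here, so the snippet reads a few made-up rows from an in-memory CSV; in practice you would pass the path of your own file to pd.read_csv. The column names (Invoice, StockCode, Description, Quantity, InvoiceDate, Price, Customer ID, Country) are assumptions modelled on the columns described in this section.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the real sales file.
sample_csv = io.StringIO(
    "Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country\n"
    "536365,85123A,WHITE HANGING HEART,6,2010-12-01 08:26:00,2.55,17850,United Kingdom\n"
    "536370,22728,ALARM CLOCK BAKELIKE PINK,24,2010-12-01 08:45:00,3.75,12583,France\n"
    "C536379,D,Discount,-1,2010-12-01 09:41:00,27.50,14527,United Kingdom\n"
)

# parse_dates converts the invoice timestamp to a datetime column on load.
df = pd.read_csv(sample_csv, parse_dates=["InvoiceDate"])

print(df.head())
print(df.columns.tolist())
```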

We see that the dataset has the following columns:

Invoice Number
Stock Code
Product Description
Quantity Sold (In Single Purchase)
Invoice Date
Price
Customer ID
Country

Here we can observe the count of company sales we have by country. It’s clear that the majority of company sales are being made locally in the United Kingdom.
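One way to get such a count is value_counts() on the country column. The rows below are made up and the column name Country is an assumption.

```python
import pandas as pd

# Hypothetical rows; only the Country column matters for this step.
df = pd.DataFrame({
    "Invoice": ["536365", "536366", "536370", "536371"],
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
})

# Number of sales rows per country, largest first.
sales_by_country = df["Country"].value_counts()
print(sales_by_country)
```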

Since we are planning a countrywide marketing campaign, we won’t be focusing on customers outside of the United Kingdom. Thus, we will remove these customers from our dataset.

Here we simply filter the DataFrame where country is equal to ‘United Kingdom’.

We use the .shape attribute to observe that we have 485,852 instances of sales in our DataFrame.
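The filter and the shape check can be sketched as follows, using made-up rows and an assumed Country column. Note that shape is an attribute, so it takes no parentheses.

```python
import pandas as pd

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "Invoice": ["536365", "536366", "536370", "536371"],
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
})

# Keep only UK sales; .copy() avoids chained-assignment warnings later on.
df_uk = df[df["Country"] == "United Kingdom"].copy()

# (rows, columns) of the filtered DataFrame.
print(df_uk.shape)
```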

It is important to check for null values. Using the isnull() method, we find 106,429 null values in the customer ID column.
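A null check of this kind might look like the sketch below. The data is made up and the column name Customer ID is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; two of them have no customer attached.
df = pd.DataFrame({
    "Invoice": ["536365", "536366", "536367", "536368"],
    "Customer ID": [17850.0, np.nan, 13047.0, np.nan],
})

# isnull() flags missing cells; summing the flags counts them per column.
null_counts = df.isnull().sum()
print(null_counts)
```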

The groupby() below allows us to see which products are the most popular.
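A sketch of such a groupby(), assuming that "popular" means the highest total quantity sold and that the columns are named Description and Quantity. The rows are made up.

```python
import pandas as pd

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "Description": ["MUG", "MUG", "CLOCK", "LAMP"],
    "Quantity": [10, 5, 12, 3],
})

# Total units sold per product, best sellers first.
top_products = (
    df.groupby("Description")["Quantity"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_products)
```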

The company has made 26,633 unique sales.

3. Data Preparation

Now that we have obtained a greater understanding of our dataset, let’s prepare it for our RFM analysis.

It is important to note that in this dataset the invoices whose number begins with the letter C have a negative quantity, as these invoices refer to customer returns. For our study we will remove these invoices from the dataset, as they don’t provide a good representation of the customer purchasing behaviour we are trying to capture with an RFM analysis.

The line of code below filters those invoices that begin with a C out of our dataset.
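A possible version of that filter, with made-up rows and an assumed Invoice column. The invoice number is cast to string first because purely numeric invoice numbers may have been read as integers.

```python
import pandas as pd

# Hypothetical rows; the second one is a return (invoice starts with "C").
df = pd.DataFrame({
    "Invoice": ["536365", "C536379", "536370"],
    "Quantity": [6, -1, 24],
})

# Flag returns, then keep everything that is not a return.
is_return = df["Invoice"].astype(str).str.startswith("C")
df = df[~is_return]
print(df)
```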

Furthermore, we have a column for price and a column for quantity, but we don’t have a total price column. Therefore, we create it ourselves.
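The new column is simply quantity multiplied by unit price. In this sketch the column names Quantity and Price are assumptions, while TotalPrice is the name used later in the article.

```python
import pandas as pd

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "Quantity": [6, 24],
    "Price": [2.5, 3.75],
})

# Row-wise revenue: units sold times unit price.
df["TotalPrice"] = df["Quantity"] * df["Price"]
print(df)
```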

Lastly, we discovered earlier that our dataset contains rows with a null customer ID. An RFM analysis can’t use transactions that we can’t assign to a customer, so we will have to drop these rows as well.
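Dropping those rows can be done with dropna() restricted to the customer column, so that missing values in other columns are left alone. The data is made up and the column name Customer ID is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the middle one has no customer ID.
df = pd.DataFrame({
    "Invoice": ["536365", "536366", "536367"],
    "Customer ID": [17850.0, np.nan, 13047.0],
})

# Remove only the rows where the customer ID is missing.
df = df.dropna(subset=["Customer ID"])
print(df)
```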

4. Calculating RFM Metrics

Now it is time to calculate the RFM metrics, which can be done using the code below. Remember, the three RFM metrics are recency, frequency and monetary value. Each metric is calculated in the following manner:

Recency

We group by unique customer ID and then calculate the difference, in days, between today’s date and the date of that customer’s last invoice. This tells us how many days it has been since each customer last made a purchase.

Frequency

We simply calculate the number of invoices for each unique customer ID.

Monetary

We calculate the sum of the TotalPrice column we created previously and we once again aggregate by each unique customer ID.
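The three calculations above can be sketched in a single groupby(). Several things here are assumptions: the made-up rows, the column names other than TotalPrice, the fixed snapshot date used as "today" so that the result is reproducible, and counting distinct invoice numbers with nunique so that an invoice spanning several product rows is counted once.

```python
import pandas as pd

# Hypothetical transactions: one row per product line on an invoice.
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 2, 3],
    "Invoice": ["A1", "A2", "B1", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-01", "2011-10-15",
        "2011-10-15", "2011-12-05", "2011-06-30",
    ]),
    "TotalPrice": [20.0, 30.0, 5.0, 15.0, 40.0, 100.0],
})

# Assumed snapshot date standing in for "today".
today = pd.Timestamp("2011-12-11")

rfm = df.groupby("Customer ID").agg(
    # Days between the snapshot date and the customer's latest invoice.
    Recency=("InvoiceDate", lambda dates: (today - dates.max()).days),
    # Number of distinct invoices per customer.
    Frequency=("Invoice", "nunique"),
    # Total amount spent per customer.
    Monetary=("TotalPrice", "sum"),
)
print(rfm)
```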

5. Calculating RFM Scores

Calculating the RFM scores is simple. We use the qcut function in pandas to cut each of the RFM metric columns into five quantiles with an equal number of instances and we label each quantile 1, 2, 3, 4 and 5.

We then create three new columns called RecencyScore, FrequencyScore and MonetaryScore which will contain the labels of each quantile from each corresponding RFM metric.

For example, those customers who are in quantile 5 for MonetaryScore will be the 20% of customers who spent the most money at the retail store. Those customers who are in quantile 1 for MonetaryScore will be the 20% of customers who spent the least. The same reasoning applies to FrequencyScore. For RecencyScore the scale runs the other way, because fewer days since the last purchase is better: quantile 5 holds the 20% of customers who purchased most recently.

Lastly, we create another column called RFM score which combines all of these metrics by concatenating the RecencyScore, FrequencyScore and MonetaryScore labels. Now we have an RFM Score for each customer.
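A sketch of the scoring step on ten made-up customers. Two details are assumptions rather than something stated in the text: the recency labels are passed in reverse order so that the most recent buyers receive a 5, and frequency is ranked with method="first" before cutting, because frequency values tend to contain many ties and qcut would otherwise fail on duplicate bin edges. The column name RFM_SCORE is also an assumption.

```python
import pandas as pd

# Hypothetical RFM metrics for ten customers.
rfm = pd.DataFrame(
    {
        "Recency": [3, 10, 25, 40, 60, 90, 120, 200, 250, 365],
        "Frequency": [12, 9, 7, 5, 4, 3, 2, 1, 1, 1],
        "Monetary": [5000, 3200, 2100, 1500, 900, 700, 400, 250, 120, 60],
    },
    index=pd.Index(range(1, 11), name="Customer ID"),
)

# Recency: fewer days is better, so the labels run from 5 down to 1.
rfm["RecencyScore"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1])

# Frequency: rank first so that tied values still yield unique bin edges.
rfm["FrequencyScore"] = pd.qcut(
    rfm["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)

# Monetary: more spend is better, so the labels run from 1 up to 5.
rfm["MonetaryScore"] = pd.qcut(rfm["Monetary"], 5, labels=[1, 2, 3, 4, 5])

# Concatenate the three labels into a single string such as "555".
rfm["RFM_SCORE"] = (
    rfm["RecencyScore"].astype(str)
    + rfm["FrequencyScore"].astype(str)
    + rfm["MonetaryScore"].astype(str)
)
print(rfm)
```

Because the scores are concatenated rather than added, "155" and "551" stay distinct, which is what makes the later segment lookup possible.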

We can filter for those customers who have an RFM score of 555. These are the customers who are in the top quantile for all three metrics: recency, frequency and monetary value.

From studying our entire customer base these are the customers who have been to the store most recently, most frequently and who have spent the most at the store. We can label these customers as our champion customers.

These are the customers that we don’t need to focus on when we are implementing a marketing campaign.

We can also look at the customers that are labelled 111. These are the customers that the company is in most danger of losing over the coming weeks. Therefore, it is recommended that the company targets these customers heavily when implementing its marketing campaign.
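Both lookups are plain boolean filters on the combined score. In this sketch the scores are made up and the column name RFM_SCORE is an assumption; the score is compared as a string because it was built by concatenation.

```python
import pandas as pd

# Hypothetical combined scores for five customers.
rfm = pd.DataFrame(
    {"RFM_SCORE": ["555", "111", "345", "555", "121"]},
    index=pd.Index([101, 102, 103, 104, 105], name="Customer ID"),
)

# Top quantile on all three metrics.
champions = rfm[rfm["RFM_SCORE"] == "555"]

# Bottom quantile on all three metrics.
lowest_scoring = rfm[rfm["RFM_SCORE"] == "111"]

print(champions)
print(lowest_scoring)
```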

6. Naming & Analysing RFM Segments

For the last step we will create ten different customer segments using the segment map that is outlined below.
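The article's own segment map is not reproduced in this text. As a stand-in, the sketch below uses one commonly seen ten-segment layout that looks only at the recency and frequency scores; the segment names and boundaries should be treated as assumptions, as should the helper name_segment and the sample scores.

```python
import re
import pandas as pd

# Assumed map: regex over the two-character "recency + frequency" code.
SEGMENT_MAP = {
    r"[1-2][1-2]": "hibernating",
    r"[1-2][3-4]": "at risk",
    r"[1-2]5": "can't lose",
    r"3[1-2]": "about to sleep",
    r"33": "need attention",
    r"[3-4][4-5]": "loyal customers",
    r"41": "promising",
    r"51": "new customers",
    r"[4-5][2-3]": "potential loyalists",
    r"5[4-5]": "champions",
}


def name_segment(rf_code: str) -> str:
    """Return the segment whose pattern matches the whole two-digit code."""
    for pattern, name in SEGMENT_MAP.items():
        if re.fullmatch(pattern, rf_code):
            return name
    return "unknown"


# Hypothetical scores for five customers.
rfm = pd.DataFrame(
    {"RecencyScore": [5, 1, 3, 2, 4], "FrequencyScore": [5, 1, 3, 4, 1]},
    index=pd.Index([101, 102, 103, 104, 105], name="Customer ID"),
)

rf_code = rfm["RecencyScore"].astype(str) + rfm["FrequencyScore"].astype(str)
rfm["Segment"] = rf_code.map(name_segment)
print(rfm)
```

The ten patterns together cover all 25 recency and frequency combinations, so "unknown" only appears if a score falls outside 1 to 5.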

Here we calculate the mean of each metric and the number of customers within each customer segment. It is clear that a large number of customers are sitting in the ‘at risk’ segment, so it seems that now is a great time for this company to invest in a marketing campaign.
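A summary of that kind could be produced along these lines. The figures are made up and do not reproduce the article's results; the column name Segment is an assumption.

```python
import pandas as pd

# Hypothetical per-customer metrics with a segment label attached.
rfm = pd.DataFrame({
    "Segment": ["champions", "at risk", "at risk", "hibernating", "at risk"],
    "Recency": [5, 150, 200, 300, 180],
    "Frequency": [12, 4, 3, 1, 5],
    "Monetary": [4000.0, 800.0, 600.0, 100.0, 1000.0],
})

# Mean of each metric and number of customers, per segment.
summary = rfm.groupby("Segment")[["Recency", "Frequency", "Monetary"]].agg(
    ["mean", "count"]
)
print(summary)
```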