# Pareto Analysis

Pareto Analysis is a nice technique for exploring the distribution of a measure across its contributors.  Regardless of whether the measure is positive (ex. sales amount) or negative (ex. production defects) the goal is the same. In many cases, the bulk (~80%) of a measured quantity (ex. sales amount) is contributed by a minority (~20%) of the contributing population (ex. customers).  This is why it is often referred to as the 80/20 rule.

By isolating this subset to top contributors (ex. our best customers) we can start asking/answering some very interesting questions…

• What attributes are shared by our top customers?
• What attributes distinguish this group of top customers from the rest?
• What are the purchasing patterns of the top customers?
• How do these purchasing patterns differ from the rest?

Note: these types of questions (above) can be answered fairly quickly with a bit of basic data mining (clustering algorithms to group customers together based on highly correlated attributes, decision trees to analyze buying patterns, etc) available via the MDS Add-In for Excel or the full-blown data mining models available via SSAS Multidimensional.

Ultimately, we are trying to figure out what we can do to:

• boost the sales revenue from the rest of our customers perhaps through targeted promotions
• target new customers who share attributes that are highly correlated with existing top customers

Perhaps we want to take a few additional steps such as analyzing customer churn and seeing how that intersects with our top customers…the ones we definitely don’t want to lose. This will tell us who’s in risk of churning and how much we can spend (profitably) to keep them from churning.

Or maybe we decide to further segment our customers into multiple groups:

• segment A: top customers responsible for 80% of sales revenue
• segment B: middle tier customers responsible for the next 15% of sales revenue
• segment C: bottom tier customers responsible for the final 5% of sales revenue

From there we might be interested in understanding the movement between segments over time: what are the top factors/characteristics leading to the transition of a customer from segment B to segment A…and vice versa. As you can see, there are a ton of possibilities that all start from the pareto analysis.

### Pareto Chart

Pareto charts, like the one below, are a common method for visualizing this type of analysis.

Bottom Axis – products in descending order by sales amount
Left Axis – the total sales amount for each product (vertical bars)
Right Axis – the cumulative percent of total contribution (line)

Below is a high-level description of how to calculate the curved line in the chart above (which I usually just refer to it as the “pareto line” because saying “cumulative percentage of the total”-line is a bit of a mouthful):

1. Create a list of sales amounts by product and order it from highest to lowest based on sales amount
2. Calculate the “Total Sales Amount” for the entire list of products
3. Calculate the “Percent of Total Sales Amount” for each product in the list
4. Calculate a running sum over the “Percent of Total Sales Amount”

In the next of the post, I’m going to walk you through the process of creating the Pareto Chart in SSRS.