A Market Basket Analysis of a Bakery Chain's Customer Transactions


Market Basket Analysis is used to find which items are frequently purchased together. By doing so, it provides insights into customers' purchasing behavior that can help a business increase sales and manage inventory. A simple example would be shampoo and conditioner appearing in the same sales transaction. However, the real value of Market Basket Analysis lies in finding associations between seemingly unrelated items, like the age-old "diapers and beer" story:

The legend says that a study was done at a retail grocery store. The findings were that men between 30 and 40 years of age, shopping between 5pm and 7pm on Fridays, who purchased diapers were likely to also have beer in their carts. This motivated the grocery store to move the beer aisle closer to the diapers and, wiz-boom-bang, an instant 35% increase in sales of both.

Two General Steps:

1. Calculate support for combinations of items
support(X) = (number of transactions containing itemset X)/T, i.e. the proportion of all T transactions that contain X
ex. How often does each individual item occur given:
Basket 1: A, B
Basket 2: A, C
Basket 3: A, D, E
Basket 4: B, C, D
Basket 5: A, B, D, E

Number of baskets containing each item:
A: in 4 baskets
B: 3
C: 2
D: 3
E: 2

Calculate support to determine how common each itemset is (T = 5 baskets):
A: appears in 4 of 5 baskets = 0.8 (80%)
B: 3/5 = 0.6 (60%)
C: 2/5 = 0.4 (40%)
D: 3/5 = 0.6 (60%)
E: 2/5 = 0.4 (40%)
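The support values above can be reproduced with a short Python sketch. It hard-codes the five example baskets and divides each item's basket count by the total number of baskets; the function name `item_supports` is just an illustrative choice.

```python
from collections import Counter

# The five example baskets from the text above.
baskets = [
    {"A", "B"},
    {"A", "C"},
    {"A", "D", "E"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]

def item_supports(baskets):
    """Return {item: support} for every individual item, sorted by item."""
    # Each basket is a set, so an item is counted at most once per basket.
    counts = Counter(item for basket in baskets for item in basket)
    total = len(baskets)
    return {item: count / total for item, count in sorted(counts.items())}

for item, supp in item_supports(baskets).items():
    print(f"{item}: {supp:.1f} ({supp:.0%})")
```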

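2. Calculate confidence, or conditional probability
conf(X => Y) = supp(X U Y)/supp(X): if X occurs, what is the probability that Y will also occur? Here supp(X U Y) is the support of the combined itemset, i.e. the proportion of transactions that contain both X and Y.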
ex. Using the same five baskets:
Basket 1: A, B
Basket 2: A, C
Basket 3: A, D, E
Basket 4: B, C, D
Basket 5: A, B, D, E

conf(A => D)?
= supp(A U D)/supp(A)
= baskets 3 and 5 both contain A and D, so supp(A U D) = 2/5 = 0.4; 4 of the 5 baskets contain A, so supp(A) = 4/5 = 0.8
= 0.4/0.8 = 0.5
= if a person buys A, then there's a 50% chance they will also buy D
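A minimal Python sketch of the same confidence calculation, again on the five example baskets. `support` and `confidence` are illustrative helper names, not a library API; `supp(A U D)` is computed as the share of baskets that contain both A and D.

```python
# The five example baskets from the text above.
baskets = [
    {"A", "B"},
    {"A", "C"},
    {"A", "D", "E"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """conf(X => Y) = supp(X U Y) / supp(X), an estimate of P(Y | X)."""
    # X U Y is the combined itemset: a basket must hold all of X and all of Y.
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, baskets) / support(antecedent, baskets)

print(confidence({"A"}, {"D"}, baskets))  # 0.5
```

Swapping the arguments gives conf(D => A) = 0.4/0.6, about 0.67, which shows that confidence is not symmetric.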


What is the FP-Growth Algorithm?


Hareen Laks has also written an easy-to-understand post on his blog about the steps.
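For readers who want to see the idea in code, here is a compact, hypothetical Python sketch of FP-Growth (not Rapidminer's implementation). It compresses the transactions into a prefix tree, the FP-tree, ordering each transaction's frequent items by descending count so that transactions sharing items share a path. It then mines the tree recursively: for each item it collects the prefix paths leading to that item (its conditional pattern base) and repeats the process on them, so candidate itemsets are never generated explicitly. The threshold here is an absolute count, and all names are illustrative.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, how many transactions pass through it, links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_tree(transactions, min_count):
    """Build an FP-tree from a list of (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes holding it; frequent maps each frequent item to its total count.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in items:
            counts[item] += n
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Drop infrequent items; order the rest by descending count (ties by
        # name) so transactions with common items share a prefix path.
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, frequent

def fp_growth(transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every itemset in >= min_count transactions."""
    header, frequent = build_tree(transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        yield itemset, count
        # Conditional pattern base: the prefix path above each node holding
        # `item`, weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree for itemsets that extend `itemset`.
        yield from fp_growth(base, min_count, itemset)

# The five example baskets; minimum support 0.4 of 5 baskets = a count of 2.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "D", "E"},
           {"B", "C", "D"}, {"A", "B", "D", "E"}]
for itemset, count in fp_growth([(b, 1) for b in baskets], min_count=2):
    print(sorted(itemset), count)
```

On the five example baskets with a minimum count of 2 (support 0.4), this finds 11 frequent itemsets, the largest being {A, D, E}.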

Goal

Given a set of transactions, find all rules or associations having support >= minimum support threshold and confidence >= minimum confidence threshold.
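To make the goal concrete, here is a brute-force Python sketch that enumerates every itemset of the five example baskets, keeps those meeting the minimum support, and then emits each rule X => Y whose confidence meets the minimum confidence. This exhaustive search is only practical for a handful of items; avoiding it is what algorithms like FP-Growth are for. The function names are illustrative.

```python
from itertools import combinations

# The five example baskets from the text above.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "D", "E"},
           {"B", "C", "D"}, {"A", "B", "D", "E"}]

def support(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def find_rules(baskets, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            supp = support(itemset, baskets)
            if supp == 0 or supp < min_support:
                continue  # X U Y is not frequent, so no rule from it qualifies
            # Every non-empty proper subset of the itemset can serve as X.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    conf = supp / support(x, baskets)
                    if conf >= min_confidence:
                        rules.append((x, itemset - x, supp, conf))
    return rules

for x, y, supp, conf in find_rules(baskets, min_support=0.4, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), f"supp={supp:.2f} conf={conf:.2f}")
```

With min_support = 0.4 and min_confidence = 0.8 it returns six rules, all involving E, for example E => A with confidence 1.0, because both baskets that contain E also contain A.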

What data are we going to use?

The Extended Bakery Dataset contains 5,000 rows listing the transactions involving a bakery's 50-item menu.

Each row lists an invoice (or receipt) number followed by 0s and 1s indicating whether each item was in that transaction.
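Before mining, each binary row has to become a set of purchased items. This Python sketch assumes a CSV whose first column is the receipt number and whose remaining columns are 0/1 item flags; the column names and the two sample rows are hypothetical placeholders, not actual rows from the Extended Bakery Dataset.

```python
import csv
import io

# Hypothetical sample in the assumed layout: receipt number, then one 0/1 flag
# per menu item. These names and values are illustrative only.
SAMPLE = """Receipt,Item01,Item02,Item03
1,1,0,1
2,0,1,1
"""

def read_baskets(text):
    """Return {receipt_no: set of item columns flagged with 1}."""
    reader = csv.DictReader(io.StringIO(text))
    baskets = {}
    for row in reader:
        receipt = row.pop("Receipt")
        # Keep only the items whose flag is 1 for this receipt.
        baskets[receipt] = {item for item, flag in row.items() if flag.strip() == "1"}
    return baskets

print(read_baskets(SAMPLE))
```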



What's under the hood?


Minimum Support: 0.03
Minimum Confidence: 0.8
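In absolute terms, with the dataset's 5,000 transactions these thresholds work out as follows (the 200-transaction antecedent in the last line is a hypothetical figure chosen only for illustration):

```latex
\begin{aligned}
\text{min support count} &= 0.03 \times 5000 = 150 \text{ transactions} \\
\operatorname{conf}(X \Rightarrow Y) \ge 0.8 &\iff \operatorname{supp}(X \cup Y) \ge 0.8 \cdot \operatorname{supp}(X) \\
\text{e.g. } X \text{ in } 200 \text{ transactions} &\implies X \text{ and } Y \text{ together in at least } 0.8 \times 200 = 160
\end{aligned}
```

A rule must pass both tests: X and Y together must appear in at least 150 transactions, and in at least 80% of the transactions that contain X.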

Results:

We can view the results in different tabular formats in Rapidminer. In this one, we see which items passed the parameters we set (A), the list of antecedents and consequents, i.e. the rules (B), and the lift of each rule (C), along with the other values. Using the table, we can identify, for example, that only 10 items satisfied our parameters, and that Item08 + Item46 imply or predict Item38 & Item12: this combination occurs in 3.1% of transactions (its support), its confidence is 91%, and it has the greatest lift among the rules.
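Lift is not defined above. Under the standard definition, which we assume is what Rapidminer's Lift column reports, lift(X => Y) = conf(X => Y)/supp(Y) = supp(X U Y)/(supp(X) × supp(Y)). Values above 1 mean X and Y occur together more often than they would if they were independent; values below 1 mean less often. A Python sketch on the five example baskets:

```python
# The five example baskets from the text above.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "D", "E"},
           {"B", "C", "D"}, {"A", "B", "D", "E"}]

def support(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def lift(antecedent, consequent, baskets):
    """supp(X U Y) / (supp(X) * supp(Y)); 1.0 means X and Y look independent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    return support(x | y, baskets) / (support(x, baskets) * support(y, baskets))

print(round(lift({"E"}, {"A"}, baskets), 3))  # 1.25
print(round(lift({"A"}, {"D"}, baskets), 3))  # 0.833
```

Here E => A has a lift of 1.25 (a positive association), while A => D has a lift of about 0.83 despite its 50% confidence, because D already appears in 60% of all baskets.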

The picture above enumerates, in simple text format, all the rules generated with our parameters. For example, we can quickly see that the last 3 rows have a confidence of 1.000 (the highest possible). This means that every time the items on the left-hand side were bought together in this dataset, the item on the right was bought as well.


Rapidminer also provides graphs to make the results easier to understand. Some notable examples: Items 19, 36, and 04 are strongly associated with each other, so it could be a good idea to place them alongside one another in the bakery. The same goes for Items 01, 03, and 47.


