A Market Basket Analysis of A Grocery's Customer Transactions

retail

Simply put, Market Basket Analysis looks for associations in a data set. Its results show which items are frequently bought together and, in doing so, reveal the purchasing behavior of customers.

When conducting a Market Basket Analysis, analysts usually look for two things first:
  • Support - how often a combination of items (an itemset) occurred, expressed either as a count or, as in the output below, as a share of all transactions
  • Confidence - the conditional probability that the right-hand side of a rule is bought given that the left-hand side was bought (e.g. the probability that people who bought shampoo and conditioner will also buy a bar of soap)

One can also measure the relationship of the items in each rule when exploring the results:
  • Lift - a score that shows the strength of the association between the two sides of a rule. A lift of 1 means the two sides are independent, a lift above 1 means that the items on the left-hand side (antecedent) make the items on the right-hand side (consequent) more likely, and a lift below 1 means they make them less likely
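
To make the three measures concrete, here is a minimal sketch in base R on five made-up baskets. The baskets and the `support` helper are my own illustration; they are not part of the Groceries data or of the arules package.

```r
# Toy baskets (invented for illustration only)
baskets <- list(
  c("shampoo", "conditioner", "soap"),
  c("shampoo", "conditioner"),
  c("shampoo", "conditioner", "soap"),
  c("soap", "bread"),
  c("bread", "milk")
)

# Support: share of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("shampoo", "conditioner")  # antecedent
rhs <- "soap"                       # consequent

supp_rule  <- support(c(lhs, rhs), baskets)       # share of baskets with lhs and rhs
confidence <- supp_rule / support(lhs, baskets)   # P(rhs | lhs)
lift       <- confidence / support(rhs, baskets)  # 1 = independent, > 1 = positive association

print(round(c(support = supp_rule, confidence = confidence, lift = lift), 3))
```

In this toy example the rule {shampoo, conditioner} => {soap} holds in 2 of 5 baskets (support 0.4) and in 2 of the 3 baskets that contain the antecedent (confidence about 0.67). Soap on its own appears in 3 of 5 baskets, so the lift is only about 1.11, a weak positive association.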

What is Apriori?

Principle:

  • If an itemset is frequent, then all of its subsets must also be frequent. Conversely, if an itemset is infrequent, then all of its supersets must be infrequent too
  • Support of an itemset never exceeds the support of its subsets
What it does:
  • Calculates the support for single-item itemsets first
  • Itemsets whose support falls below the declared minimum support (minsup) are discarded
  • After that, it moves on to evaluating two-item itemsets built from the survivors, then three-item itemsets, and so on
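
The steps above can be sketched in a few lines of base R. This is only a toy illustration of the level-wise idea on made-up baskets with a hypothetical `minsup` of 0.5; it is not how `arules::apriori` is implemented internally.

```r
# Toy baskets and threshold (both invented for this sketch)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "jam")
)
minsup <- 0.5

support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Level 1: keep the single items that meet minsup
items <- sort(unique(unlist(baskets)))
frequent <- Filter(function(s) support(s, baskets) >= minsup, as.list(items))
all_frequent <- frequent

k <- 2
while (length(frequent) > 0) {
  # Candidates of size k use only items that survived the previous level
  survivors <- sort(unique(unlist(frequent)))
  if (length(survivors) < k) break
  candidates <- combn(survivors, k, simplify = FALSE)
  # Prune: every (k-1)-item subset of a candidate must itself be frequent
  keys <- vapply(frequent, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subs <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subs, paste, character(1), collapse = ",") %in% keys)
  }, candidates)
  # One more pass over the data to count the surviving candidates
  frequent <- Filter(function(s) support(s, baskets) >= minsup, candidates)
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

for (s in all_frequent) {
  cat(sprintf("{%s} support = %.1f\n", paste(s, collapse = ", "), support(s, baskets)))
}
```

Here jam is dropped at the first level, so no pair containing jam is ever counted. The three-item set {bread, butter, milk} is generated as a candidate but falls below the threshold, which ends the search.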
It's pretty simple and commonly used, but it's also known to be very slow on large data sets. This is because Apriori uses a breadth-first search, meaning it goes through the data multiple times (i.e. single-item itemsets first, then again for the two-item ones, then the three-item ones, etc.). Singularities.com ran a test that demonstrates this.



What data are we going to use?

The Groceries data set contains 1 month (30 days) of real-world POS data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories.

Results

Support guide for rule generation
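Whole milk is the most frequently bought item in the data, appearing in 2513 of the 9835 transactions. We use its support of roughly 0.25 as a guide when setting the minimum support for rule generation. Because combinations of several items are far rarer than any single item, the rules below were generated with a much lower minimum support of 0.001 and a minimum confidence of 0.8.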
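
The arithmetic behind these thresholds can be checked directly in R. The counts below are copied from the summary(Groceries) output in the code section, and 0.001 is the supp value passed to apriori() there; the variable names are my own.

```r
n_transactions   <- 9835  # from summary(Groceries)
whole_milk_count <- 2513  # from "most frequent items" in summary(Groceries)

# Relative support of whole milk, the most frequent single item
whole_milk_support <- whole_milk_count / n_transactions
print(round(whole_milk_support, 3))

# Number of transactions implied by a minimum support of 0.001
minsup <- 0.001
min_count <- minsup * n_transactions
print(min_count)         # a fractional number of transactions
print(floor(min_count))  # whole transactions
```

Rounded down, the threshold works out to 9 transactions, which is consistent with the "Absolute minimum support count: 9" line in the apriori() log.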

Rules generated in text format. I shortened the list to 20 for readability. The study yielded 410 rules in total (see the code below). As an example, if a customer bought liquor and red/blush wine, there is a 90% chance (with a lift of 11.2) that bottled beer will be bought too.
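
It helps to translate the headline rule back into basket counts. The sketch below uses only the rounded figures printed for that rule (support 0.0019, confidence 0.90, lift 11.2) and the 9835 transactions from the summary, so the results are approximate.

```r
n_transactions <- 9835    # from summary(Groceries)
rule_support   <- 0.0019  # {liquor, red/blush wine} => {bottled beer}, as printed (rounded)
rule_conf      <- 0.90
rule_lift      <- 11.2

# Approximate number of baskets containing all three items
rule_count <- rule_support * n_transactions
# Approximate number of baskets containing liquor and red/blush wine
lhs_count <- rule_count / rule_conf
# Support of {bottled beer} implied by lift = confidence / support(consequent)
beer_support <- rule_conf / rule_lift

print(round(c(rule_count = rule_count, lhs_count = lhs_count,
              beer_support = beer_support), 2))
```

So the 90% figure rests on roughly 19 of about 21 baskets. The lift is high because bottled beer on its own shows up in only about 8% of all transactions.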

Rules in graph format. The redder the circle, the higher the lift.




R Code:

> library(pacman)
> pacman::p_load(arules, arulesViz, RColorBrewer)
> #Load and transform data to right format
> Groceries = read.transactions(file.choose(), format="basket", sep=",")
> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 1 variable:
.. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
> summary(Groceries)
transactions as itemMatrix in sparse format with
9835 rows (elements/itemsets/transactions) and
169 columns (items) and a density of 0.02609146
most frequent items:
whole milk other vegetables rolls/buns soda yogurt (Other)
2513 1903 1809 1715 1372 34055
element (itemset/transaction) length distribution:
sizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46 29 14 14 9 11 4 6
24 26 27 28 29 32
1 1 1 1 3 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 4.409 6.000 32.000
includes extended item information - examples:
labels
1 abrasive cleaner
2 artif. sweetener
3 baby cosmetics
> head(Groceries)
transactions in sparse format with
6 transactions (rows) and
169 items (columns)
> inspect(Groceries[1])
items
[1] {citrus fruit,margarine,ready soups,semi-finished bread}
> LIST(Groceries[1])
[[1]]
[1] "citrus fruit" "margarine" "ready soups" "semi-finished bread"
> #Which items were frequently bought
> itemFrequencyPlot(Groceries, topN = 20, xlab = "Items")
> #We use 0.25 as guide for establishing minsup
> rules <-
+ apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.8 0.1 1 none FALSE TRUE 5 0.001 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 9
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.01s].
checking subsets of size 1 2 3 4 5 6 done [0.05s].
writing ... [410 rule(s)] done [0.00s].
creating S4 object ... done [0.01s].
> #View the rules generated
> inspect(head(sort(rules, by = "lift"), 20))
lhs rhs support confidence lift
[1] {liquor,
red/blush wine} => {bottled beer} 0.0019 0.90 11.2
[2] {citrus fruit,
fruit/vegetable juice,
other vegetables,
soda} => {root vegetables} 0.0010 0.91 8.3
[3] {oil,
other vegetables,
tropical fruit,
whole milk,
yogurt} => {root vegetables} 0.0010 0.91 8.3
[4] {citrus fruit,
fruit/vegetable juice,
grapes} => {tropical fruit} 0.0011 0.85 8.1
[5] {other vegetables,
rice,
whole milk,
yogurt} => {root vegetables} 0.0013 0.87 8.0
[6] {oil,
other vegetables,
tropical fruit,
whole milk} => {root vegetables} 0.0013 0.87 8.0
[7] {ham,
other vegetables,
pip fruit,
yogurt} => {tropical fruit} 0.0010 0.83 7.9
[8] {beef,
citrus fruit,
other vegetables,
tropical fruit} => {root vegetables} 0.0010 0.83 7.6
[9] {butter,
cream cheese,
root vegetables} => {yogurt} 0.0010 0.91 6.5
[10] {butter,
sliced cheese,
tropical fruit,
whole milk} => {yogurt} 0.0010 0.91 6.5
[11] {cream cheese,
curd,
other vegetables,
whipped/sour cream} => {yogurt} 0.0010 0.91 6.5
[12] {butter,
other vegetables,
tropical fruit,
white bread} => {yogurt} 0.0010 0.91 6.5
[13] {pip fruit,
sausage,
sliced cheese} => {yogurt} 0.0012 0.86 6.1
[14] {butter,
curd,
tropical fruit,
whole milk} => {yogurt} 0.0012 0.86 6.1
[15] {butter,
tropical fruit,
white bread} => {yogurt} 0.0011 0.85 6.1
[16] {butter,
margarine,
tropical fruit} => {yogurt} 0.0011 0.85 6.1
[17] {cream cheese,
curd,
whipped/sour cream,
whole milk} => {yogurt} 0.0011 0.85 6.1
[18] {cream cheese,
margarine,
whipped/sour cream} => {yogurt} 0.0010 0.83 6.0
[19] {beef,
butter,
tropical fruit} => {yogurt} 0.0010 0.83 6.0
[20] {fruit/vegetable juice,
pork,
tropical fruit} => {yogurt} 0.0010 0.83 6.0
> #Alternative way of viewing rules
> options(digits = 2)
> inspect(rules[1:20])
lhs rhs support confidence lift
[1] {liquor,red/blush wine} => {bottled beer} 0.0019 0.90 11.2
[2] {cereals,curd} => {whole milk} 0.0010 0.91 3.6
[3] {cereals,yogurt} => {whole milk} 0.0017 0.81 3.2
[4] {butter,jam} => {whole milk} 0.0010 0.83 3.3
[5] {bottled beer,soups} => {whole milk} 0.0011 0.92 3.6
[6] {house keeping products,napkins} => {whole milk} 0.0013 0.81 3.2
[7] {house keeping products,whipped/sour cream} => {whole milk} 0.0012 0.92 3.6
[8] {pastry,sweet spreads} => {whole milk} 0.0010 0.91 3.6
[9] {curd,turkey} => {other vegetables} 0.0012 0.80 4.1
[10] {rice,sugar} => {whole milk} 0.0012 1.00 3.9
[11] {butter,rice} => {whole milk} 0.0015 0.83 3.3
[12] {domestic eggs,rice} => {whole milk} 0.0011 0.85 3.3
[13] {bottled water,rice} => {whole milk} 0.0012 0.92 3.6
[14] {rice,yogurt} => {other vegetables} 0.0019 0.83 4.3
[15] {mustard,oil} => {whole milk} 0.0012 0.86 3.4
[16] {canned fish,hygiene articles} => {whole milk} 0.0011 1.00 3.9
[17] {fruit/vegetable juice,herbs} => {other vegetables} 0.0012 0.80 4.1
[18] {herbs,shopping bags} => {other vegetables} 0.0019 0.83 4.3
[19] {herbs,tropical fruit} => {whole milk} 0.0023 0.82 3.2
[20] {herbs,rolls/buns} => {whole milk} 0.0024 0.80 3.1
> rules <- sort(rules, by = "Lift", decreasing = TRUE)
Error in .local(x, decreasing, ...) :
Unknown interest measure to sort by.
> rules <- sort(rules, by = "confidence", decreasing = TRUE)
> inspect(rules[1:20])
lhs rhs support confidence lift
[1] {rice,sugar} => {whole milk} 0.0012 1 3.9
[2] {canned fish,hygiene articles} => {whole milk} 0.0011 1 3.9
[3] {butter,rice,root vegetables} => {whole milk} 0.0010 1 3.9
[4] {flour,root vegetables,whipped/sour cream} => {whole milk} 0.0017 1 3.9
[5] {butter,domestic eggs,soft cheese} => {whole milk} 0.0010 1 3.9
[6] {citrus fruit,root vegetables,soft cheese} => {other vegetables} 0.0010 1 5.2
[7] {butter,hygiene articles,pip fruit} => {whole milk} 0.0010 1 3.9
[8] {hygiene articles,root vegetables,whipped/sour cream} => {whole milk} 0.0010 1 3.9
[9] {hygiene articles,pip fruit,root vegetables} => {whole milk} 0.0010 1 3.9
[10] {cream cheese,domestic eggs,sugar} => {whole milk} 0.0011 1 3.9
[11] {curd,domestic eggs,sugar} => {whole milk} 0.0010 1 3.9
[12] {cream cheese,domestic eggs,napkins} => {whole milk} 0.0011 1 3.9
[13] {brown bread,pip fruit,whipped/sour cream} => {other vegetables} 0.0011 1 5.2
[14] {grapes,tropical fruit,whole milk,yogurt} => {other vegetables} 0.0010 1 5.2
[15] {ham,pip fruit,tropical fruit,yogurt} => {other vegetables} 0.0010 1 5.2
[16] {ham,pip fruit,tropical fruit,whole milk} => {other vegetables} 0.0011 1 5.2
[17] {oil,root vegetables,tropical fruit,yogurt} => {whole milk} 0.0011 1 3.9
[18] {oil,other vegetables,root vegetables,yogurt} => {whole milk} 0.0014 1 3.9
[19] {butter,other vegetables,root vegetables,white bread} => {whole milk} 0.0010 1 3.9
[20] {butter,other vegetables,pork,whipped/sour cream} => {whole milk} 0.0010 1 3.9
> rules <- sort(rules, by = "lift", decreasing = TRUE)
> inspect(rules[1:20])
lhs rhs support confidence lift
[1] {liquor,
red/blush wine} => {bottled beer} 0.0019 0.90 11.2
[2] {citrus fruit,
fruit/vegetable juice,
other vegetables,
soda} => {root vegetables} 0.0010 0.91 8.3
[3] {oil,
other vegetables,
tropical fruit,
whole milk,
yogurt} => {root vegetables} 0.0010 0.91 8.3
[4] {citrus fruit,
fruit/vegetable juice,
grapes} => {tropical fruit} 0.0011 0.85 8.1
[5] {other vegetables,
rice,
whole milk,
yogurt} => {root vegetables} 0.0013 0.87 8.0
[6] {oil,
other vegetables,
tropical fruit,
whole milk} => {root vegetables} 0.0013 0.87 8.0
[7] {ham,
other vegetables,
pip fruit,
yogurt} => {tropical fruit} 0.0010 0.83 7.9
[8] {beef,
citrus fruit,
other vegetables,
tropical fruit} => {root vegetables} 0.0010 0.83 7.6
[9] {butter,
cream cheese,
root vegetables} => {yogurt} 0.0010 0.91 6.5
[10] {butter,
sliced cheese,
tropical fruit,
whole milk} => {yogurt} 0.0010 0.91 6.5
[11] {cream cheese,
curd,
other vegetables,
whipped/sour cream} => {yogurt} 0.0010 0.91 6.5
[12] {butter,
other vegetables,
tropical fruit,
white bread} => {yogurt} 0.0010 0.91 6.5
[13] {pip fruit,
sausage,
sliced cheese} => {yogurt} 0.0012 0.86 6.1
[14] {butter,
curd,
tropical fruit,
whole milk} => {yogurt} 0.0012 0.86 6.1
[15] {butter,
tropical fruit,
white bread} => {yogurt} 0.0011 0.85 6.1
[16] {butter,
margarine,
tropical fruit} => {yogurt} 0.0011 0.85 6.1
[17] {cream cheese,
curd,
whipped/sour cream,
whole milk} => {yogurt} 0.0011 0.85 6.1
[18] {cream cheese,
margarine,
whipped/sour cream} => {yogurt} 0.0010 0.83 6.0
[19] {beef,
butter,
tropical fruit} => {yogurt} 0.0010 0.83 6.0
[20] {fruit/vegetable juice,
pork,
tropical fruit} => {yogurt} 0.0010 0.83 6.0
> plot(rules)
> plot(rules, control = list(col = brewer.pal(11, "Spectral")), main = "", interactive = TRUE)
Interactive mode.
Select a region with two clicks!
Number of rules selected: 2
lhs rhs support confidence lift order
[1] {liquor,
red/blush wine} => {bottled beer} 0.0019 0.9 11.2 3
[2] {fruit/vegetable juice,
tropical fruit,
whipped/sour cream} => {other vegetables} 0.0019 0.9 4.7 4
> plot(rules[1:20], method = "graph", control = list(type = "items"))
> #Make an interactive graph
> plot(rules[1:20],
+ method = "graph",
+ interactive = TRUE,
+ shading = T)
> plot(rules[1:20],
+ method = "paracoord",
+ control = list(reorder = TRUE))
> plot(rules[1:20],
+ method = "paracoord", interactive = TRUE,
+ control = list(reorder = TRUE))
> plot(rules[1:20], method = "matrix", control = list(reorder = TRUE))
Itemsets in Antecedent (LHS)
[1] "{herbs,rolls/buns}" "{herbs,tropical fruit}"
[3] "{cereals,yogurt}" "{butter,rice}"
[5] "{house keeping products,napkins}" "{house keeping products,whipped/sour cream}"
[7] "{mustard,oil}" "{rice,sugar}"
[9] "{bottled water,rice}" "{bottled beer,soups}"
[11] "{canned fish,hygiene articles}" "{domestic eggs,rice}"
[13] "{pastry,sweet spreads}" "{cereals,curd}"
[15] "{butter,jam}" "{fruit/vegetable juice,herbs}"
[17] "{curd,turkey}" "{herbs,shopping bags}"
[19] "{rice,yogurt}" "{liquor,red/blush wine}"
Itemsets in Consequent (RHS)
[1] "{whole milk}" "{bottled beer}" "{other vegetables}"
> plot(rules[1:20], method = "grouped")
> #Which items lead to a purchase of whole milk (whole milk on the right-hand side)?
> rules<-apriori(data=Groceries, parameter=list(supp=0.001,conf = 0.08),
+ appearance = list(default="lhs",rhs="whole milk"),
+ control = list(verbose=F))
> rules<-sort(rules, decreasing=TRUE,by="confidence")
> inspect(rules[1:20])
lhs rhs support confidence lift
[1] {rice,sugar} => {whole milk} 0.0012 1 3.9
[2] {canned fish,hygiene articles} => {whole milk} 0.0011 1 3.9
[3] {butter,rice,root vegetables} => {whole milk} 0.0010 1 3.9
[4] {flour,root vegetables,whipped/sour cream} => {whole milk} 0.0017 1 3.9
[5] {butter,domestic eggs,soft cheese} => {whole milk} 0.0010 1 3.9
[6] {butter,hygiene articles,pip fruit} => {whole milk} 0.0010 1 3.9
[7] {hygiene articles,root vegetables,whipped/sour cream} => {whole milk} 0.0010 1 3.9
[8] {hygiene articles,pip fruit,root vegetables} => {whole milk} 0.0010 1 3.9
[9] {cream cheese,domestic eggs,sugar} => {whole milk} 0.0011 1 3.9
[10] {curd,domestic eggs,sugar} => {whole milk} 0.0010 1 3.9
[11] {cream cheese,domestic eggs,napkins} => {whole milk} 0.0011 1 3.9
[12] {oil,root vegetables,tropical fruit,yogurt} => {whole milk} 0.0011 1 3.9
[13] {oil,other vegetables,root vegetables,yogurt} => {whole milk} 0.0014 1 3.9
[14] {butter,other vegetables,root vegetables,white bread} => {whole milk} 0.0010 1 3.9
[15] {butter,other vegetables,pork,whipped/sour cream} => {whole milk} 0.0010 1 3.9
[16] {butter,domestic eggs,other vegetables,whipped/sour cream} => {whole milk} 0.0012 1 3.9
[17] {citrus fruit,pastry,rolls/buns,whipped/sour cream} => {whole milk} 0.0010 1 3.9
[18] {bottled water,other vegetables,pip fruit,root vegetables} => {whole milk} 0.0011 1 3.9
[19] {rolls/buns,root vegetables,sausage,tropical fruit} => {whole milk} 0.0010 1 3.9
[20] {oil,other vegetables,root vegetables,tropical fruit,yogurt} => {whole milk} 0.0010 1 3.9
> #What items are usually bought if they purchase whole milk?
> #We adjust confidence to 0.15 since 0.8 yielded no results
> #We set a length of 2 to discard empty left side items
> rules<-apriori(data=Groceries, parameter=list(supp=0.001,conf = 0.15,minlen=2),
+ appearance = list(default="rhs",lhs="whole milk"),
+ control = list(verbose=F))
> rules<-sort(rules, decreasing=TRUE,by="confidence")
> #Apparently, there are only 6 rules
> inspect(rules[1:6])
lhs rhs support confidence lift
[1] {whole milk} => {other vegetables} 0.075 0.29 1.5
[2] {whole milk} => {rolls/buns} 0.057 0.22 1.2
[3] {whole milk} => {yogurt} 0.056 0.22 1.6
[4] {whole milk} => {root vegetables} 0.049 0.19 1.8
[5] {whole milk} => {tropical fruit} 0.042 0.17 1.6
[6] {whole milk} => {soda} 0.040 0.16 0.9
> rules
set of 6 rules
>
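
The last six rules are a handy check on how confidence and lift relate, and on what a lift below 1 means. The sketch below recomputes two of them from the item counts in summary(Groceries) and the rounded rule supports printed above, so the results only match the output approximately; the variable names are my own.

```r
n <- 9835
supp_milk <- 2513 / n  # whole milk
supp_veg  <- 1903 / n  # other vegetables
supp_soda <- 1715 / n  # soda

# Rule supports as printed in the output (rounded)
supp_milk_veg  <- 0.075
supp_milk_soda <- 0.040

# confidence = support(rule) / support(antecedent)
conf_veg  <- supp_milk_veg / supp_milk
conf_soda <- supp_milk_soda / supp_milk

# lift = confidence / support(consequent)
lift_veg  <- conf_veg / supp_veg
lift_soda <- conf_soda / supp_soda

print(round(c(conf_veg = conf_veg, lift_veg = lift_veg,
              conf_soda = conf_soda, lift_soda = lift_soda), 2))
```

The lift for other vegetables comes out at about 1.5, while the lift for soda lands just below 1. In other words, customers who buy whole milk are slightly less likely than the average customer to buy soda, even though the rule clears the 0.15 confidence threshold.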
