470643 c08.qxd 3/8/04 11:14 AM Page 278
278 Chapter 8 
Furthermore, there is a general pattern of zip codes increasing from East to 
West. Codes that start with 0 are in New England and Puerto Rico; those 
beginning with 9 are on the west coast. This suggests a distance function that 
approximates geographic distance by looking at the high order digits of the 
zip code. 
■■ d_zip(A,B) = 0.0 if the zip codes are identical 
■■ d_zip(A,B) = 0.1 if the first three digits are identical (e.g., “20008” and “20015”) 
■■ d_zip(A,B) = 0.5 if the first digits are identical (e.g., “95050” and “98125”) 
■■ d_zip(A,B) = 1.0 if the first digits are not identical (e.g., “02138” and “94704”) 
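The zip code rules above translate almost directly into code. The following is a minimal sketch, assuming zip codes are stored as five-character strings so that leading zeros survive; the function name zip_distance is ours, not part of any library.

```python
def zip_distance(zip_a: str, zip_b: str) -> float:
    """Approximate geographic distance from the high-order digits of two zip codes."""
    if zip_a == zip_b:
        return 0.0          # identical zip codes
    if zip_a[:3] == zip_b[:3]:
        return 0.1          # same first three digits
    if zip_a[0] == zip_b[0]:
        return 0.5          # same first digit only
    return 1.0              # different first digits


# The example pairs from the list above.
print(zip_distance("20008", "20015"))
print(zip_distance("95050", "98125"))
print(zip_distance("02138", "94704"))
```

Storing zip codes as numbers would turn “02138” into 2138 and silently break the first-digit comparison, which is why the sketch insists on strings.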
Of course, if geographic distance were truly of interest, a better approach 
would be to look up the latitude and longitude of each zip code in a table and 
calculate the distances that way (it is possible to get this information for the 
United States from www.census.gov). For many purposes, however, geographic 
proximity is not nearly as important as some other measure of similarity. 10011 
and 10031 are both in Manhattan, but from a marketing point of view, they 
don’t have much else in common, because one is an upscale downtown neigh-
borhood and the other is a working class Harlem neighborhood. On the other 
hand 02138 and 94704 are on opposite coasts, but are likely to respond very 
similarly to direct mail from a political action committee, since they are for 
Cambridge, MA and Berkeley, CA respectively. 
This is just one example of how the choice of a distance metric depends on 
the data mining context. There are additional examples of distance and simi-
larity measures in Chapter 11 where they are applied to clustering. 
When a Distance Metric Already Exists 
There are some situations where a distance metric already exists, but is diffi-
cult to spot. These situations generally arise in one of two forms. Sometimes, a 
function already exists that provides a distance measure that can be adapted 
for use in MBR. The news story case study provides a good example of adapt-
ing an existing function, the relevance feedback score, for use as a distance 
function. 
Other times, there are fields that do not appear to capture distance, but can 
be pressed into service. An example of such a hidden distance field is solicita-
tion history. Two customers who were chosen for a particular solicitation in 
the past are “close,” even though the reasons why they were chosen may no 
longer be available; two who were not chosen are close, but not as close; and 
one that was chosen and one that was not are far apart. The advantage of this 
metric is that it can incorporate previous decisions, even if the basis for the 
decisions is no longer available. On the other hand, it does not work well for 
customers who were not around during the original solicitation; so some sort 
of neutral weighting must be applied to them. 
Considering whether the original customers responded to the solicitation 
can extend this function further, resulting in a solicitation metric like: 
■■ d_solicitation(A, B) = 0, when A and B both responded to the solicitation 
■■ d_solicitation(A, B) = 0.1, when A and B were both chosen but neither responded 
■■ d_solicitation(A, B) = 0.2, when neither A nor B was chosen, but both were available in the data 
■■ d_solicitation(A, B) = 0.3, when A and B were both chosen, but only one responded 
■■ d_solicitation(A, B) = 0.3, when one or both were not considered 
■■ d_solicitation(A, B) = 1.0, when one was chosen and the other was not 
Of course, the particular values are not sacrosanct; they are only meant as a 
guide for measuring similarity and showing how previous information and 
response histories can be incorporated into a distance function. 
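As an illustration, the solicitation metric can be sketched as a small lookup over each customer's solicitation status. The four status names below are our own labels for the cases in the list, and the numeric values are the illustrative ones given above, not recommended settings.

```python
from enum import Enum


class Status(Enum):
    RESPONDED = "chosen and responded"
    NO_RESPONSE = "chosen, did not respond"
    NOT_CHOSEN = "available in the data but not chosen"
    NOT_CONSIDERED = "not around at the time of the solicitation"


CHOSEN = {Status.RESPONDED, Status.NO_RESPONSE}


def solicitation_distance(a: Status, b: Status) -> float:
    """Distance between two customers based on their solicitation history."""
    if Status.NOT_CONSIDERED in (a, b):
        return 0.3                      # neutral weighting for newer customers
    if a in CHOSEN and b in CHOSEN:
        if a == b:
            return 0.0 if a is Status.RESPONDED else 0.1
        return 0.3                      # both chosen, only one responded
    if a is Status.NOT_CHOSEN and b is Status.NOT_CHOSEN:
        return 0.2                      # neither chosen, both available
    return 1.0                          # one chosen, the other not


print(solicitation_distance(Status.RESPONDED, Status.NOT_CHOSEN))
```

Checking the “not considered” case first keeps the neutral weighting from being overridden by any of the other rules.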
The Combination Function: Asking the 
Neighbors for the Answer 
The distance function is used to determine which records comprise the neigh-
borhood. This section presents different ways to combine data gathered from 
those neighbors to make a prediction. At the beginning of this chapter, we 
estimated the median rent in the town of Tuxedo, by taking an average 
of the median rents in similar towns. In that example, averaging was the 
combination function. This section explores other methods of canvassing the 
neighborhood. 
The Basic Approach: Democracy 
One common combination function is for the k nearest neighbors to vote on an 
answer, a form of “democracy” in data mining. When MBR is used for classification, 
each neighbor casts its vote for its own class. The proportion of votes for each 
class is an estimate of the probability that the new record belongs to the 
corresponding class. When the task is to assign a single class, it is simply the one 
with the most votes. When there are only two categories, an odd number of 
neighbors should be polled to avoid ties. As a rule of thumb, use c+1 neighbors 
when there are c categories to ensure that at least one class has a plurality. 
In Table 8.12, the five test cases seen earlier have been augmented with a flag 
that signals whether the customer has become inactive. 
For this example, three of the customers have become inactive and two have 
not, an almost balanced training set. For illustrative purposes, let’s try to deter-
mine if the new record is active or inactive by using different values of k for 
two distance functions, d_Euclid and d_norm (Table 8.13). 
The question marks indicate that no prediction has been made due to a tie 
among the neighbors. Notice that different values of k do affect the classifica-
tion. This suggests using the percentage of neighbors in agreement to provide 
the level of confidence in the prediction (Table 8.14). 
Table 8.12 Customers with Attrition History 
RECNUM  GENDER  AGE  SALARY    INACTIVE 
1       female  27   $19,000   no 
2       male    51   $64,000   yes 
3       male    52   $105,000  yes 
4       female  33   $55,000   yes 
5       male    45   $45,000   no 
new     female  45   $100,000  ? 
Table 8.13 Using MBR to Determine if the New Customer Will Become Inactive 
          NEIGHBORS  NEIGHBOR ATTRITION  K = 1  K = 2  K = 3  K = 4  K = 5 
d_sum     4,3,5,2,1  Y,Y,N,Y,N           yes    yes    yes    yes    yes 
d_Euclid  4,1,5,2,3  Y,N,N,Y,Y           yes    ?      no     ?      yes 
Table 8.14 Attrition Prediction with Confidence 
          K = 1      K = 2      K = 3     K = 4     K = 5 
d_sum     yes, 100%  yes, 100%  yes, 67%  yes, 75%  yes, 60% 
d_Euclid  yes, 100%  yes, 50%   no, 67%   yes, 50%  yes, 60% 
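The voting procedure behind Tables 8.13 and 8.14 can be sketched in a few lines. This is only an illustration: the neighbor labels are the ones listed in Table 8.13, already sorted from nearest to farthest, and the function name vote is ours. A tie is reported as None, matching the question marks in Table 8.13.

```python
from collections import Counter


def vote(labels, k):
    """Majority vote among the k nearest labels.

    Returns (winner, confidence); winner is None when the top classes tie.
    """
    counts = Counter(labels[:k]).most_common()
    top_label, top_votes = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_votes
    return (None if tied else top_label), top_votes / k


# Attrition flags of the neighbors, nearest first, as listed in Table 8.13.
d_sum_neighbors = ["yes", "yes", "no", "yes", "no"]      # records 4,3,5,2,1
d_euclid_neighbors = ["yes", "no", "no", "yes", "yes"]   # records 4,1,5,2,3

for k in range(1, 6):
    print(k, vote(d_sum_neighbors, k), vote(d_euclid_neighbors, k))
```

The confidence is simply the share of the k votes that went to the leading class, so a two-way tie shows up as 50 percent.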
The confidence level works just as well when there are more than two cate-
gories. However, with more categories, there is a greater chance that no single 
category will have a majority vote. One of the key assumptions about MBR 
(and data mining in general) is that the training set provides sufficient infor-
mation for predictive purposes. If the neighborhoods of new cases consistently 
produce no obvious choice of classification, then the data simply may not con-
tain the necessary information and the choice of dimensions and possibly of 
the training set needs to be reevaluated. By measuring the effectiveness of 
MBR on the test set, you can determine whether the training set has a sufficient 
number of examples. 
WARNING MBR is only as good as the training set it uses. To measure 
whether the training set is effective, measure the results of its predictions on 
the test set using two, three, and four neighbors. If the results are inconclusive 
or inaccurate, then the training set is not large enough or the dimensions and 
distance metrics chosen are not appropriate. 
Weighted Voting 
Weighted voting is similar to voting in the previous section except that the 
neighbors are not all created equal—more like shareholder democracy than 
one-person, one-vote. The size of the vote is inversely proportional to the dis-
tance from the new record, so closer neighbors have stronger votes than neigh-
bors farther away do. To prevent problems when the distance might be 0, it is 
common to add 1 to the distance before taking the inverse. Adding 1 also 
makes all the votes between 0 and 1. 
Table 8.15 applies weighted voting to the previous example. The “yes, cus-
tomer will become inactive” vote is the first; the “no, this is a good customer” 
vote is second. 
Weighted voting has introduced enough variation to prevent ties. The con-
fidence level can now be calculated as the ratio of winning votes to total votes 
(Table 8.16). 
Table 8.15 Attrition Prediction with Weighted Voting 
          K = 1       K = 2           K = 3           K = 4           K = 5 
d_sum     0.749 to 0  1.441 to 0      1.441 to 0.647  2.085 to 0.647  2.085 to 1.290 
d_Euclid  0.669 to 0  0.669 to 0.562  0.669 to 1.062  1.157 to 1.062  1.601 to 1.062 
Table 8.16 Confidence with Weighted Voting 
          K = 1      K = 2      K = 3     K = 4     K = 5 
d_sum     yes, 100%  yes, 100%  yes, 69%  yes, 76%  yes, 62% 
d_Euclid  yes, 100%  yes, 54%   no, 61%   yes, 52%  yes, 60% 
In this case, weighting the votes has only a small effect on the results and the 
confidence. The effect of weighting is largest when some neighbors are con-
siderably further away than others. 
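A minimal sketch of weighted voting follows, using the 1/(1 + distance) weighting described above. The distances in the example are hypothetical values chosen to keep the arithmetic simple; they are not the distances behind Table 8.15.

```python
from collections import defaultdict


def weighted_vote(neighbors, k):
    """Weighted vote among the k nearest neighbors.

    neighbors: list of (distance, label) pairs sorted by distance.
    Each vote counts 1 / (1 + distance), so a distance of 0 gives a vote of 1.
    Returns (winner, confidence), where confidence is the winner's share of
    the total vote.
    """
    totals = defaultdict(float)
    for distance, label in neighbors[:k]:
        totals[label] += 1.0 / (1.0 + distance)
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Hypothetical neighbors: one very close "yes" and two more distant "no" votes.
neighbors = [(0.0, "yes"), (1.0, "no"), (3.0, "no"), (3.0, "yes")]

for k in range(1, 5):
    print(k, weighted_vote(neighbors, k))
```

With k = 3, a simple vote on these neighbors would go to “no” by two votes to one, but the weighted vote goes to “yes” because the single close neighbor outweighs the two distant ones. This is the situation in which weighting matters most.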
Weighting can also be applied to estimation by replacing the simple average 
of neighboring values with an average weighted by distance. This approach is 
used in collaborative filtering systems, as described in the following section. 
Collaborative Filtering: A Nearest Neighbor 
Approach to Making Recommendations 
Neither of the authors considers himself a country music fan, but one of them 
is the proud owner of an autographed copy of an early Dixie Chicks CD. The 
Chicks, who did not yet have a major record label, were performing in a local 
bar one day and some friends who knew them from Texas made a very enthu-
siastic recommendation. The performance was truly memorable, featuring 
Martie Erwin’s impeccable Bluegrass fiddle, her sister Emily on a bewildering 
variety of other instruments (most, but not all, with strings), and the seductive 
vocals of Laura Lynch (who also played a stand-up electric bass). At the break, 
the band sold and autographed a self-produced CD that we still like better 
than the one that later won them a Grammy. What does this have to do with 
nearest neighbor techniques? Well, it is a human example of collaborative fil-
tering. A recommendation from trusted friends will cause one to try something 
one otherwise might not try. 
Collaborative filtering is a variant of memory-based reasoning particularly 
well suited to the application of providing personalized recommendations. A 
collaborative filtering system starts with a history of people’s preferences. The 
distance function determines similarity based on overlap of preferences— 
people who like the same thing are close. In addition, votes are weighted by 
distances, so the votes of closer neighbors count more for the recommenda-
tion. In other words, it is a technique for finding music, books, wine, or any-
thing else that fits into the existing preferences of a particular person by using 
the judgments of a peer group selected for their similar tastes. This approach 
is also called social information filtering. 
Collaborative filtering automates the process of using word-of-mouth to 
decide whether a person would like something. Knowing that lots of people liked 
something is not enough. Who liked it is also important. Everyone values some 
recommendations more highly than others. The recommendation of a close 
friend whose past recommendations have been right on target may be enough 
to get you to go see a new movie even if it is in a genre you generally dislike. 
On the other hand, an enthusiastic recommendation from a friend who thinks 
Ace Ventura: Pet Detective is the funniest movie ever made might serve to warn 
you off one you might otherwise have gone to see. 
Preparing recommendations for a new customer using an automated col-
laborative filtering system has three steps: 
1. Building a customer profile by getting the new customer to rate a selec-
tion of items such as movies, songs, or restaurants. 
2. Comparing the new customer’s profile with the profiles of other cus-
tomers using some measure of similarity. 
3. Using some combination of the ratings of customers with similar pro-
files to predict the rating that the new customer would give to items he 
or she has not yet rated. 
The following sections examine each of these steps in a bit more detail. 
Building Profiles 
One challenge with collaborative filtering is that there are often far more items 
to be rated than any one person is likely to have experienced or be willing to 
rate. That is, profiles are usually sparse, meaning that there is little overlap 
among the users’ preferences for making recommendations. Think of a user 
profile as a vector with one element per item in the universe of items to be 
rated. Each element of the vector represents the profile owner’s rating for the 
corresponding item on a scale of –5 to 5 with 0 indicating neutrality and null 
values for no opinion. 
If there are thousands or tens of thousands of elements in the vector and 
each customer decides which ones to rate, any two customers’ profiles are 
likely to end up with few overlaps. On the other hand, forcing customers to 
rate a particular subset may miss interesting information because ratings of 
more obscure items may say more about the customer than ratings of common 
ones. A fondness for the Beatles is less revealing than a fondness for Mose 
Allison. 
A reasonable approach is to have new customers rate a list of the twenty or 
so most frequently rated items (a list that might change over time) and then 
free them to rate as many additional items as they please. 
470643 c08.qxd 3/8/04 11:14 AM Page 284
284 Chapter 8 
Comparing Profiles 
Once a customer profile has been built, the next step is to measure its distance 
from other profiles. The most obvious approach would be to treat the profile 
vectors as geometric points and calculate the Euclidean distance between 
them, but many other distance measures have been tried. Some give higher 
weight to agreement when users give a positive rating especially when most 
users give negative ratings to most items. Still others apply statistical correla-
tion tests to the ratings vectors. 
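To make the Euclidean comparison concrete, here is a small sketch in which each profile is a dictionary from item to rating, and an item the user has not rated is simply absent (the null value). Restricting the calculation to items both users have rated is our assumption for handling sparse profiles; the ratings themselves are made up for illustration.

```python
import math


def profile_distance(p, q):
    """Euclidean distance between two rating profiles over co-rated items.

    Returns None when the profiles have no rated items in common, since
    there is then no basis for comparison.
    """
    shared = p.keys() & q.keys()
    if not shared:
        return None
    return math.sqrt(sum((p[item] - q[item]) ** 2 for item in shared))


# Hypothetical profiles on the -5 to 5 scale.
user_a = {"Apocalypse Now": 3, "American Pie 2": -2, "Osmosis Jones": 1}
user_b = {"Apocalypse Now": 0, "American Pie 2": 2, "Planet of the Apes": -1}

print(profile_distance(user_a, user_b))
```

One weakness of this simple version is that two users with a single item in common can appear very close. A production system would probably require a minimum overlap before trusting the distance.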
Making Predictions 
The final step is to use some combination of nearby profiles in order to come 
up with estimated ratings for the items that the customer has not rated. One 
approach is to take a weighted average where the weight is inversely propor-
tional to the distance. The example shown in Figure 8.7 illustrates estimating 
the rating that Nathaniel would give to Planet of the Apes based on the opinions 
of his neighbors, Simon and Amelia. 
[Figure 8.7 shows Nathaniel's profile among those of Alan, Michael, Stephanie, Amelia, Simon, Peter, and Jenny. Two neighboring profiles list movies such as Crouching Tiger, Apocalypse Now, Vertical Ray of Sun, Planet of the Apes, American Pie 2, Plan 9 From Outer Space, and Osmosis Jones, with ratings of –1 and –4 highlighted.] 
Figure 8.7 The predicted rating for Planet of the Apes is –2. 
Simon, who is distance 2 away, gave that movie a rating of –1. Amelia, who 
is distance 4 away, gave that movie a rating of –4. No one else’s profile is close 
enough to Nathaniel’s to be included in the vote. Because Amelia is twice as 
far away as Simon, her vote counts only half as much as his. The estimate for 
Nathaniel’s rating is weighted by the distance: 
(1/2 × (–1) + 1/4 × (–4)) / (1/2 + 1/4) = –1.5 / 0.75 = –2. 
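The same calculation can be written as a short function. This sketch uses a weight of 1/distance, as in the worked example, so it assumes every neighbor is at a distance greater than zero; the function name predict_rating is ours.

```python
def predict_rating(neighbors):
    """Distance-weighted average of neighbors' ratings.

    neighbors: list of (distance, rating) pairs with distance > 0.
    Each rating is weighted by 1 / distance.
    """
    weights = [1.0 / distance for distance, _ in neighbors]
    weighted_sum = sum(w * rating for w, (_, rating) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Simon is at distance 2 with a rating of -1; Amelia is at distance 4 with -4.
print(predict_rating([(2, -1), (4, -4)]))
```

If neighbors at distance 0 are possible, the weight could instead be 1/(1 + distance), as in the weighted voting section, at the cost of slightly different estimates.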
A good collaborative filtering system gives its users a chance to comment on 
the predictions and adjust the profile accordingly. In this example, if Nathaniel 
rents the video of Planet of the Apes despite the prediction that he will not like 
it, he can then enter an actual rating of his own. If it turns out that he really 
likes the movie and gives it a rating of 4, his new profile will be in a slightly 
different neighborhood and Simon’s and Amelia’s opinions will count less for 
Nathaniel’s next recommendation. 
Lessons Learned 
Memory-based reasoning is a powerful data mining technique that can be used 
to solve a wide variety of data mining problems involving classification or 
estimation. Unlike other data mining techniques that use a training set of pre-
classified data to create a model and then discard the training set, for MBR, the 
training set essentially is the model. 
Choosing the right training set is perhaps the most important step in MBR. 
The training set needs to include sufficient numbers of examples of all possible 
classifications. This may mean enriching it by including a disproportionate 
number of instances for rare classifications in order to create a balanced train-
ing set with roughly the same number of instances for all categories. A training 
set that includes only instances of bad customers will predict that all 
customers are bad. In general, the training set should have at least 
thousands, if not hundreds of thousands or millions, of examples. 
MBR is a k-nearest neighbors approach. Determining which neighbors are 
near requires a distance function. There are many approaches to measuring the 
distance between two records. The careful choice of an appropriate distance 
function is a critical step in using MBR. The chapter introduced an approach to 
creating an overall distance function by building a distance function for each 
field and normalizing it. The normalized field distances can then be combined 
in a Euclidean fashion or summed to produce a Manhattan distance. 
When the Euclidean method is used, a large difference in any one field is 
enough to cause two records to be considered far apart. The Manhattan method 
is more forgiving—a large difference on one field can more easily be offset by 
close values on other fields. A validation set can be used to pick the best dis-
tance function for a given model set by applying all candidates to see which 
produces better results. Sometimes, the right choice of neighbors depends on 
modifying the distance function to favor some fields over others. This is easily 
accomplished by incorporating weights into the distance function. 
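The difference between the two ways of combining field distances is easy to see with a small sketch. It assumes each field distance has already been normalized to the range 0 to 1. The optional weights, and the way they multiply each field's term, are our assumption about one reasonable way to favor some fields over others.

```python
import math


def euclidean(field_distances, weights=None):
    """Combine normalized field distances in a Euclidean fashion."""
    weights = weights or [1.0] * len(field_distances)
    return math.sqrt(sum(w * d ** 2 for w, d in zip(weights, field_distances)))


def manhattan(field_distances, weights=None):
    """Combine normalized field distances by summing them."""
    weights = weights or [1.0] * len(field_distances)
    return sum(w * d for w, d in zip(weights, field_distances))


# Hypothetical field distances from a new record to two candidate neighbors.
one_big_gap = [0.9, 0.0, 0.0]       # far apart on one field, identical on the rest
moderate_gaps = [0.5, 0.5, 0.5]     # moderately apart on every field

print(manhattan(one_big_gap), manhattan(moderate_gaps))
print(euclidean(one_big_gap), euclidean(moderate_gaps))
```

Under the Manhattan method the first neighbor is closer, because the close values on two fields offset the large gap on the third. Under the Euclidean method the ranking reverses, because squaring emphasizes the single large difference.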
The next question is the number of neighbors to choose. Once again, inves-
tigating different numbers of neighbors using the validation set can help 
determine the optimal number. There is no right number of neighbors. The 
number depends on the distribution of the data and is highly dependent on 
the problem being solved. 
The basic combination function, weighted voting, does a good job for cate-
gorical data, using weights inversely proportional to distance. The analogous 
operation for estimating numeric values is a weighted average. 
One good application for memory-based reasoning is making 
recommendations. Collaborative filtering is an approach to making recommendations that 
works by grouping people with similar tastes together using a distance 
function that can compare two lists of user-supplied ratings. Recommendations for a 
new person are calculated using a weighted average of the ratings of his or her 
nearest neighbors. 
CHAPTER 9 
Market Basket Analysis and Association Rules 
To convey the fundamental ideas of market basket analysis, start with the 
image of the shopping cart in Figure 9.1 filled with various products pur-
chased by someone on a quick trip to the supermarket. This basket contains an 
assortment of products—orange juice, bananas, soft drink, window cleaner, 
and detergent. One basket tells us about what one customer purchased at one 
time. A complete list of purchases made by all customers provides much more 
information; it describes the most important part of a retailing business—what 
merchandise customers are buying and when. 
Each customer purchases a different set of products, in different quantities, 
at different times. Market basket analysis uses the information about what cus-
tomers purchase to provide insight into who they are and why they make cer-
tain purchases. Market basket analysis provides insight into the merchandise 
by telling us which products tend to be purchased together and which are 
most amenable to promotion. This information is actionable: it can suggest 
new store layouts; it can determine which products to put on special; it can 
indicate when to issue coupons, and so on. When this data can be tied to indi-
vidual customers through a loyalty card or Web site registration, it becomes 
even more valuable. 
The data mining technique most closely allied with market basket analysis 
is the automatic generation of association rules. Association rules represent 
patterns in the data without a specified target. As such, they are an example of 
undirected data mining. Whether the patterns make sense is left to human 
interpretation. 
In this shopping basket, the shopper purchased a quart of orange juice, some 
bananas, dish detergent, some window cleaner, and a six pack of soda. 
■■ Is soda typically purchased with bananas? Does the brand of soda make a difference? 
■■ How do the demographics of the neighborhood affect what customers buy? 
■■ What should be in the basket but is not? 
■■ Are window cleaning products purchased when detergent and orange juice are bought together? 
Figure 9.1 Market basket analysis helps you understand customers as well as items that 
are purchased together. 
Association rules were originally derived from point-of-sale data that 
describes what products are purchased together. Although their roots are in 
analyzing point-of-sale transactions, association rules can be applied outside the 
retail industry to find relationships among other types of “baskets.” Some 
examples of potential applications are: 
■■ Items purchased on a credit card, such as rental cars and hotel rooms, 
provide insight into the next product that customers are likely to 
purchase. 
■■ Optional services purchased by telecommunications customers (call 
waiting, call forwarding, DSL, speed call, and so on) help determine 
how to bundle these services together to maximize revenue. 
■■ Banking services used by retail customers (money market accounts, 
CDs, investment services, car loans, and so on) identify customers 
likely to want other services. 
■■ Unusual combinations of insurance claims can be a sign of fraud and 
can spark further investigation. 
■■ Medical patient histories can give indications of likely complications 
based on certain combinations of treatments. 
Association rules often fail to live up to expectations. In our experience, 
for instance, they are not a good choice for building cross-selling models in 
industries such as retail banking, because the rules end up describing previous 
marketing promotions. Also, in retail banking, customers typically start with a 
checking account and then a savings account. Differentiation among products 
does not appear until customers have more products. This chapter covers the 
pitfalls as well as the uses of association rules. 
The chapter starts with an overview of market basket analysis, including 
more basic analyses of market basket data that do not require association rules. 
It then dives into association rules, explaining how they are derived. The chap-
ter then continues with ways to extend association rules to include other facets 
of the market basket analysis. 
Defining Market Basket Analysis 
Market basket analysis does not refer to a single technique; it refers to a set of 
business problems related to understanding point-of-sale transaction data. 
The most common technique is association rules, and much of this chapter 
delves into that subject. Before talking about association rules, this section 
talks about market basket data. 
Three Levels of Market Basket Data 
Market basket data is transaction data that describes three fundamentally 
different entities: 
■■ Customers 
■■ Orders (also called purchases or baskets or, in academic papers, item sets) 
■■ Items 
In a relational database, the data structure for market basket data often 
looks similar to Figure 9.2. This data structure includes four important entities. 
ORDER:     ORDER ID, CUSTOMER ID, ORDER DATE, PAYMENT TYPE, TOTAL VALUE, SHIP DATE, SHIPPING COST, etc. 
LINE ITEM: LINE ITEM ID, ORDER ID, PRODUCT ID, QUANTITY, UNIT PRICE, UNIT COST, GIFT WRAP FLAG, TAXABLE FLAG, etc. 
CUSTOMER:  CUSTOMER ID, NAME, ADDRESS, etc. 
PRODUCT:   PRODUCT ID, CATEGORY, SUBCATEGORY, DESCRIPTION, etc. 
Figure 9.2 A data model for transaction-level market basket data typically has three 
tables, one for the customer, one for the order, and one for the order line. 
The order is the fundamental data structure for market basket data. An 
order represents a single purchase event by a customer. This might correspond 
to a customer ordering several products on a Web site or to a customer 
purchasing a basket of groceries or to a customer buying several items from a 
catalog. The order record includes the total amount of the purchase, 
additional shipping charges, payment type, and whatever other data is relevant 
about the transaction. Sometimes the transaction is given a unique identifier. 
Sometimes the unique identifier needs to be cobbled together from other data. 
In one example, we needed to combine four fields to get an identifier for pur-
chases in a store—the timestamp when the customer paid, chain ID, store ID, 
and lane ID. 
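Cobbling together an identifier of this kind might look like the sketch below. The field order, the separator, and the sample values are all assumptions made for illustration; the only point is that the four fields together identify one purchase.

```python
from datetime import datetime


def order_key(paid_at: datetime, chain_id: str, store_id: str, lane_id: str) -> str:
    """Build a composite order identifier from four fields.

    Assumes no two customers pay at the same lane of the same store at
    exactly the same timestamp.
    """
    return "|".join([chain_id, store_id, lane_id, paid_at.isoformat()])


# Hypothetical values for one purchase.
print(order_key(datetime(2004, 3, 8, 11, 14, 0), "C1", "S042", "L7"))
```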
Individual items in the order are represented separately as line items. This 
data includes the price paid for the item, the number of items, whether tax 
should be charged, and perhaps the cost (which can be used for calculating 
margin). The item table also typically has a link to a product reference table, 
which provides more descriptive information about each product. This descrip-
tive information should include the product hierarchy and other information 
that might prove valuable for analysis. 
The customer table is an optional table and should be available when a cus-
tomer can be identified, for example, on a Web site that requires registration or 
when the customer uses an affinity card during the transaction. Although the 
customer table may have interesting fields, the most powerful element is the 
ID itself, because this can tie transactions together over time. 
Tracking customers over time makes it possible to determine, for instance, 
which grocery shoppers “bake from scratch”—something of keen interest to 
the makers of flour as well as prepackaged cake mixes. Such customers might 
be identified from the frequency of their purchases of flour, baking powder, 
and similar ingredients, the proportion of such purchases to the customer’s 
total spending, and the lack of interest in prepackaged mixes and ready-to-eat 
desserts. Of course, such ingredients may be purchased at different times and 
in different quantities, making it necessary to tie together multiple transac-
tions over time. 
All three levels of market basket data are important. For instance, to under-
stand orders, there are some basic measures: 
■■ What is the average number of orders per customer? 
■■ What is the average number of unique items per order? 
■■ What is the average number of items per order? 
■■ For a given product, what is the proportion of customers who have ever 
purchased the product? 
■■ For a given product, what is the average number of orders per cus-
tomer that include the item? 
■■ For a given product, what is the average quantity purchased in an order 
when the product is purchased? 
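Several of these measures can be computed in a single pass over the line items. The sketch below assumes each line item has been flattened into a tuple of order ID, customer ID, product ID, and quantity, and that order IDs are unique across customers. The sample rows are invented.

```python
from collections import defaultdict


def basic_measures(line_items, product_id):
    """Summarize (order_id, customer_id, product_id, quantity) rows."""
    orders_by_customer = defaultdict(set)
    products_by_order = defaultdict(set)
    quantity_by_order = defaultdict(int)
    buyers = set()          # customers who ever purchased product_id

    for order_id, customer_id, item_id, quantity in line_items:
        orders_by_customer[customer_id].add(order_id)
        products_by_order[order_id].add(item_id)
        quantity_by_order[order_id] += quantity
        if item_id == product_id:
            buyers.add(customer_id)

    n_customers = len(orders_by_customer)
    n_orders = len(products_by_order)
    return {
        "orders_per_customer":
            sum(len(o) for o in orders_by_customer.values()) / n_customers,
        "unique_items_per_order":
            sum(len(p) for p in products_by_order.values()) / n_orders,
        "items_per_order": sum(quantity_by_order.values()) / n_orders,
        "share_of_customers_buying": len(buyers) / n_customers,
    }


# Hypothetical line items.
line_items = [
    (1, "A", "flour", 2),
    (1, "A", "sugar", 1),
    (2, "A", "flour", 1),
    (3, "B", "soda", 6),
]
print(basic_measures(line_items, "flour"))
```

In practice these summaries would more likely be produced by queries against the order and line item tables, but the logic is the same.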
These measures give broad insight into the business. In some cases, there are 
few repeat customers, so the average number of orders per customer is close to 1; 
this suggests a business opportunity to increase the number of sales per 
customer. Or, the number of products per order may be close to 1, suggesting an 
opportunity for cross-selling during the process of making an order. 
It can be useful to compare these measures to each other. We have found that 
the number of orders is often a useful way of differentiating among customers; 
good customers clearly order more often than not-so-good customers. Figure 
9.3 attempts to look at the breadth of the customer relationship (the number of 
unique items ever purchased) by the depth of the relationship (the number of 
orders) for customers who purchased more than one item. This data is from a 
small specialty retailer. The biggest bubble shows that many customers who 
purchase two products do so at the same time. There is also a surprisingly 
large bubble showing that a sizeable number of customers purchase the same 
product in two orders. Better customers—at least those who returned multiple 
times—tend to purchase a greater diversity of goods. However, some of them 
are returning and buying the same thing they bought the first time. How can 
the retailer encourage customers to come back and buy more and different 
products? Market basket analysis cannot answer the question, but it can at 
least motivate asking it and perhaps provide hints that might help. 
[Bubble plot. Horizontal axis: Num Orders (0 to 6). Vertical axis: Num Distinct Products Across All Orders (0 to 10).] 
Figure 9.3 This bubble plot shows the breadth of customer relationships by the depth of 
the relationship. 
Order Characteristics 
Customer purchases have additional interesting characteristics. For instance, 
the average order size varies by time and region—and it is useful to keep track 
of these to understand changes in the business environment. Such information 
is often available in reporting systems, because it is easily summarized. 
Some information, though, may need to be gleaned from transaction-level 
data. Figure 9.4 breaks down transactions by the size of the order and the credit 
card used for payment—Visa, MasterCard, or American Express—for another 
retailer. The first thing to notice is that the larger the order, the larger the average 
purchase amount, regardless of the credit card being used. This is reassuring. 
Also, the use of one credit card type, American Express, is consistently associ-
ated with larger orders—an interesting finding about these customers. 
For Web purchases and mail-order transactions, additional information may 
also be gathered at the point of sale: 
■■ Did the order use gift wrap? 
■■ Is the order going to the same address as the billing address? 
■■ Did the purchaser accept or decline a particular cross-sell offer? 
Of course, gathering information at the point of sale and having it available 
for analysis are two different things. However, gift giving and responsiveness 
to cross-sell offers are two very useful things to know about customers. Find-
ing patterns with this information requires collecting the information in the 
first place (at the call center or through the online interface) and then moving 
it to a data mining environment. 
[Line chart. Horizontal axis: Number of Items Purchased (1 to 9). Vertical axis: Average Order Amount ($0 to $1,500). Series: American Express, MasterCard, Visa.] 
Figure 9.4 This chart shows the average amount spent by credit card type based on the 
number of items in the order for one particular retailer. 
Item Popularity 
What are the most popular items? This is a question that can usually be 
answered by looking at inventory curves, which can be generated without 
having to work with transaction-level data. However, knowing the sales of an 
individual item is only the beginning. There are related questions: 
■■ What is the most common item found in a one-item order? 
■■ What is the most common item found in a multi-item order? 
■■ What is the most common item found among customers who are repeat 
purchasers? 
■■ How has the popularity of particular items changed over time? 
■■ How does the popularity of an item vary regionally? 
The first three questions are particularly interesting because they may 
suggest ideas for growing customer relationships. Association rules can pro-
vide answers to these questions, particularly when used with virtual items to 
represent the size of the order or the number of orders a customer has made. 
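Before turning to association rules, the first two questions can be explored with simple counts. The sketch below uses made-up orders and a hypothetical helper, most_common_item, to compare one-item and multi-item orders.

```python
from collections import Counter

# Hypothetical orders: each is the list of items in one basket.
orders = [
    ["gift card"],
    ["gift card"],
    ["jacket"],
    ["jacket", "scarf"],
    ["scarf", "gloves"],
    ["scarf", "gloves", "hat"],
]


def most_common_item(baskets):
    """Return (item, number of baskets containing it) for the top item, or None."""
    counts = Counter(item for basket in baskets for item in set(basket))
    return counts.most_common(1)[0] if counts else None


one_item = [b for b in orders if len(b) == 1]
multi_item = [b for b in orders if len(b) > 1]

print("one-item orders:", most_common_item(one_item))
print("multi-item orders:", most_common_item(multi_item))
```

The same split could be made on the number of orders per customer to address the question about repeat purchasers.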
The last two questions bring up the dimensions of time and geography, 
which are very important for applications of market basket analysis. Differ-
ent products have different affinities in different regions—something that 
retailers are very familiar with. It is also possible to use association rules to 
start to understand these areas, by introducing virtual items for region and 
seasonality. 
TIP Time and geography are two of the most important attributes of market 
basket data, because they often point to the exact marketing conditions at the 
time of the sale. 
Tracking Marketing Interventions 
As discussed in Chapter 5, looking at individual products over time can pro-
vide a good understanding of what is happening with the product. Including 
marketing interventions along with the product sales over time, as in Figure 
9.5, makes it possible to see the effect of the interventions. The chart shows a 
sales curve for a particular product. Prior to the intervention, sales are hovering
at 50 units per week. After the intervention, they peak at about seven or 
eight times that amount, before gently sliding down over the next six or seven 
weeks. Using such charts, it is possible to measure the response to the 
marketing effort. 
[Figure 9.5 chart: weekly unit sales of one product (y-axis 0 to 450) for the weeks of Mar 01 through Aug 02, with a "Mail Drop" annotation marking the intervention.]
Figure 9.5 Showing marketing interventions and product sales on the same chart makes 
it possible to see effects of marketing efforts. 
Such analysis does not require looking at individual market baskets—daily 
or weekly summaries of product sales are sufficient. However, it does require 
knowing when marketing interventions take place—and sometimes getting 
such a calendar is the biggest challenge. One of the questions that such a chart 
can answer is the effect of the intervention. A challenge in answering this ques-
tion is determining whether the additional sales are incremental or are made 
by customers who would purchase the product anyway at some later time. 
Market basket data can start to answer this question. In addition to looking 
at the volume of sales after an intervention, we can also look at the number of 
baskets containing the item. If the number of customers is not increasing, there 
is evidence that existing customers are simply stocking up on the item at a 
lower cost. 
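One way to make this comparison concrete is to track both units sold and the number of baskets containing the item. The weekly figures and the summarize helper below are invented for illustration; real data would come from the transaction system.

```python
# Hypothetical weekly summaries for one promoted item:
# (week label, units sold, number of baskets containing the item).
weekly = [
    ("before-1", 50, 45),
    ("before-2", 52, 46),
    ("after-1", 400, 48),
    ("after-2", 350, 47),
]


def summarize(rows):
    """Return (average units per week, average baskets per week, units per basket)."""
    units = sum(r[1] for r in rows)
    baskets = sum(r[2] for r in rows)
    return units / len(rows), baskets / len(rows), units / baskets


before = summarize([r for r in weekly if r[0].startswith("before")])
after = summarize([r for r in weekly if r[0].startswith("after")])

unit_growth = after[0] / before[0]
basket_growth = after[1] / before[1]
print(f"units grew {unit_growth:.1f}x, baskets grew {basket_growth:.2f}x")
# Units growing much faster than baskets is the stocking-up pattern described above.
```

In this made-up case the number of baskets barely moves while units per basket jump, which would suggest that roughly the same customers are buying more each.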
A related question is whether discounting results in additional sales of other 
products. Association rules can help answer this question by finding combina-
tions of products that include those being promoted during the period of the 
promotion. Similarly, we might want to know if the average size of orders 
increases or decreases after an intervention. These are examples of questions 
where more detailed transaction-level data is important. 
Clustering Products by Usage 
Perhaps one of the most interesting questions is what groups of products often 
appear together. Such groups of products are very useful for making recom-
mendations to customers—customers who have purchased some of the prod-
ucts may be interested in the rest of them (Chapter 8 talks about product 
recommendations in more detail). At the individual product level, association 
rules provide some answers in this area. In particular, this data mining tech-
nique determines which product or products in a purchase suggest the pur-
chase of other particular products at the same time. 
Sometimes it is desirable to find larger clusters than those provided by asso-
ciation rules, which include just a handful of items in any rule. Standard cluster-
ing techniques, which are described in Chapter 11, can also be used on market 
basket data. In this case, the data needs to be pivoted, as shown in Figure 9.6, so 
that each row represents one order or customer, and there is a flag or a counter 
for each product purchased. Unfortunately, there are often thousands of differ-
ent products. To reduce the number of columns, such a transformation can take 
place at the category level, rather than at the individual product level. 
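The pivot itself is a simple transformation. The following is a minimal sketch, assuming line items arrive as (order ID, product ID, quantity) tuples; the function name pivot_orders and the sample data are hypothetical.

```python
# Hypothetical line items: (order ID, product ID, quantity).
line_items = [
    (1, "A", 2),
    (1, "C", 1),
    (2, "B", 1),
    (3, "A", 1),
    (3, "B", 3),
    (3, "B", 1),
]


def pivot_orders(items, as_flags=True):
    """Return (products, rows): one row per order, one column per product."""
    products = sorted({product for _, product, _ in items})
    rows = {}
    for order_id, product, quantity in items:
        row = rows.setdefault(order_id, dict.fromkeys(products, 0))
        # A flag records presence only; a counter accumulates quantities.
        row[product] = 1 if as_flags else row[product] + quantity
    return products, rows


products, flags = pivot_orders(line_items)
print("order", *products)
for order_id in sorted(flags):
    print(order_id, *(flags[order_id][p] for p in products))
```

Replacing the product ID with a category ID before pivoting gives the category-level version with far fewer columns.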
There is typically a lot of information available about products. In addition 
to the product hierarchy, such information includes the color of clothes, 
whether food is low calorie, whether a poster includes a frame, and so on. 
Such descriptions provide a wealth of information, and can lead to useful ad 
hoc questions: 
■■ Do diet products tend to sell together? 
■■ Are customers purchasing similar colors of clothing at the same time? 
■■ Do customers who purchase framed posters also buy other products? 
Being able to answer such questions is often more useful than trying to clus-
ter products, since such directed questions often lead directly to marketing 
actions. 
[Figure 9.6 diagram: LINE ITEM records (LINE ITEM ID, ORDER ID, PRODUCT ID, QUANTITY, UNIT PRICE, UNIT COST, GIFT WRAP FLAG, TAXABLE FLAG, etc.) are pivoted into one ORDER PIVOT row per order (ORDER ID, HAS PRODUCT A, HAS PRODUCT B, HAS PRODUCT C, HAS PRODUCT D, etc.).]
Figure 9.6 Pivoting market basket data makes it possible to run clustering algorithms to 
find interesting groups of products. 
Association Rules 
One appeal of association rules is the clarity and utility of the results, which 
are in the form of rules about groups of products. There is an intuitive appeal 
to an association rule because it expresses how tangible products and services 
group together. A rule like, “if a customer purchases three-way calling, then that 
customer will also purchase call waiting,” is clear. Even better, it might suggest a 
specific course of action, such as bundling three-way calling with call waiting 
into a single service package. 
While association rules are easy to understand, they are not always useful. 
The following three rules are examples of real rules generated from real data: 
■■ Wal-Mart customers who purchase Barbie dolls have a 60 percent likeli-
hood of also purchasing one of three types of candy bars. 
■■ Customers who purchase maintenance agreements are very likely to 
purchase large appliances. 
■■ When a new hardware store opens, one of the most commonly sold 
items is toilet bowl cleaners. 
The last two are examples that we have actually seen in data. The 
first is an example quoted in Forbes on September 8, 1997. These three examples
illustrate the three common types of rules produced by association rule analysis: 
the actionable, the trivial, and the inexplicable. In addition to these types of rules, 
the sidebar “Famous Rules” talks about one other category. 
Actionable Rules 
The useful rule contains high-quality, actionable information. Once the pattern is 
found, it is often not hard to justify, and telling a story can lead to insights and 
action. Barbie dolls preferring chocolate bars to other forms of food is not a likely 
story. Instead, imagine a family going shopping. The purpose: finding a gift for 
little Susie’s friend Emily, since her birthday is coming up. A Barbie doll is the 
perfect gift. At checkout, little Jacob starts crying. He wants something too—a 
candy bar fits the bill. Or perhaps Emily has a brother; he can’t be left out of the 
gift-giving festivities. Maybe the candy bar is for Mom, since buying Barbie dolls 
is a tiring activity and Mom needs some energy. These scenarios all suggest that 
the candy bar is an impulse purchase added onto that of the Barbie doll. 
Whether Wal-Mart can make use of this information is not clear. This rule 
might suggest more prominent product placement, such as ensuring that cus-
tomers must walk through candy aisles on their way back from Barbie-land. It 
might suggest product tie-ins and promotions offering candy bars and dolls 
together. It might suggest particular ways to advertise the products. Because the 
rule is easily understood, it suggests plausible causes and possible interventions. 
Trivial Rules 
Trivial results are already known by anyone at all familiar with the business. The sec-
ond example (“Customers who purchase maintenance agreements are very 
likely to purchase large appliances”) is an example of a trivial rule. In fact, cus-
tomers typically purchase maintenance agreements and large appliances at the 
same time. Why else would they purchase maintenance agreements? The two 
are advertised together, and rarely sold separately (although when sold sepa-
rately, it is the large appliance that is sold without the agreement rather than 
the agreement sold without the appliance). This rule, though, was found after 
analyzing hundreds of thousands of point-of-sale transactions from Sears. 
Although it is valid and well supported in the data, it is still useless. Similar 
results abound: People who buy 2-by-4s also purchase nails; customers who 
purchase paint buy paint brushes; oil and oil filters are purchased together, as 
are hamburgers and hamburger buns, and charcoal and lighter fluid. 
A subtler problem falls into the same category. A seemingly interesting 
result—such as the fact that people who buy the three-way calling option on 
their local telephone service almost always buy call waiting—may be the result 
of past marketing programs and product bundles. In the case of telephone ser-
vice options, three-way calling is typically bundled with call waiting, so it is 
difficult to order it separately. In this case, the analysis does not produce action-
able results; it is producing already acted-upon results. Although it is a danger 
for any data mining technique, market basket analysis is particularly suscepti-
ble to reproducing the success of previous marketing campaigns because of its 
dependence on unsummarized point-of-sale data—exactly the same data that 
defines the success of the campaign. Results from market basket analysis may sim-
ply be measuring the success of previous marketing campaigns. 
Trivial rules do have one use, although it is not directly a data mining use. 
When a rule should appear 100 percent of the time, the few cases where it does 
not hold provide a lot of information about data quality. That is, the exceptions 
to trivial rules point to areas where business operations, data collection, and 
processing may need to be further refined. 
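Using a trivial rule as a data quality check amounts to listing the transactions that break it. The sketch below assumes each transaction is a set of item names and that the list of large appliances is known; both the data and the rule_exceptions helper are hypothetical.

```python
# Hypothetical transactions, each a set of item names.
transactions = [
    {"refrigerator", "maintenance agreement"},
    {"washer", "maintenance agreement"},
    {"maintenance agreement"},  # agreement with no appliance: worth a look
    {"refrigerator"},           # an appliance alone does not break the rule
]

LARGE_APPLIANCES = {"refrigerator", "washer", "dryer"}  # assumed category list


def rule_exceptions(baskets, condition, results):
    """Return baskets that contain `condition` but none of the `results` items."""
    return [b for b in baskets if condition in b and not (b & results)]


exceptions = rule_exceptions(transactions, "maintenance agreement", LARGE_APPLIANCES)
print(f"{len(exceptions)} exception(s):", exceptions)
```

Each exception is a candidate for follow-up, such as a miskeyed product code or an agreement recorded on a separate ticket.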
Inexplicable Rules 
Inexplicable results seem to have no explanation and do not suggest a course of action. 
The third pattern (“When a new hardware store opens, one of the most com-
monly sold items is toilet bowl cleaner”) is intriguing, tempting us with a new 
fact but providing information that does not give insight into consumer behav-
ior or the merchandise or suggest further actions. In this case, a large hardware 
company discovered the pattern for new store openings, but could not figure 
out how to profit from it. Many items are on sale during the store openings, 
but the toilet bowl cleaners stood out. More investigation might give some 
explanation: Is the discount on toilet bowl cleaners much larger than for other 
products? Are they consistently placed in a high-traffic area for store openings 
but hidden at other times? Is the result an anomaly from a handful of stores? 
Are they difficult to find at other times? Whatever the cause, it is doubtful that 
further analysis of just the market basket data can give a credible explanation. 
WARNING When applying market basket analysis, many of the results are 
often either trivial or inexplicable. Trivial rules reproduce common knowledge 
about the business, wasting the effort used to apply sophisticated analysis 
techniques. Inexplicable rules are flukes in the data and are not actionable. 
FAMOUS RULES: BEER AND DIAPERS 
Perhaps the most talked about association rule ever “found” is the association 
between beer and diapers. This is a famous story from the late 1980s or early 
1990s, when computers were just getting powerful enough to analyze large 
volumes of data. The setting is somewhere in the midwest, where a retailer is 
analyzing point of sale data to find interesting patterns. 
Lo and behold, lurking in all the transaction data, is the fact that beer and 
diapers are selling together. This immediately sets marketing minds in motion 
to figure out what is happening. A flash of insight provides the explanation: 
beer drinkers do not want to interrupt their enjoyment of televised sports, so 
they buy diapers to reduce trips to the bathroom. No, that’s not it. The more 
likely story is that families with young children are preparing for the weekend, 
diapers for the kids and beer for Dad. Dad probably knows that after he has a 
couple of beers, Mom will change the diapers. 
This is a powerful story. Setting aside the analytics, what can a retailer do 
with this information? There are two competing views. One says to put the beer 
and diapers close together, so when one is purchased, customers remember 
to buy the other one. The other says to put them as far apart as possible, so 
the customer must walk by as many stocked shelves as possible, having the 
opportunity to buy yet more items. The store could also put higher-margin 
diapers a bit closer to the beer, although mixing baby products and alcohol 
would probably be unseemly. 
The story is so powerful that the authors noticed at least four companies 
using the story: IBM, Tandem (now part of HP), Oracle, and NCR Teradata. The 
actual story was debunked on April 6, 1998 in an article in Forbes magazine 
called “Beer-Diaper Syndrome.” 
The debunked story still has a lesson. Apparently, the sales of beer and 
diapers were known to be correlated (at least in some stores) based on 
inventory. While doing a demonstration project, a sales manager suggested that 
the demo show something interesting, like “beer and diapers” being sold 
together. With this small hint, analysts were able to find evidence in the data. 
Actually, the moral of the story is not about the power of association rules. It is 
that hypothesis testing can be very persuasive and actionable. 
How Good Is an Association Rule? 
Association rules start with transactions containing one or more products or ser-
vice offerings and some rudimentary information about the transaction. For the 
purpose of analysis, the products and service offerings are called items. Table 9.1 
illustrates five transactions in a grocery store that carries five products. 
These transactions have been simplified to include only the items pur-
chased. How to use information like the date and time and whether the cus-
tomer paid with cash or a credit card is discussed later in this chapter. 
Each of these transactions gives us information about which products are 
purchased with which other products. This is shown in a co-occurrence table 
that tells the number of times that any pair of products was purchased 
together (see Table 9.2). For instance, the box where the “Soda” row intersects 
the “OJ” column has a value of “2,” meaning that two transactions contain 
both soda and orange juice. This is easily verified against the original transac-
tion data, where customers 1 and 4 purchased both these items. The values 
along the diagonal (for instance, the value in the “OJ” column and the “OJ” 
row) represent the number of transactions containing that item. 
Table 9.1 Grocery Point-of-Sale Transactions 
CUSTOMER   ITEMS 
1          Orange juice, soda 
2          Milk, orange juice, window cleaner 
3          Orange juice, detergent 
4          Orange juice, detergent, soda 
5          Window cleaner, soda 
Table 9.2 Co-Occurrence of Products 
                 OJ   WINDOW CLEANER   MILK   SODA   DETERGENT 
OJ                4         1            1      2        2 
Window Cleaner    1         2            1      1        0 
Milk              1         1            1      0        0 
Soda              2         1            0      3        1 
Detergent         2         0            0      1        2 
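The counts in a co-occurrence table can be derived mechanically from the transactions. The sketch below recomputes them from the five transactions in Table 9.1; the function name co_occurrence is an illustrative choice, and the brute-force counting is only sensible for a handful of items.

```python
from itertools import combinations_with_replacement

# The five transactions from Table 9.1.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]
items = ["OJ", "Window Cleaner", "Milk", "Soda", "Detergent"]


def co_occurrence(baskets, item_names):
    """Count the baskets containing each pair; the diagonal counts single items."""
    table = {a: dict.fromkeys(item_names, 0) for a in item_names}
    for a, b in combinations_with_replacement(item_names, 2):
        count = sum(1 for basket in baskets if a in basket and b in basket)
        table[a][b] = table[b][a] = count  # the table is symmetric
    return table


table = co_occurrence(transactions, items)
for row in items:
    print(f"{row:15s}", *(table[row][col] for col in items))
```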
This simple co-occurrence table already highlights some simple patterns: 
■■ Orange juice and soda are purchased together as often as any other 
two items (twice, as are orange juice and detergent). 
■■ Detergent is never purchased with window cleaner or milk. 
■■ Milk is never purchased with soda or detergent. 
These observations are examples of associations and may suggest a formal 
rule like: “If a customer purchases soda, then the customer also purchases orange 
juice.” For now, let’s defer discussion of how to find the rule automatically, and 
instead ask another question. How good is this rule? 
In the data, two of the five transactions include both soda and orange juice. 
These two transactions support the rule. The support for the rule is two out of 
five, or 40 percent. Since most of the transactions that contain soda also contain 
orange juice, there is a fairly high degree of confidence in the rule as well. In fact, two 
of the three transactions that contain soda also contain orange juice, so the 
rule “if soda, then orange juice” has a confidence of 67 percent. The 
inverse rule, “if orange juice, then soda,” has a lower confidence. Of the four 
transactions with orange juice, only two also have soda. Its confidence, then, is 
just 50 percent. More formally, confidence is the ratio of the number of the 
transactions supporting the rule to the number of transactions where the con-
ditional part of the rule holds. Another way of saying this is that confidence is 
the ratio of the number of transactions with all the items to the number of 
transactions with just the “if” items. 
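The arithmetic above can be checked with a few lines of code. This sketch uses the transactions from Table 9.1 and exact fractions; the function names support and confidence are illustrative, not taken from any particular tool.

```python
from fractions import Fraction

# The five transactions from Table 9.1.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


def confidence(baskets, condition, result):
    """Baskets with all the items divided by baskets with just the "if" items."""
    return support(baskets, condition | result) / support(baskets, condition)


print(support(transactions, {"Soda", "OJ"}))       # 2/5
print(confidence(transactions, {"Soda"}, {"OJ"}))  # 2/3
print(confidence(transactions, {"OJ"}, {"Soda"}))  # 1/2
```

Support is the same for both directions of the rule, while confidence depends on which item is the condition.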
Another question is how much better than chance the rule is. One way to 
answer this is to calculate the lift (also called improvement), which tells us how 
much better a rule is at predicting the result than just assuming the result in 
the first place. Lift is the ratio of the density of the target after application of the 
left-hand side to the density of the target in the population. Another way of 
saying this is that lift is the ratio of the records that support the entire rule to 
the number that would be expected, assuming that there is no relationship 
between the products (the exact formula is given later in the chapter). A 
similar measure, the excess, is the difference between the number of records 
that support the entire rule and the expected number. Because the excess 
is measured in the same units as the original sales, it is sometimes easier to 
work with. 
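As a rough illustration of these two measures, the sketch below applies them to the rule "if soda, then orange juice" using the transactions from Table 9.1. Lift is computed as the rule's confidence divided by the overall frequency of the result, and the expected count assumes the two sides of the rule are independent; the function names are hypothetical.

```python
from fractions import Fraction

# The five transactions from Table 9.1.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]


def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)


def lift(baskets, condition, result):
    """Confidence of the rule divided by the overall frequency of the result."""
    conf = Fraction(count(baskets, condition | result), count(baskets, condition))
    return conf / Fraction(count(baskets, result), len(baskets))


def excess(baskets, condition, result):
    """Actual number of supporting baskets minus the number expected by chance."""
    expected = Fraction(
        count(baskets, condition) * count(baskets, result), len(baskets)
    )
    return count(baskets, condition | result) - expected


print(lift(transactions, {"Soda"}, {"OJ"}))    # 5/6
print(excess(transactions, {"Soda"}, {"OJ"}))  # -2/5
```

A lift below 1 here reflects how common orange juice is in this tiny data set: it appears in four of the five transactions, so assuming orange juice outright does slightly better than the rule.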
Figure 9.7 provides an example of lift, confidence, and support as provided 
by Blue Martini, a company that specializes in tools for retailers. Their soft-
ware system includes a suite of analysis tools that includes association rules. 
This particular example shows that a particular jacket is much more likely to 
be purchased with a gift certificate, information that can be used for improv-
ing messaging for selling both gift certificates and jackets. 
The ideas behind the co-occurrence table extend to combinations with any 
number of items, not just pairs of items. For combinations of three items, imag-
ine a cube with each side split into five different parts, as shown in Figure 9.8. 
Even with just five items in the data, there are already 125 different subcubes 
to fill in. By playing with symmetries in the cube, this can be reduced a bit (by 
a factor of six), but the number of subcubes for groups of three items is 
proportional to the third power of the number of different items. In general, 
the number of combinations with n items is proportional to the number of 
items raised to the nth power—a number that gets very large, very fast. 
And generating the co-occurrence table requires doing work for each of these 
combinations. 
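The growth is easy to see numerically. The sketch below compares the number of cells in the full cube with the number of unordered groups of distinct items; the function names are illustrative, and the item counts other than five are arbitrary examples.

```python
from math import comb


def cells(n_items, group_size):
    """Cells in a full co-occurrence cube: n_items raised to group_size."""
    return n_items ** group_size


def distinct_groups(n_items, group_size):
    """Unordered groups of distinct items, ignoring symmetric duplicates."""
    return comb(n_items, group_size)


for n in (5, 100, 10_000):
    print(f"{n:>6} items: {cells(n, 3):>16,} cells, "
          f"{distinct_groups(n, 3):>15,} distinct triples")
```

For larger numbers of items the ratio between the two counts approaches six, so exploiting symmetry helps only by a constant factor while the cubic growth remains.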
Figure 9.7 Blue Martini provides an interface that shows the support, confidence, and lift 
of an association rule. 
Figure 9.8 A co-occurrence table in three dimensions can be visualized as a cube.
Building Association Rules
This basic process for finding association rules is illustrated in Figure 9.9.
There are three important concerns in creating association rules:
■■ Choosing the right set of items.
■■ Generating rules by deciphering the counts in the co-occurrence matrix.
■■ Overcoming the practical limits imposed by thousands or tens of thou-
sands of items.
The next three sections delve into these concerns in more detail.
[Figure 9.8 diagram: a cube whose three axes are each divided into OJ, Cleaner, Milk, Soda, and Detergent. Callout: orange juice, milk, and window cleaner appear together in exactly one transaction.]