35 Association Rules

Dr R. Baskaran

ASSOCIATION RULES

 

The association rules method is an example of an unsupervised grouping method; that is, no target variable is used to direct how the grouping is generated. The method groups observations and attempts to understand the links or associations between the different attributes of each group. Association rules have been applied in many situations, such as data mining of retail transactions. The method generates rules from the groups, as in the following example:

 

IF the customer’s age is 18 AND

the customer buys paper AND

the customer buys a hole punch

THEN the customer buys a binder

 

The rule states that 18-year-old customers who purchase paper and a hole punch will often buy a binder at the same time. This rule would have been generated directly from a data set. Using this information the retailer may decide, for example, to create a package of products for college students.
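A short Python sketch shows how such a rule might be represented and checked against a single transaction. The record layout (an age field and a set of purchased items) and the check_rule helper are hypothetical, chosen only to mirror the rule above:

    # Hypothetical sketch: apply the binder rule to one transaction record.
    def check_rule(transaction, required_age, antecedent_items, consequent_item):
        """True if the transaction satisfies the IF-part and also contains the THEN-part item."""
        if transaction["age"] != required_age:
            return False
        if not antecedent_items.issubset(transaction["items"]):
            return False
        return consequent_item in transaction["items"]

    # IF age is 18 AND buys paper AND buys a hole punch THEN buys a binder
    transaction = {"age": 18, "items": {"paper", "hole punch", "binder"}}
    print(check_rule(transaction, 18, {"paper", "hole punch"}, "binder"))  # True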

 

Association rules have a number of advantages:

  • Easy to interpret: The results are presented in the form of a rule that is easily understood.
  • Actionable: It is possible to perform some sort of action based on the rule. For example, the rule in the previous example allowed the retailer to market this combination of items differently.
  • Large data sets: It is possible to use this technique with large numbers of observations.

There are three primary limitations to this method:

 

Only categorical variables: The method forces you either to restrict your analysis to variables that are categorical or to convert continuous variables to categorical variables.
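For instance, a continuous income value could be mapped onto the two bands used later in this chapter (below Rs.50K and above Rs.50K); the sketch below is a hypothetical illustration of such a conversion.

    # Hypothetical sketch: convert a continuous variable to a categorical one.
    def income_band(income_in_rupees):
        """Map a continuous income onto the two bands used in the text."""
        return "below Rs.50K" if income_in_rupees < 50_000 else "above Rs.50K"

    print([income_band(x) for x in (32_000, 50_000, 120_000)])
    # ['below Rs.50K', 'above Rs.50K', 'above Rs.50K']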

 

Time-consuming: Generating the rules can be time-consuming for the computer, especially where a data set has many variables and/or many possible values per variable. There are ways to make the analysis run faster, but they often compromise the final results.

 

Rule prioritization: The method can generate many rules that must be prioritized and interpreted.

 

In this method, creating useful rules from the data is done by grouping the data, extracting rules from the groups, and then prioritizing the rules.

    Grouping by Value Combinations

    Let us first consider a simple situation concerning a shop that only sells cameras and televisions. A data set of 31,612 sales transactions is used, which contains three variables: Customer ID, Gender and Purchase. The variable Gender identifies whether the buyer is male or female. The variable Purchase refers to the item purchased and can only have two values, camera and television.

 

Table 1 shows three rows from this table. By grouping this set of 31,612 observations, based on specific values for the variables Gender and Purchase, the groups in Table 2 are generated. There are eight ways of grouping this trivial example based on the values for the different categories. For example, there are 7,889 observations where Gender is male and Purchase is camera. If an additional variable is added to this data set, the number of possible groups will increase. For example, if another variable Income, which has two values, above Rs.50K and below Rs.50K, is added to the table (Table 3), the number of groups would increase to 26, as shown in Table 4.
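The group counts quoted above can be reproduced with a short Python sketch. It assumes that a group fixes a specific value for at least one variable and leaves the others unrestricted (an "any" wildcard), which is how the totals of 8 and 26 arise.

    from itertools import product

    def enumerate_groups(variables):
        """Yield every value combination, with None standing for 'any',
        except the combination where every variable is unrestricted."""
        names = list(variables)
        options = [variables[name] + [None] for name in names]
        for combo in product(*options):
            if any(value is not None for value in combo):
                yield dict(zip(names, combo))

    two_vars = {"Gender": ["male", "female"], "Purchase": ["camera", "television"]}
    three_vars = {**two_vars, "Income": ["above Rs.50K", "below Rs.50K"]}

    print(sum(1 for _ in enumerate_groups(two_vars)))    # 8, as in Table 2
    print(sum(1 for _ in enumerate_groups(three_vars)))  # 26, as in Table 4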

 

Increasing the number of variables and/or the number of possible values for each variable increases the number of groups. The number of groups may become so large that it would be impossible to generate all combinations. However, most data sets contain many possible combinations of values with zero or only a handful of observations, and techniques for generating the groups can take advantage of this fact. By increasing the minimum size of a group, fewer groups are generated and the analysis is completed faster. However, care should be taken in setting this cutoff value, since no rules will be generated from any group whose number of observations falls below it. For example, if this number is set to ten, then no rules will be generated from groups containing fewer than ten observations. Subject matter knowledge and information generated from the data characterization phase will help in setting this value. It is a trade-off between how fast you wish the rule generation to run and how subtle the rules need to be (i.e. rules based on only a few observations).
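The cutoff just described amounts to filtering group counts, as the hypothetical sketch below illustrates; the handful of transactions and the cutoff of ten are only placeholders, and for brevity only full value combinations are counted.

    from collections import Counter

    # Hypothetical transactions; in practice all 31,612 rows would be counted.
    transactions = [
        {"Gender": "male", "Purchase": "camera"},
        {"Gender": "male", "Purchase": "television"},
        {"Gender": "female", "Purchase": "camera"},
    ]

    MIN_GROUP_SIZE = 10  # no rules are generated from groups smaller than this

    # Count how many observations fall into each full value combination.
    group_counts = Counter(tuple(sorted(t.items())) for t in transactions)

    # Keep only the groups that are large enough to generate rules from.
    kept = {group: n for group, n in group_counts.items() if n >= MIN_GROUP_SIZE}
    print(kept)  # empty for this tiny list, since every group has fewer than 10 rows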

   Extracting Rules from Groups

    Overview

 

So far a data set has been grouped according to specific values for each of the variables. In Figure 1, 26 observations (A to Z) are characterized by three variables: Shape, Color, and Border. Observation A has Shape = square, Color = white, and Border = thick, and observation W has Shape = circle, Color = gray, and Border = thin. As described in the previous section, the observations are grouped. An example grouping is shown below, where Shape = circle, Color = gray, and Border = thick.

The next step is to extract a rule from the group. There are three possible rules that could be pulled out from this group:

 

Rule 1: IF Color = gray AND Shape = circle THEN Border = thick

Rule 2: IF Border = thick AND Color = gray THEN Shape = circle

Rule 3: IF Border = thick AND Shape = circle THEN Color = gray
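One simple way to obtain these three rules programmatically is to treat each attribute of the group in turn as the THEN-part, with the remaining attributes forming the IF-part; the Python sketch below uses that convention (one of several possible ones).

    group = {"Shape": "circle", "Color": "gray", "Border": "thick"}

    def candidate_rules(group):
        """Yield (if_part, then_part) pairs, one per attribute of the group."""
        for then_attr, then_value in group.items():
            if_part = {a: v for a, v in group.items() if a != then_attr}
            yield if_part, {then_attr: then_value}

    for if_part, then_part in candidate_rules(group):
        print("IF", if_part, "THEN", then_part)
    # prints the three candidate rules (Rules 1-3 above), one per line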

We now compare each rule to the whole data set in order to prioritize the rules, and three values are calculated: support, confidence, and lift.

 

Support

 

The support value is another way of describing the number of observations that the rule (created from the group) maps onto, that is, the size of the group. The support is often expressed as a proportion or percentage. In this example, the data set has 26 observations and the group of gray circles with a thick border contains six of them, so the group has a support value of six out of 26, or 0.23 (23%).
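The calculation itself is a single division, sketched below with the counts from the worked example.

    group_size = 6            # gray circles with a thick border
    total_observations = 26   # observations A to Z

    support = group_size / total_observations
    print(round(support, 2))  # 0.23, i.e. 23%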

 

Confidence

 

Each rule is divided into two parts. The IF-part, or antecedent, refers to the list of statements linked with AND in the first part of the rule. For example, in the rule IF Color = gray AND Shape = circle THEN Border = thick, the IF-part is the list of statements Color = gray AND Shape = circle. The THEN-part of the rule, or consequent, refers to any statements after the THEN (Border = thick in this example). The confidence score is a measure of how predictable a rule is. The confidence (or predictability) value is calculated as the support for the entire group divided by the support for all observations satisfied by the IF-part of the rule:

 

Confidence = group support / IF-part support

For example, the confidence value for Rule 1

Rule 1: IF Color = gray AND Shape = circle THEN Border = thick

is calculated using the support value for the group and the support value for the IF-part of the rule.

 

The support value for the group (gray circles with a thick border) is 0.23 and the support value for the IF-part of the rule (gray circles) is seven out of 26 or 0.27. To calculate the confidence, divide the support for the group by the support for the IF-part:

 

Confidence = 0.23 / 0.27 = 0.85

Confidence values range from no confidence (0) to high confidence (1). Since a value of 0.85 is close to 1, we have a high degree of confidence in this rule: most likely, gray circles will have a thick border.
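The same calculation in Python, using the counts quoted in the worked example (six gray thick-bordered circles, seven gray circles, 26 observations in total), is sketched below.

    total = 26
    group_count = 6     # observations matching both the IF-part and the THEN-part
    if_part_count = 7   # observations matching the IF-part (all gray circles)

    group_support = round(group_count / total, 2)      # 6/26 -> 0.23
    if_part_support = round(if_part_count / total, 2)  # 7/26 -> 0.27

    confidence = group_support / if_part_support
    print(round(confidence, 2))  # 0.85, as in the text (6/7 unrounded gives about 0.86)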