4. Mining Frequent Items and Associations

Mining Frequent Items and Association Rules is an important area of Data Mining that focuses on discovering hidden relationships, patterns, and associations among items in large databases.

Organizations generate massive transactional data daily from:

  • Retail stores

  • Online shopping platforms

  • Banking systems

  • Healthcare systems

  • Websites

Analyzing this data manually is difficult. Association analysis helps discover:

  • Frequently occurring items

  • Customer purchasing patterns

  • Product relationships

  • Hidden trends

This information helps organizations:

  • Improve sales

  • Increase profit

  • Enhance recommendation systems

  • Make better business decisions


4.1 Frequent Item Set

1. Definition of Frequent Item Set

A Frequent Item Set is a set of one or more items that occur together in a transaction database with frequency greater than or equal to a specified minimum support threshold.

In simple words, if certain items appear together many times in transactions, they are called frequent item sets.


2. Important Terminologies

Before understanding frequent item sets, some basic terms are important.


1. Item

An item is a single product or object in a transaction.

Examples

  • Bread

  • Milk

  • Butter

  • Rice


2. Item Set

An item set is a collection of one or more items.

Examples

  • {Bread}

  • {Milk, Butter}

  • {Bread, Milk, Eggs}


3. Transaction

A transaction is a collection of items purchased together by a customer.

Example Transaction Table

Transaction ID    Items Purchased
T1                Bread, Milk
T2                Bread, Butter
T3                Milk, Butter
T4                Bread, Milk, Butter

4. Transaction Database

A transaction database is a collection of transactions.


3. Support Measure

Support is one of the most important measures in frequent item set mining.

It measures how frequently an item set appears in the transaction database.


Formula for Support

Support(A)=\frac{\text{Number of transactions containing A}}{\text{Total number of transactions}}


Example of Support Calculation

Consider the following transactions:

Transaction ID    Items
T1                Bread, Milk
T2                Bread, Butter
T3                Bread, Milk
T4                Milk, Butter
T5                Bread, Milk

Find support of:

  • {Bread, Milk}

Step 1: Count Transactions Containing {Bread, Milk}

Present in:

  • T1

  • T3

  • T5

Total = 3 transactions


Step 2: Total Transactions

Total transactions = 5


Step 3: Calculate Support

Support(\{\text{Bread}, \text{Milk}\})=\frac{3}{5}=0.6

Support = 0.6 = 60%
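The same calculation can be sketched in Python. This is a minimal illustration, assuming each transaction is stored as a set of item names; the function name `support` is chosen here for illustration and does not come from any library.

```python
# Transactions from the example above, each stored as a set of items.
transactions = [
    {"Bread", "Milk"},    # T1
    {"Bread", "Butter"},  # T2
    {"Bread", "Milk"},    # T3
    {"Milk", "Butter"},   # T4
    {"Bread", "Milk"},    # T5
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # `itemset <= t` is the subset test: all items of itemset occur in t.
    matching = sum(1 for t in transactions if itemset <= t)
    return matching / len(transactions)


print(support({"Bread", "Milk"}, transactions))  # 0.6
```

The subset test mirrors Step 1 (counting T1, T3, and T5), and the division mirrors Steps 2 and 3.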


4. Minimum Support Threshold

Minimum support is a user-defined value used to determine whether an item set is frequent.

Example

If minimum support = 50%

then:

  • Item sets with support ≥ 50% are considered frequent.
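As a rough illustration of how the threshold is applied, the sketch below enumerates every possible item set in the five-transaction example from the support calculation and keeps those with support of at least 50%. Enumerating all combinations is a brute-force approach that is only practical for tiny examples; the names `support`, `min_support`, and `frequent` are chosen for illustration.

```python
from itertools import combinations

# Five-transaction example used in the support calculation.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Butter"},
    {"Bread", "Milk"},
]
min_support = 0.5  # user-defined threshold (50%)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Brute force: test every non-empty combination of the distinct items.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        s = support(candidate, transactions)
        if s >= min_support:
            frequent[candidate] = s

for itemset, s in frequent.items():
    print(itemset, s)
```

With this data, only {Bread}, {Milk}, and {Bread, Milk} meet the threshold; {Butter} appears in 2 of 5 transactions (40%) and is therefore not frequent.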

5. Importance of Frequent Item Sets

Frequent item sets are important because they help discover:

  • Customer buying habits

  • Product relationships

  • Frequent patterns

  • Hidden trends

They are the foundation for:

  • Association rule mining

  • Recommendation systems

  • Market basket analysis


6. Applications of Frequent Item Sets

1. Retail Industry

Identifying products commonly purchased together.

Example

Customers buying:

  • Bread

  • Butter

together frequently.


2. E-Commerce Websites

Recommendation systems suggest related products.

Example

“Customers who bought this also bought…”


3. Medical Diagnosis

Finding diseases and symptoms occurring together.


4. Web Usage Mining

Analyzing web pages that are frequently visited together.


4.2 Closed Item Set

1. Definition of Closed Item Set

A Closed Item Set is a frequent item set for which none of its proper supersets has the same support count.

In simple words:

  • If adding any other item to the item set lowers the support value,

  • then the item set is considered closed.


2. Explanation of Closed Item Set

Closed item sets help reduce redundancy in mining results.

Many frequent item sets may contain duplicate information. Closed item sets provide a compact representation without losing support information.


3. Example of Closed Item Set

Consider transactions:

Transaction ID    Items
T1                Bread, Milk
T2                Bread, Milk
T3                Bread, Butter

Support table:

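Item Set           Support Count
{Bread}            3
{Milk}             2
{Butter}           1
{Bread, Milk}      2
{Bread, Butter}    1

Since:

  • Support of {Bread} = 3

  • Support of {Bread, Milk} = 2

  • Support of {Bread, Butter} = 1

Support drops whenever another item is added to {Bread}.

Therefore:

  • {Bread} is closed.

  • {Milk} is not closed, because its superset {Bread, Milk} has the same support count (2).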
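The closedness check can be sketched in Python for the three transactions above. This is an illustrative sketch, not a mining algorithm; the helper names `support_count` and `is_closed` are assumptions made for this example. It relies on the fact that adding items can never increase support, so it is enough to test supersets that contain exactly one extra item.

```python
# Three-transaction example from the closed item set discussion.
transactions = [
    {"Bread", "Milk"},    # T1
    {"Bread", "Milk"},    # T2
    {"Bread", "Butter"},  # T3
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def is_closed(itemset, transactions):
    """True if every one-item extension has a strictly lower support count."""
    itemset = set(itemset)
    all_items = set().union(*transactions)
    own = support_count(itemset, transactions)
    return all(
        support_count(itemset | {extra}, transactions) < own
        for extra in all_items - itemset
    )


print(is_closed({"Bread"}, transactions))  # True
print(is_closed({"Milk"}, transactions))   # False
```

{Milk} is reported as not closed because {Bread, Milk} occurs in the same two transactions, so storing {Milk} separately adds no information.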

4. Characteristics of Closed Item Sets

1. No Superset with Same Support

A closed item set has no larger item set with identical support.


2. Compact Representation

Closed item sets reduce the number of patterns stored.


3. Preserves Frequency Information

Support values remain meaningful and accurate.


4. Reduces Redundancy

Duplicate frequent patterns are removed.


5. Advantages of Closed Item Sets

1. Reduced Storage Requirement

Fewer patterns need to be stored.


2. Improved Mining Efficiency

Reduces processing complexity.


3. Simplified Analysis

Users can analyze patterns more easily.


4. Eliminates Redundant Information

Only meaningful patterns are retained.


4.3 Association Rule Mining

1. Definition of Association Rule Mining

Association Rule Mining is a Data Mining technique used to discover relationships, associations, and correlations among items in large databases.

It identifies:

  • Frequently occurring item combinations

  • Relationships between products

  • Customer purchasing behavior


2. Association Rule

An association rule is represented in the form:

A \Rightarrow B

where:

  • A = Antecedent (Left-hand side)

  • B = Consequent (Right-hand side)

Meaning:

  • If A occurs, B is likely to occur.

3. Example of Association Rule

Rule:

Bread → Butter

Meaning:

  • Customers purchasing bread are likely to purchase butter.

4. Components of Association Rule

1. Antecedent

Items appearing before the arrow.

Example

Bread


2. Consequent

Items appearing after the arrow.

Example

Butter


5. Rule Generation Process

Association rule mining mainly involves two steps.


Step 1: Frequent Item Set Generation

Frequent item sets satisfying minimum support are identified.


Step 2: Rule Generation

Association rules are generated from frequent item sets.


6. Support and Confidence

Support and confidence are used to evaluate association rules.


1. Support

Support measures how often the items of A and B appear together, as a fraction of all transactions in the database.

Formula

Support(A \Rightarrow B)=\frac{\text{Transactions containing A and B}}{\text{Total transactions}}


2. Confidence

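Confidence measures the reliability of the association rule: the fraction of transactions containing A that also contain B. In the formula, A ∪ B denotes the item set made up of all items in A and B.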

Formula

Confidence(A \Rightarrow B)=\frac{Support(A \cup B)}{Support(A)}


Example of Confidence Calculation

Suppose:

  • Bread appears in 50 transactions

  • Bread and Butter together appear in 30 transactions

Confidence:

Confidence(Bread \Rightarrow Butter)=\frac{30}{50}=0.6

Confidence = 60%

Meaning:

  • 60% of the transactions that contain bread also contain butter.
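The confidence calculation can be sketched in Python as follows. The transaction list is toy data constructed only to match the counts in the example (30 transactions with bread and butter, 20 with bread alone); the 50 extra transactions without bread are an assumption, since the example does not state the total. Because the total number of transactions cancels in the ratio, the sketch divides raw counts directly.

```python
def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if count_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    # Support(A u B) / Support(A) equals count_ab / count_a,
    # because both supports share the same denominator.
    return count_ab / count_a


# Toy data matching the example counts (the Milk-only rows are assumed).
transactions = (
    [{"Bread", "Butter"}] * 30  # bread and butter together
    + [{"Bread"}] * 20          # bread without butter
    + [{"Milk"}] * 50           # assumed transactions without bread
)

print(confidence({"Bread"}, {"Butter"}, transactions))  # 0.6
```

Note that confidence is not symmetric: with this toy data, Butter ⇒ Bread has confidence 1.0, because every transaction with butter also contains bread.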

7. Importance of Association Rule Mining

1. Discover Hidden Relationships

Finds meaningful product relationships.


2. Supports Business Decisions

Helps improve marketing and sales strategies.


3. Improves Recommendation Systems

Suggests related products to customers.


4. Helps Customer Analysis

Analyzes customer purchasing behavior.


8. Applications of Association Rule Mining

1. Market Basket Analysis

Finding products purchased together.


2. Fraud Detection

Identifying suspicious transaction patterns.


3. Medical Diagnosis

Finding relationships among symptoms and diseases.


4. Web Usage Analysis

Analyzing user navigation behavior.


4.4 Market Basket Analysis

1. Definition of Market Basket Analysis

Market Basket Analysis is a technique used to analyze customer purchasing patterns by identifying products frequently bought together.

It is one of the most common applications of association rule mining.


2. Objective of Market Basket Analysis

The main objectives are:

  • Understand customer buying behavior

  • Increase sales

  • Improve product placement

  • Support cross-selling


3. Customer Purchasing Patterns

Purchasing patterns show relationships among products purchased by customers.

Examples

Customers who buy:

  • Bread may also buy butter

  • Mobile phones may also buy earphones

  • Chips may also buy soft drinks


4. Product Association Analysis

Product association analysis identifies related products.


1. Product Placement

Related products are placed nearby.

Example

Bread and butter are placed on nearby shelves.


2. Cross-Selling

Suggesting additional products.

Example

“Customers also bought…”


3. Combo Offers

Organizations create promotional offers using associations.

Example

Burger + Soft Drink combo.


5. Advantages of Market Basket Analysis

1. Improves Sales

Encourages customers to buy additional products.


2. Enhances Customer Satisfaction

Provides better recommendations.


3. Helps Inventory Management

Improves stock planning.


4. Supports Marketing Strategies

Helps targeted advertising and promotions.


4.5 Classification of Association Rules

Association rules are classified based on dimensions and data types involved.


1. Single-Dimensional Association Rules

Definition

Association rules involving only one dimension or predicate are called single-dimensional association rules.


Example

Buys(Customer, Bread) → Buys(Customer, Butter)

Only the “buys” dimension is used.


Characteristics

  • Simple structure

  • Easy implementation

  • Common in retail analysis


2. Multidimensional Association Rules

Definition

Association rules involving multiple dimensions are called multidimensional association rules.


Example

Age(20-30) ∧ Occupation(Student) → Buys(Laptop)

Dimensions involved:

  • Age

  • Occupation

  • Product


Characteristics

  • More informative

  • More complex

  • Rich analytical insights


3. Boolean Association Rules

Definition

Boolean association rules consider only:

  • Presence

  • Absence

of items.


Example

Bread → Butter

Only whether items exist or not is considered.


Characteristics

  • Binary values only

  • Simple calculations

  • Widely used
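As an illustration of the Boolean view, the sketch below encodes the four-transaction table from the Transaction definition as rows of 0/1 values, one column per item. The variable names are chosen for this example only.

```python
# Transaction table from the "Transaction" definition.
transactions = {
    "T1": {"Bread", "Milk"},
    "T2": {"Bread", "Butter"},
    "T3": {"Milk", "Butter"},
    "T4": {"Bread", "Milk", "Butter"},
}

# One column per distinct item, in a fixed (alphabetical) order.
items = sorted(set().union(*transactions.values()))

# 1 = item present in the transaction, 0 = item absent.
matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print(items)
for tid, row in matrix.items():
    print(tid, row)
```

Quantities and prices are deliberately discarded in this encoding; only presence or absence of each item is recorded.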


4. Quantitative Association Rules

Definition

Quantitative association rules involve numerical attributes or quantities.


Example

Age between 20-30 → Purchase amount > 5000

Characteristics

  • Uses numerical data

  • More detailed analysis

  • Requires complex computations
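One common way to handle numerical attributes is to convert them into ranges, so that each range behaves like an ordinary item. The sketch below does this for the example rule. The customer records are hypothetical values invented for illustration, and the item labels "Age=20-30" and "Amount>5000" are naming assumptions.

```python
# Hypothetical customer records (invented for illustration).
records = [
    {"age": 24, "amount": 6200},
    {"age": 28, "amount": 5400},
    {"age": 22, "amount": 1500},
    {"age": 45, "amount": 7000},
]


def to_items(record):
    """Map numerical attributes to range-based items."""
    items = set()
    if 20 <= record["age"] <= 30:
        items.add("Age=20-30")
    if record["amount"] > 5000:
        items.add("Amount>5000")
    return items


encoded = [to_items(r) for r in records]

# Confidence of: Age=20-30 => Amount>5000
with_age = sum(1 for t in encoded if "Age=20-30" in t)
with_both = sum(1 for t in encoded if {"Age=20-30", "Amount>5000"} <= t)
print(with_both, "of", with_age, "customers aged 20-30 spent more than 5000")
```

After this conversion, the encoded records can be processed with the same support and confidence calculations used for Boolean rules. The choice of range boundaries strongly affects which rules are found.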


4.6 Apriori Algorithm

1. Introduction to Apriori Algorithm

The Apriori Algorithm is one of the most popular algorithms for mining frequent item sets and association rules.

It was proposed by:

  • Rakesh Agrawal

  • Ramakrishnan Srikant

The algorithm works using the Apriori principle.


2. Principle of Apriori

The Apriori principle states:

“If an item set is frequent, then all of its subsets must also be frequent.”

This helps reduce unnecessary computations.


Example of Apriori Principle

If:

  • {Bread, Milk}

is frequent,

then:

  • {Bread}

  • {Milk}

must also be frequent.

If:

  • {Bread}

is not frequent,

then:

  • {Bread, Milk}

cannot be frequent.
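The principle follows from the fact that a superset can never have higher support than any of its subsets. The sketch below checks this on the five-transaction example from the support calculation by comparing every item set with every one of its proper supersets; it is a small verification on one dataset, not a proof.

```python
from itertools import combinations

# Five-transaction example used in the support calculation.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Butter"},
    {"Bread", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


items = sorted(set().union(*transactions))
itemsets = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
]

# Collect any pair where the larger set is MORE frequent than the smaller one.
violations = [
    (small, large)
    for small in itemsets
    for large in itemsets
    if small < large
    and support(large, transactions) > support(small, transactions)
]

print(violations)  # []
```

An empty list means no superset is more frequent than its subset, which is why an infrequent item set can safely rule out all of its supersets.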


3. Working of Apriori Algorithm

The algorithm works iteratively level by level.


Steps of Apriori Algorithm

Step 1: Generate Frequent 1-Item Sets

Count support of individual items.


Step 2: Remove Infrequent Items

Items below minimum support are removed.


Step 3: Candidate Generation

Generate candidate item sets from previous frequent item sets.


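Step 4: Pruning

Remove candidate item sets that have any infrequent subset, since they cannot be frequent.


Step 5: Calculate Support

Support values of the remaining candidate sets are calculated, and candidates below minimum support are removed.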


Step 6: Repeat

Repeat until no new frequent item sets are found.
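The level-by-level procedure can be sketched in Python as shown below. This is a simplified teaching sketch under stated assumptions, not an optimized implementation: transactions are held in memory as sets, candidates are formed by taking unions of pairs of frequent item sets from the previous level, and the subset-based pruning is applied before support is counted. The function name `apriori` and its parameters are chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-item sets: count each item, drop those below the threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-sets that have size k.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Support counting: keep candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # repeat with the next level until nothing new is found
    return frequent


transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Butter"},
    {"Bread", "Milk"},
]
for itemset, s in apriori(transactions, min_support=0.5).items():
    print(sorted(itemset), s)
```

On the five-transaction example with a 50% threshold, {Butter} is dropped at the first level, so no candidate containing Butter is ever generated or counted.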


4. Candidate Generation

Candidate generation creates possible frequent item sets.

Example

Frequent 1-item sets:

  • Bread

  • Milk

  • Butter

Candidate 2-item sets:

  • {Bread, Milk}

  • {Bread, Butter}

  • {Milk, Butter}


5. Pruning

Pruning removes unnecessary candidate item sets.

Apriori Pruning Rule

If any subset of a candidate item set is infrequent, then the candidate itself is removed.


Example of Pruning

If:

  • {Bread}

is infrequent,

then:

  • {Bread, Milk}

cannot be frequent and is removed.
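The pruning rule can be sketched in Python as follows. The function name `prune` is an assumption for this example. For the scenario above, {Bread} is taken to be infrequent, so it is absent from the list of frequent 1-item sets.

```python
from itertools import combinations


def prune(candidates, frequent_prev):
    """Keep only candidates whose (k-1)-item subsets are all frequent."""
    frequent_prev = {frozenset(s) for s in frequent_prev}
    kept = []
    for candidate in candidates:
        candidate = frozenset(candidate)
        subsets = combinations(candidate, len(candidate) - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            kept.append(candidate)
    return kept


# {Bread} is infrequent here, so only Milk and Butter are frequent 1-item sets.
frequent_1 = [{"Milk"}, {"Butter"}]
candidates_2 = [{"Bread", "Milk"}, {"Milk", "Butter"}]

print(prune(candidates_2, frequent_1))
```

Only {Milk, Butter} survives; {Bread, Milk} is removed without scanning the database, which is where the saving comes from.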


6. Advantages of Apriori Algorithm

1. Simple and Easy to Understand

Widely used due to simplicity.


2. Reduces Search Space

Pruning minimizes unnecessary computations.


3. Efficient for Small and Medium Databases

Performs well for moderate datasets.


4. Generates Useful Association Rules

Supports recommendation systems and business analysis.


7. Limitations of Apriori Algorithm

1. Multiple Database Scans

Requires repeated scanning of the database.


2. Large Candidate Sets

Candidate generation becomes expensive for large datasets.


3. High Computational Cost

Performance decreases for huge databases.


4. High Memory Usage

Large candidate sets consume more memory.


8. Applications of Apriori Algorithm

1. Retail Analysis

Finding products purchased together.


2. Recommendation Systems

Suggesting related products.


3. Medical Analysis

Finding related symptoms and diseases.


4. Website Usage Analysis

Identifying web pages that are frequently visited together.