Market Basket Analysis using Hybrid Apriori Algorithm

Project Overview

This project implements a Hybrid Apriori algorithm for market basket analysis, designed specifically to handle extremely sparse transaction datasets. The implementation includes optimizations for efficient processing and comprehensive analysis tools for interpreting the results.

Dataset

The analysis uses two datasets:

Sales1998.txt:
- 34,070 transactions with an average of 4.83 items per transaction
- 1,559 unique items
- Extremely sparse: the most frequent item (277) appears in only 0.42% of transactions
productList.txt:
- Contains names for all 1,560 products
- Maps product IDs to their corresponding product names
- Enhances the interpretability of analysis results
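
The figures above can be recomputed from the raw file. The following is a minimal sketch, not the code in main.py: the helper names `load_transactions` and `sparsity_summary` are hypothetical, and it assumes each line of Sales1998.txt holds integer item IDs separated by whitespace or commas, which this README does not confirm.

```python
from collections import Counter
from io import StringIO


def load_transactions(lines):
    """Parse one transaction per line into a set of integer item IDs.

    Assumption: item IDs are separated by whitespace and/or commas.
    """
    transactions = []
    for line in lines:
        tokens = line.replace(",", " ").split()
        if tokens:  # skip blank lines
            transactions.append({int(tok) for tok in tokens})
    return transactions


def sparsity_summary(transactions):
    """Compute the kind of statistics quoted in the Dataset section."""
    item_counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    top_item, top_count = item_counts.most_common(1)[0]
    return {
        "transactions": n,
        "unique_items": len(item_counts),
        "avg_items_per_transaction": sum(len(t) for t in transactions) / n,
        "top_item": top_item,
        "top_item_support": top_count / n,
    }


# Tiny in-memory stand-in for Sales1998.txt (made-up data).
sample = StringIO("1 2 3\n2 4\n\n2 5 6\n7\n")
summary = sparsity_summary(load_transactions(sample))
print(summary)
```

With the real data, the same helpers could be called as `load_transactions(open("Sales1998.txt"))`, provided the file matches the assumed format.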

Algorithm Implementation

The HybridApriori class implements a modified version of the classic Apriori algorithm with several optimizations:

Efficient Data Structures:
- Uses sets for transactions to speed up item lookups
- Employs Counter for efficient item counting
- Utilizes defaultdict for counting co-occurrences
Frequency-Based Sorting:
- Sorts items by frequency in descending order to optimize the search process
Pruning Strategies:
- Applies the Apriori principle to prune candidate itemsets
- Implements early termination if no frequent itemsets are found at a certain level
Memory Efficiency:
- Processes transactions in a streaming fashion
- Only stores frequent itemsets, not all candidate itemsets
Product Name Integration:
- Loads product names from productList.txt
- Maps product IDs to their corresponding names
- Enhances visualization and result interpretation by displaying product names alongside IDs
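
The ideas listed above (set-based transactions, Counter and defaultdict counting, Apriori pruning, early termination, storing only frequent itemsets) can be combined as in the sketch below. It is an illustration under stated assumptions, not the HybridApriori class itself: the function name `frequent_itemsets` is hypothetical, and candidates are enumerated per transaction rather than through a join step, which is only practical because baskets are short.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_len=3):
    """Level-wise Apriori over a list of item sets.

    Returns {frozenset(itemset): support_count} for frequent itemsets only.
    """
    # Level 1: count single items in one pass.
    item_counts = Counter(item for t in transactions for item in t)
    # Rank items by descending frequency (0 = most frequent).
    rank = {item: r for r, (item, _) in enumerate(item_counts.most_common())}
    current = {frozenset([i]) for i, c in item_counts.items() if c >= min_count}
    result = {s: item_counts[next(iter(s))] for s in current}

    k = 2
    # The loop stops early as soon as a level yields no frequent itemsets.
    while current and k <= max_len:
        # Only items that survive in some frequent (k-1)-itemset can matter.
        alive = set().union(*current)
        counts = defaultdict(int)
        for t in transactions:
            # Frequency order fixes the enumeration order, not the result.
            items = sorted(t & alive, key=rank.__getitem__)
            for combo in combinations(items, k):
                # Apriori principle: every (k-1)-subset must be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(combo, k - 1)):
                    counts[frozenset(combo)] += 1
        # Keep only the frequent itemsets of this level.
        current = {s for s, c in counts.items() if c >= min_count}
        result.update((s, counts[s]) for s in current)
        k += 1
    return result


# Made-up baskets for illustration.
baskets = [{1, 2, 3}, {1, 2}, {1, 2, 4}, {2, 3}, {1, 3}, {5}]
found = frequent_itemsets(baskets, min_count=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), count)
```

In this toy data the triple {1, 2, 3} passes the pruning check but occurs in only one basket, so it is counted and then discarded.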

Key Findings

Because the dataset is extremely sparse, the analysis uses an absolute support count (10 transactions, about 0.0294% of all transactions) instead of a percentage-based threshold. This approach yielded:

- 6 frequent 2-itemsets
- 12 association rules with high lift values (ranging from 23 to 32)
- Item 132 (Denny 60 Watt Lightbulb) appears in 2 different frequent itemsets, making it the most connected item
- The top 3 most frequent items (277 "Great English Muffins", 1352 "Carrington Ice Cream", 846 "Nationeel Fudge Brownies") do not appear in any of these frequent 2-itemsets
- Product names make the results easier to interpret, revealing associations such as "Carlson Low Fat Sour Cream" frequently purchased with "Big Time Apple Cinnamon Waffles"
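
The arithmetic behind the absolute threshold can be checked directly. This is a back-of-the-envelope sketch using only the numbers quoted in this README; the 0.42% figure is rounded, so the derived values are approximate.

```python
n_transactions = 34_070      # from the Dataset section
min_count = 10               # absolute support threshold
top_item_support = 0.0042    # most frequent item (rounded), from the Dataset section

# Absolute count expressed as a fraction of all transactions.
min_support = min_count / n_transactions
print(f"min support = {min_support:.4%}")

# If two items, each as frequent as the top item, were bought independently,
# the expected number of baskets containing both would be n * p * p.
expected_pair_count = n_transactions * top_item_support ** 2
print(f"expected co-occurrences under independence = {expected_pair_count:.2f}")

# No item is more frequent than the top item, so any pair that reaches
# the threshold has a lift of at least this value.
implied_min_lift = min_count / expected_pair_count
print(f"implied minimum lift = {implied_min_lift:.1f}")
```

Even the most frequent items would be expected to co-occur less than once by chance, so any pair seen 10 or more times necessarily has a high lift. This is consistent with the reported range of 23 to 32.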

Files in this Repository

main.py: Implementation of the Hybrid Apriori algorithm and analysis tools
Sales1998.txt: Raw transaction data (each line represents a transaction)
productList.txt: Product names corresponding to product IDs
frequent_itemsets.csv: Output file containing discovered frequent itemsets with product names
association_rules.csv: Output file containing generated association rules with product names
requirements.txt: List of Python dependencies

Installation & Setup

Prerequisites

Python 3.13 or higher
Required packages: numpy, pandas, matplotlib

Installation

Clone this repository:

git clone https://github.com/ZamoRzgar/basket-analysis.git
cd basket-analysis

Install required packages:

pip install -r requirements.txt

Usage

To run the analysis:

python main.py

The script will:

Load and analyze the transaction data
Apply the Hybrid Apriori algorithm with appropriate parameters
Generate and display frequent itemsets and association rules
Create visualizations of the results
Save the findings to CSV files

Parameters

The algorithm uses the following default parameters:

Minimum support count: 10 transactions (0.0294%)
Minimum confidence: 0.05 (5%)
Maximum itemset length: 3

These parameters can be adjusted in the main() function to suit different analysis needs.
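
To show how the minimum confidence interacts with the frequent itemsets, here is a hedged sketch of rule generation from frequent pairs. The function name `association_rules` and the dictionary layout are assumptions made for illustration, not the interface of main.py.

```python
def association_rules(itemset_counts, n_transactions, min_confidence=0.05):
    """Derive A -> B rules from frequent 2-itemsets.

    itemset_counts maps frozenset -> support count and must include the
    single-item counts for every item that appears in a pair.
    """
    rules = []
    for itemset, pair_count in itemset_counts.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        # Each pair gives two candidate rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_count / itemset_counts[frozenset([antecedent])]
            if confidence < min_confidence:
                continue
            consequent_support = (
                itemset_counts[frozenset([consequent])] / n_transactions
            )
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": pair_count / n_transactions,
                "confidence": confidence,
                # Lift compares the rule's confidence with the consequent's
                # baseline frequency.
                "lift": confidence / consequent_support,
            })
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Made-up counts: item 1 in 40 baskets, item 2 in 50, both together in 20.
counts = {frozenset([1]): 40, frozenset([2]): 50, frozenset([1, 2]): 20}
rules = association_rules(counts, n_transactions=1000)
strict_rules = association_rules(counts, n_transactions=1000, min_confidence=0.45)
for rule in rules:
    print(rule)
```

Because every pair produces two candidate rules, a low confidence threshold such as 5% is consistent with 6 frequent 2-itemsets yielding 12 rules.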

Visualizations

The implementation includes several visualization tools:

Support distribution for frequent itemsets
Top frequent items
Top association rules by lift
Item relationship networks

Results

The analysis reveals strong associations between specific item pairs: the lift values indicate that these items are bought together 23 to 32 times more often than would be expected if they were purchased independently. These insights can be valuable for:

Product placement strategies
Targeted promotions and cross-selling
Inventory management
Customer behavior understanding

Future Work

Potential extensions to this project include:

Testing different support thresholds to balance pattern discovery and statistical significance
Implementing alternative algorithms like FP-Growth or Eclat for comparison
Incorporating temporal aspects to analyze how associations change over time
Applying clustering techniques to group similar items before running association rule mining

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.
Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Tan, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. Addison-Wesley.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Zamo Rzgar Ahmed

If you found this project helpful, please consider giving it a star ⭐
