I developed a highly optimized implementation of the Apriori algorithm specifically designed for extremely sparse datasets:
- Challenge: Analyzed a dataset where the most frequent item appears in only 0.42% of transactions
- Solution: Implemented custom optimizations using efficient data structures and frequency-based sorting
- Results: Discovered meaningful association rules with lift values of 23 to 32, meaning the paired items co-occur 23 to 32 times more often than would be expected if they were independent
- Technologies: Python, NumPy, Pandas, Matplotlib
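
The pruning and frequency-based sorting mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function name `frequent_pairs`, the ascending-frequency ordering, and the choice of plain dictionaries and `Counter` are assumptions made for illustration, and it stops at 2-itemsets. With a very low support threshold most items are still infrequent, so discarding them after the first pass keeps the pair-counting pass small.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return (frequent item counts, frequent pair counts).

    transactions: iterable of iterables of hashable items.
    min_support: minimum fraction of transactions (e.g. 0.001).
    """
    transactions = [set(t) for t in transactions]  # drop duplicate items per transaction
    min_count = min_support * len(transactions)

    # Pass 1: count single items and keep only those meeting the threshold.
    item_counts = Counter(item for t in transactions for item in t)
    frequent_items = {i: c for i, c in item_counts.items() if c >= min_count}

    # Rank frequent items by ascending frequency (ties broken by repr so the
    # ordering is deterministic). A fixed ordering means each pair is always
    # generated as the same tuple, so (a, b) and (b, a) are never counted apart.
    ordered = sorted(frequent_items, key=lambda i: (frequent_items[i], repr(i)))
    rank = {item: r for r, item in enumerate(ordered)}

    # Pass 2: count pairs built only from frequent items (Apriori pruning).
    pair_counts = Counter()
    for t in transactions:
        kept = sorted((i for i in t if i in rank), key=rank.__getitem__)
        pair_counts.update(combinations(kept, 2))

    frequent = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent_items, frequent
```

For example, `frequent_pairs(baskets, 0.001)` on a list of item-ID lists returns the surviving items and pairs with their raw counts; the support threshold shown here is a placeholder, not the value used in the project.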
Key Findings
- Successfully processed 34,070 transactions with 1,559 unique items
- Discovered 6 frequent 2-itemsets despite extreme data sparsity
- Generated 12 association rules with high confidence and lift values
- Identified item 132 as the most connected item in the dataset
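
As a sketch of how rules and a "most connected" item could be derived from frequent 2-itemsets, the snippet below computes support, confidence, and lift for both directions of each pair, then counts how many rules each item takes part in. The function names and the dictionary layout are hypothetical, and "most connected" is assumed here to mean "appears in the most rules", which the project may define differently. Since each pair yields two rules, 6 frequent pairs would give 12 rules under this scheme.

```python
from collections import Counter


def pair_rules(item_counts, pair_counts, n_transactions):
    """Build A -> B and B -> A rules from frequent pair counts.

    item_counts: {item: number of transactions containing it}
    pair_counts: {(a, b): number of transactions containing both}
    """
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n_transactions
        # Lift is symmetric: P(A and B) / (P(A) * P(B)).
        lift = both * n_transactions / (item_counts[a] * item_counts[b])
        for antecedent, consequent in ((a, b), (b, a)):
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support,
                # Confidence: P(consequent | antecedent).
                "confidence": both / item_counts[antecedent],
                "lift": lift,
            })
    return rules


def most_connected(rules):
    """Return the item that appears in the largest number of rules."""
    appearances = Counter()
    for rule in rules:
        appearances[rule["antecedent"]] += 1
        appearances[rule["consequent"]] += 1
    return appearances.most_common(1)[0][0]
```

A lift of 1 means the two items co-occur exactly as often as independence would predict, so values in the 23 to 32 range indicate strong associations even when absolute support is tiny.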