research-article

Scalable Parallel Data Mining for Association Rules

Authors:

Eui-Hong (Sam) Han,

George Karypis,

Vipin Kumar

IEEE Transactions on Knowledge and Data Engineering, Volume 12, Issue 3

Pages 337 - 352

https://doi.org/10.1109/69.846289

Published: 01 May 2000

Abstract

In this paper, we propose two new parallel formulations of the Apriori algorithm, which is used for computing association rules. These new formulations, Intelligent Data Distribution (IDD) and Hybrid Distribution (HD), address the shortcomings of two previously proposed parallel formulations, Count Distribution (CD) and Data Distribution (DD). Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD and requires substantially less communication overhead than DD. However, IDD incurs an added cost from communicating transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and as well as IDD with respect to increasing candidate set size.
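The contrast the abstract draws between CD and IDD can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: it assumes CD splits the transactions and replicates the candidate set, and that IDD splits the candidate set instead. The function names (`count_candidates`, `cd_style`, `idd_style`), the round-robin assignment, and the choice to partition candidates by their first item are illustrative assumptions; hash trees, load balancing, and real inter-processor communication are all omitted.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def cd_style(transactions, candidates, num_procs):
    """CD-like scheme (assumed): split the transactions, replicate the
    candidates, then sum per-processor counts as a stand-in for a reduction."""
    total = Counter()
    for p in range(num_procs):
        local = transactions[p::num_procs]  # this processor's transactions
        total.update(count_candidates(local, candidates))
    return total


def idd_style(transactions, candidates, num_procs):
    """IDD-like scheme (assumed): split the candidates by first item, so each
    simulated processor holds a disjoint slice and scans every transaction."""
    first_items = sorted({c[0] for c in candidates})
    owner = {item: i % num_procs for i, item in enumerate(first_items)}
    total = Counter()
    for p in range(num_procs):
        mine = [c for c in candidates if owner[c[0]] == p]
        total.update(count_candidates(transactions, mine))
    return total


transactions = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b", "c", "d")]
candidates = [tuple(c) for c in combinations("abcd", 2)]
serial = count_candidates(transactions, candidates)
```

Both schemes should reproduce the serial counts. The difference is what each simulated processor holds: in the CD-like version every processor keeps the full candidate list, while in the IDD-like version every processor must see all transactions, which corresponds to the transaction-communication cost the abstract attributes to IDD.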

Index Terms

Scalable Parallel Data Mining for Association Rules
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
  2. Parallel computing methodologies
2. Information systems
  1. Data management systems
    1. Database design and models
      1. Relational database model
    2. Query languages
      1. Relational database query languages
  2. Information systems applications
    1. Data mining

Published In

IEEE Transactions on Knowledge and Data Engineering Volume 12, Issue 3

May 2000

143 pages

ISSN:1041-4347

Copyright © 2000 IEEE. All Rights Reserved.

Publisher

IEEE Educational Activities Department

United States
