[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Scalable Parallel Data Mining for Association Rules

Published: 01 May 2000 Publication History

Abstract

In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.

References

[1]
M. Stonebraker R. Agrawal U. Dayal E.J. Neuhold and A. Reuter, “DBMS Research at a Crossroads: The Vienna Update,” Proc. 19th Very Large Data Bases Conf., pp. 688–692, 1993.
[2]
R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Very Large Data Bases Conf., pp. 487–499, 1994.
[3]
M.A.W. Houtsma and A.N. Swami, “Set-Oriented Mining for Association Rules in Relational Databases,” Proc. 11th Int'l Conf. on Data Eng., pp. 25–33, 1995.
[4]
A. Savasere E. Omiecinski and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. 21st Very Large Data Bases Conf., pp. 432–443, 1995.
[5]
R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” Proc. 21st Very Large Data Bases Conf., pp. 407–419 1995.
[6]
R. Agrawal and J.C. Shafer, “Parallel Mining of Association Rules,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 962–969, Dec. 1996.
[7]
E.H. Han G. Karypis and V. Kumar, “Scalable Parallel Data Mining for Association Rules,” Proc. 1997 ACM-SIGMOD Int'l Conf. Management of Data, 1997.
[8]
R. Agrawal T. Imielinski and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. 1993 ACM-SIGMOD Int'l Conf. Management of Data, 1993.
[9]
V Kumar A. Grama A. Gupta and G. Karypis, Introduction to Parallel Computing: Algorithm Design and Analysis. : Redwood City: Benjamin Cummings/ Addison Wesley, 1994.
[10]
C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[11]
T. Shintani and M. Kitsuregawa, “Hash Based Parallel Algorithms for Mining Association Rules,” Proc. Conf. Paralellel and Distributed Information Systems, 1996.
[12]
J.S. Park M.S. Chen and P.S. Yu, “Efficient Parallel Data Mining for Association Rules,” Proc. Fourth Int'l Conf. Information and Knowledge Management, 1995.
[13]
D. Cheung V. Ng A. Fu and Y. Fu, “Efficient Mining of Association Rules in Distributed Databases,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 911–922, 1996.
[14]
M.J. Zaki S. Parthasarathy M. Ogihara and W. Li, “New Parallel Algorithms for Fast Discovery of Association Rules,” Data Mining and Knowledge Discovery: An International Journal, vol. 1, no. 4, 1997.
[15]
J.S. Park M.S. Chen and P.S. Yu, “An Effective Hash-Based Algorithm for Mining Association Rules,” Proc. 1995 ACM-SIGMOD Int'l Conf. Management of Data, 1995.
[16]
M.J. Zaki S. Parthasarathy M. Ogihara and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining, 1997.
[17]
IBM Quest Data Mining Project, “Quest Synthetic Data Generation Code,” http://www.almaden.ibm.com/cs/quest/syndata.html, 1996.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 12, Issue 3
May 2000
143 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 May 2000

Author Tags

  1. Data mining
  2. association rules
  3. load balance
  4. parallel processing
  5. scalability.

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)WisRuleJournal of Information Science10.1177/0165551522110869550:4(874-893)Online publication date: 1-Aug-2024
  • (2021)Exploring Decomposition for Solving Pattern Mining ProblemsACM Transactions on Management Information Systems10.1145/343977112:2(1-36)Online publication date: 11-Feb-2021
  • (2021)Widening: using parallel resources to improve model qualityData Mining and Knowledge Discovery10.1007/s10618-021-00749-535:4(1258-1286)Online publication date: 1-Jul-2021
  • (2020)A general-purpose distributed pattern mining systemApplied Intelligence10.1007/s10489-020-01664-w50:9(2647-2662)Online publication date: 18-Mar-2020
  • (2018)A Fast Algorithm Based on Apriori Algorithms to Explore the Set of Repetitive Items of Large Transaction DataProceedings of the 2nd International Conference on Compute and Data Analysis10.1145/3193077.3193089(13-19)Online publication date: 23-Mar-2018
  • (2017)Frequent Itemset Mining in Large Datasets a SurveyInternational Journal of Information Retrieval Research10.4018/IJIRR.20171001037:4(37-49)Online publication date: 1-Oct-2017
  • (2016)A sparse memory allocation data structure for sequential and parallel association rule miningThe Journal of Supercomputing10.1007/s11227-015-1566-x72:2(347-370)Online publication date: 1-Feb-2016
  • (2015)Data Analysis in the CloudundefinedOnline publication date: 9-Oct-2015
  • (2013)Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)The Journal of Supercomputing10.1007/s11227-011-0672-764:3(942-967)Online publication date: 1-Jun-2013
  • (2008)Efficient mining of maximal frequent itemsets from databases on a cluster of workstationsKnowledge and Information Systems10.5555/3227211.322732916:3(359-391)Online publication date: 1-Sep-2008
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media