Efficiently Mining Frequent Patterns from Dense Datasets Using a Cluster of Computers

Yudho Giri Sucahyo,
Raj P. Gopalan &
Amit Rudra

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2903)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Efficient mining of frequent patterns from large databases has been an active area of research because it is the most expensive step in association rule mining. In this paper, we present an algorithm for finding the complete set of frequent patterns in very large dense datasets in a cluster environment. The data needs to be distributed to the nodes of the cluster only once, after which mining can be performed in parallel many times with different minimum support settings. The algorithm is based on a master-slave scheme in which a coordinator controls the data-parallel programs running on the nodes of the cluster. The parallel program was executed on a cluster of Alpha SMPs, and the performance of the algorithm was studied on small and large dense datasets. We report experimental results that show both the speedup and the scaleup of our algorithm, along with our conclusions and pointers for further work.
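
The abstract names two ideas that a short example can make concrete: the data is partitioned once and then mined repeatedly with different minimum support values, and a coordinator combines the work of data-parallel workers. The sketch below is only an illustration of that general scheme and is not the authors' algorithm. It substitutes a plain Apriori-style level-wise search with count distribution, it uses threads on one machine to stand in for cluster nodes, and all function names (`partition`, `local_counts`, `mine`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition(transactions, n_workers):
    """Split the transactions round-robin into n_workers partitions (done once)."""
    return [transactions[i::n_workers] for i in range(n_workers)]


def local_counts(part, candidates):
    """Worker step: count how many local transactions contain each candidate."""
    counts = Counter()
    for t in part:
        for c in candidates:
            if c <= t:  # candidate itemset is a subset of the transaction
                counts[c] += 1
    return counts


def mine(parts, min_support):
    """Coordinator step: level-wise search that sums the workers' local counts.

    Assumption: Apriori-style candidate generation, chosen for brevity;
    the paper's own mining method is not described in the abstract.
    """
    # For simplicity the coordinator collects the item universe itself.
    items = sorted({i for part in parts for t in part for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        while candidates:
            total = Counter()
            # Each worker scans only its own partition; counts are then summed.
            for counts in pool.map(local_counts, parts, [candidates] * len(parts)):
                total.update(counts)
            level = {c: n for c, n in total.items() if n >= min_support}
            frequent.update(level)
            # Join frequent k-itemsets into (k+1)-candidates and prune any
            # candidate that has an infrequent k-subset.
            keys = list(level)
            k = len(keys[0]) + 1 if keys else 0
            candidates = list({
                a | b
                for a, b in combinations(keys, 2)
                if len(a | b) == k
                and all(frozenset(s) in level for s in combinations(a | b, k - 1))
            })
    return frequent


# Partition once, then mine several times with different support thresholds.
data = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
        frozenset("bc"), frozenset("abc")]
parts = partition(data, 2)
for threshold in (2, 3, 4):
    result = mine(parts, threshold)
    print(threshold, len(result))
```

The partitions are built a single time and reused for every threshold, which mirrors the "distribute once, mine many times" property claimed in the abstract. On a real cluster the worker step would run on separate nodes and exchange counts through message passing rather than through a thread pool.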

Author information

Authors and Affiliations

Department of Computing, Curtin University of Technology,
Yudho Giri Sucahyo & Raj P. Gopalan
School of Information Systems, Curtin University of Technology, Kent St, 6102, Bentley, Western Australia
Amit Rudra

Editor information

Editors and Affiliations

Department of Computer Science, Australian National University, ACT 0200, Acton, Australia
Tamás (Tom) Domonkos Gedeon
Murdoch University,
Lance Chun Che Fung

About this paper

Cite this paper

Sucahyo, Y.G., Gopalan, R.P., Rudra, A. (2003). Efficiently Mining Frequent Patterns from Dense Datasets Using a Cluster of Computers. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_20

DOI: https://doi.org/10.1007/978-3-540-24581-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive
