Abstract
Efficient mining of frequent patterns from large databases has been an active area of research because it is the most expensive step in association rule mining. In this paper, we present an algorithm for finding the complete set of frequent patterns in very large dense datasets in a cluster environment. The data needs to be distributed to the nodes of the cluster only once, after which mining can be performed in parallel many times with different minimum support settings. The algorithm is based on a master-slave scheme in which a coordinator controls data-parallel programs running on a number of cluster nodes. The parallel program was executed on a cluster of Alpha SMPs, and the performance of the algorithm was studied on small and large dense datasets. We report experimental results that show both the speedup and the scaleup of our algorithm, along with our conclusions and pointers for further work.
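The abstract does not give the algorithm's internals, so the following is only a minimal sketch of the general idea it describes: partition the data once, let each node count itemsets in its own partition, and have a coordinator combine the counts for any minimum support value. It uses brute-force enumeration of small itemsets rather than the authors' method, runs the "nodes" sequentially in one process, and all names (`partition`, `local_counts`, `mine`, `max_size`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, n_nodes):
    """Split the transactions round-robin into n_nodes partitions (done once)."""
    return [transactions[i::n_nodes] for i in range(n_nodes)]


def local_counts(part, max_size):
    """Worker step: count every itemset of up to max_size items in one partition."""
    counts = Counter()
    for transaction in part:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def mine(parts, min_support, max_size=2):
    """Coordinator step: sum the per-node counts and keep the frequent itemsets."""
    total = Counter()
    for part in parts:  # assumption: a real cluster would run these in parallel
        total.update(local_counts(part, max_size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
parts = partition(transactions, n_nodes=2)  # distribute the data once
for min_support in (3, 4):                  # mine repeatedly with new settings
    print(min_support, mine(parts, min_support))
```

Because the partitions are built once and reused, changing `min_support` only repeats the counting and filtering steps, which mirrors the "distribute once, mine many times" property stated in the abstract.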
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sucahyo, Y.G., Gopalan, R.P., Rudra, A. (2003). Efficiently Mining Frequent Patterns from Dense Datasets Using a Cluster of Computers. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_20
DOI: https://doi.org/10.1007/978-3-540-24581-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive