Abstract
PC clusters have recently come to be regarded as one of the most promising platforms for heavily data-intensive applications such as decision support query processing and data mining. We have proposed several new parallel algorithms for mining association rules and generalized association rules with a taxonomy, and have shown that a PC cluster can handle large-scale mining with them. While developing a high-performance parallel mining system on a PC cluster, we found that heterogeneity is inevitable if the system is to take advantage of the rapid progress of PC hardware. Existing parallel algorithms, however, cannot be applied naively because they assume homogeneous nodes. We therefore propose a new dynamic load balancing method for association rule mining that works on heterogeneous systems. It comprises two strategies, candidate migration and transaction migration. Candidate migration is invoked first. When it cannot resolve the load imbalance, transaction migration is employed; it is more costly but more effective against strong imbalance. The experimental results confirm that the proposed approach effectively balances the workload among heterogeneous PCs.
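The two-stage policy described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the cost model (work proportional to candidates plus transactions, divided by a relative node speed), the pairwise equalization step, and every name in it (Node, est_time, skew, migrate, balance, CAND_COST, TRANS_COST, the 1.05 tolerance) are assumptions made for illustration only. Loads are kept as floats for simplicity, whereas a real system would move whole itemsets and transactions.

```python
from dataclasses import dataclass

# Hypothetical per-unit costs; the abstract does not give a cost model.
CAND_COST = 1.0
TRANS_COST = 1.0


@dataclass
class Node:
    name: str
    speed: float         # relative processing speed (assumed unit)
    candidates: float    # candidate itemsets this node must count
    transactions: float  # local transactions this node must scan


def est_time(node):
    """Estimated finish time under the assumed linear cost model."""
    work = CAND_COST * node.candidates + TRANS_COST * node.transactions
    return work / node.speed


def skew(nodes):
    """Ratio of the slowest node's time to the mean time (1.0 = balanced)."""
    times = [est_time(n) for n in nodes]
    return max(times) / (sum(times) / len(times))


def migrate(nodes, attr, unit_cost, max_rounds=1000, eps=1e-9):
    """Repeatedly move load of kind `attr` from the slowest to the fastest node."""
    for _ in range(max_rounds):
        slow = max(nodes, key=est_time)
        fast = min(nodes, key=est_time)
        gap = est_time(slow) - est_time(fast)
        # Amount that would equalize the two finish times ...
        amount = gap / (unit_cost * (1 / slow.speed + 1 / fast.speed))
        # ... limited by what the slow node actually holds.
        amount = min(amount, getattr(slow, attr))
        if amount <= eps:
            break
        setattr(slow, attr, getattr(slow, attr) - amount)
        setattr(fast, attr, getattr(fast, attr) + amount)


def balance(nodes, tolerance=1.05):
    """Try candidate migration first; fall back to transaction migration."""
    used = []
    if skew(nodes) > tolerance:
        migrate(nodes, "candidates", CAND_COST)
        used.append("candidate migration")
    if skew(nodes) > tolerance:
        migrate(nodes, "transactions", TRANS_COST)
        used.append("transaction migration")
    return used
```

In this model, a mild speed difference is absorbed by moving candidates alone. When the slow node runs out of candidates to give away and is still the bottleneck, the second stage moves transactions, mirroring the "first method, then second method" ordering in the abstract.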
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kitsuregawa, M., Shintani, T., Tamura, M., Pramudiono, I. (2000). Parallel Data Mining on Large Scale PC Cluster. In: Lu, H., Zhou, A. (eds) Web-Age Information Management. WAIM 2000. Lecture Notes in Computer Science, vol 1846. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45151-X_2
DOI: https://doi.org/10.1007/3-540-45151-X_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67627-0
Online ISBN: 978-3-540-45151-8
eBook Packages: Springer Book Archive