Abstract
Clustering analysis plays an important role in the filed of data mining. Nowadays, hierarchical clustering technique is becoming one of the most widely used clustering techniques. However, for most algorithms of hierarchical clustering technique, the requirements of high execution efficiency and high accuracy of clustering result cannot be met at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of algorithm BIRCH with algorithm CHAMELEON. By calculating the time complexity of CBCFT, the paper argues that the time complexity of CBCFT increases linearly with the number of data. By experimenting on sample data set, this paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is as similar as that of CHAMELEON, but CBCFT overcomes the shortcoming of the low execution efficiency of CHAMELEON. Although the execution time of CBCFT is longer than BIRCH, the clustering result of CBCFT is much satisfactory than that of BIRCH. Finally, through a case of customer segmentation of Chinese Petroleum Corp. HUBEI branch; the paper demonstrates that the clustering result of the case is meaningful and useful.
Similar content being viewed by others
References
Ankerst, M., Breunig, M., & Kriegel, H. P. Sander, J. (1999). OPICS: Ordering points to identify the clustering structure. In Proceedings of 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), Philadelphia, June 1999 (pp. 49–60).
Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883.
Chicc, G., Napoli, R. C., & Piglione, F. (2006). Comparisons among clustering techniques for electricity customer classification. IEEE Transactions on Power Systems, 21(2).
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the second international conference of knowledge discovery and data mining (pp. 226–231).
Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3, 209–226.
Guha, S., Rastogi, R., & Shim, K. (1998) CURE: An efficient clustering algorithm for large databases. In Proceedings of ACM–SIGMOD international conference on management of data (pp. 73–84).
Guha, S., Rastogi, R., & Shim, K. (1999). ROCK: a robust clustering algorithm for categorical attributes. In Proceedings of the 15th international conference on data engineering (p. 512).
Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). Clustering algorithms and validity measures. In Proceedings of the thirteenth international conference on scientific and statistical database management (Vol. 3, pp. 1099–3371).
Han, J., & Kamber, M. (2001). Data mining concepts and techniques. BeiJing: Higher Education Press.
Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proc. of 1998 int. conf. on knowledge discovery and data mining (KDD’98), New York, August 1998 (pp. 58–65).
Hu, T. L., & Sheu, J. B. (2003). A fuzzy-based customer classification method for demand-responsive logistical distribution operations. Fuzzy Sets and Systems, 139, 431–450.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264–323.
Jung, S. Y., & Kim, T. S. (2001). An agglomerative hierarchical clustering using partial maximum array and incremental similarity computation method. In First IEEE international conference on data mining (p. 265).
Karypis, G., Han, E. H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer, 32(8), 68–75 (Special Issue on Data Analysis and Mining).
Karypis, G., & Kumar, V. (1998). hMETIS 1.5: A hypergraph partitioning package. Technical report, Department of Computer Science, University of Minnesota. http://www.cs.umn.edu/~metis.
Lewis, J., & Chase, J. (2004). Data structures. Java edition. Englewood Cliffs: Prentice-Hall.
Li, C., Becerra, V. M., & Deng, J. (2004). Extension of fuzzy c-means algorithm. In Proceedings of the IEEE conference on cybernetics and intelligent systems, Singapore, 1–3 December 2004.
Lin, C. R., & Chen, M. S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Date Engineering, 17(2).
Ng, R., & Han, J. (1994). Efficient and effective clustering method for spatial data mining. In Proceedings of international conference of very large data bases (VLDB’94), Santiago, Chile, September 1994 (pp. 144–155).
Olson, D. L., & Zhao, F. (2007). CIOs’ perspectives of critical success factors in ERP upgrade projects. Enterprise Information Systems, 1(1), 129–138.
Palomino, M. A., & Whitley, E. A. (2007). The effects of national culture on ERP implementation: a study of Colombia and Switzerland. Enterprise Information Systems, 3(1), 301–325.
Qian, Y. T., Shi, Q. S., & Wang, Q. (2002). CURE-NS: A hierarchical clustering algorithm with new shrinking scheme. In Proceedings of the first international conference on machine learning and cybernetics, Beijing, 4–5 November 2002.
Sheikholeslami, G., Chatterjee, S., & Zhang, A. (1998). WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the IEEE conference on very large data bases (VLDB’98), New York, August 1998 (pp. 428–439).
Wan, L. H., Li, Y. J., Liu, W. Y., & Zha, D. Y. (2005). Application and study of spatial cluster and customer partitioning. In Proceedings of the fourth international conference on machine learning and cybernetics, Guangzhou, 18–21 August 2005.
Wang, W., Yang, J., & Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining. In Proceedings of 1997 international conference on very large data bases (VLDB’97), Athens, Greece, August 1997 (pp. 186–195).
Wang, Z., He, P. L., Guo, L. S., & Zheng, X. S. (2004). Clustering analysis of customer relationship in securities trade. In Proceedings of the third international conference on machine learning and cybernetics, Shanghai, August 2004 (pp. 26–29).
Warfield, J. N. (2007). Systems science serves enterprise integration: a tutorial. Enterprise Information Systems, 2(1), 235–254.
Weiss, M. A. (2004). Data structures and algorithm analysis in Java. Englewood Cliffs: Prentice-Hall.
Zhang, T., Ramakrishnan, R., & Linvy, M. (1996). Birch: an efficient data clustering method for large databases. In Proc. of 1996 ACM–SIGMOD international conference on management of data, Montreal, Quebec (pp. 103–114).
Author information
Authors and Affiliations
Corresponding author
Additional information
The research is partially supported by National Natural Science Foundation of China (grants #70372049 and #70121001).
Rights and permissions
About this article
Cite this article
Li, J., Wang, K. & Xu, L. Chameleon based on clustering feature tree and its application in customer segmentation. Ann Oper Res 168, 225–245 (2009). https://doi.org/10.1007/s10479-008-0368-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-008-0368-4