Abstract
In this paper, we present a particle swarm optimizer (PSO) to solve the variable weighting problem in projected clustering of high-dimensional data. Many subspace clustering algorithms fail to yield good cluster quality because they do not employ an efficient search strategy. Focusing on soft projected clustering, we design a suitable k-means-type objective function in which changes in variable weights are reflected exponentially. We also transform the original constrained variable weighting problem into a problem with bound constraints, using a normalized representation of variable weights, and we use a particle swarm optimizer to minimize the objective function and search for a global optimum of the variable weighting problem in clustering. Our experimental results on both synthetic and real data show that the proposed algorithm greatly improves cluster quality and is much less dependent on the initial cluster centroids. In an application to text clustering, we show that the algorithm can easily be adapted to other similarity measures, such as the extended Jaccard coefficient for text data, and remains very effective.
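To make the approach concrete, the following is a minimal sketch, in Python/NumPy, of the kind of PSO-driven variable weighting the abstract describes. It is not the authors' implementation: the objective form (per-cluster weights normalized onto the simplex and raised to an exponent beta > 1, so weight changes are amplified exponentially), the bound-constrained particle encoding, and all names and hyper-parameters (pso_weight_clustering, beta, inertia, c1, c2) are illustrative assumptions.

    import numpy as np

    def weighted_kmeans_cost(X, centroids, weights, beta=8.0, eps=1e-12):
        # Map the bound-constrained weights onto the simplex (the normalized
        # representation mentioned in the abstract), then amplify them with beta > 1.
        w = weights / (weights.sum(axis=1, keepdims=True) + eps)
        wb = w ** beta
        # Per-dimension weighted squared distances of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2 * wb[None, :, :]).sum(axis=2)
        labels = d2.argmin(axis=1)
        return d2[np.arange(len(X)), labels].sum(), labels

    def fitness(X, weights, k, beta=8.0, iters=5, seed=0):
        # Evaluate one particle: a few k-means-style passes under its weight matrix.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            cost, labels = weighted_kmeans_cost(X, centroids, weights, beta)
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return cost

    def pso_weight_clustering(X, k, n_particles=20, iters=50, beta=8.0,
                              inertia=0.72, c1=1.49, c2=1.49, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        pos = rng.random((n_particles, k, d))      # particles: K x D weights in [0, 1]
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_val = np.array([fitness(X, p, k, beta) for p in pos])
        g = int(pbest_val.argmin())
        gbest, gbest_val = pbest[g].copy(), pbest_val[g]
        for _ in range(iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)     # keep the bound constraints
            for i in range(n_particles):
                val = fitness(X, pos[i], k, beta)
                if val < pbest_val[i]:
                    pbest[i], pbest_val[i] = pos[i].copy(), val
                    if val < gbest_val:
                        gbest, gbest_val = pos[i].copy(), val
        # Return the best normalized weight matrix and its objective value.
        return gbest / gbest.sum(axis=1, keepdims=True), gbest_val

A call such as pso_weight_clustering(X, k=3) would return one normalized K x D weight matrix whose rows sum to one and can be read as the relevance of each dimension to each cluster. For text data, the squared-distance term in the sketch could be replaced by one minus the extended Jaccard coefficient, in line with the adaptation the abstract mentions.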