DOI: 10.1145/3055635.3056603
research-article

Text Document Clustering Using Memetic Feature Selection

Published: 24 February 2017

Abstract

As the volume of electronic documents grows, more sophisticated machine learning methods are needed to manage them. In this paper, a memetic feature selection technique is proposed to improve the k-means and spherical k-means clustering algorithms. The proposed technique combines the wrapper inductive method with the filter ranking method. Internal and external clustering evaluation measures are used to assess the resulting clusters. The test results showed that, with the proposed hybrid method, the resulting clusters were more accurate and more compact than the clusters obtained with GA-selected features or with the entire feature space.
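The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of memetic feature selection for clustering, under stated assumptions: a genetic algorithm searches over binary feature masks (the wrapper part), and each candidate is refined by a small local search guided by a filter ranking. The choice of term variance as the filter, the explained-variance score as the internal measure, plain k-means instead of spherical k-means, and all function names and parameter values are illustrative assumptions, not details taken from the paper.

```python
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(rows, k, rng, iters=10):
    """Plain k-means on lists of floats; returns (labels, centers)."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def fitness(data, mask, k):
    """Wrapper score (assumed): share of variance explained by k-means on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    rows = [[row[j] for j in cols] for row in data]
    labels, centers = kmeans(rows, k, random.Random(0))  # fixed seed: deterministic score
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    total = sum(sq_dist(r, mean) for r in rows)
    within = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return 1.0 - within / total if total > 0 else 0.0

def variance_ranking(data):
    """Filter ranking (assumed): feature indices sorted by variance, highest first."""
    n = len(data)
    var = []
    for col in zip(*data):
        m = sum(col) / n
        var.append(sum((x - m) ** 2 for x in col) / n)
    return sorted(range(len(var)), key=lambda j: var[j], reverse=True)

def local_search(data, mask, ranking, k):
    """Memetic step: try adding the best-ranked unselected feature and
    dropping the worst-ranked selected one; keep a change only if it helps."""
    best, best_fit = mask, fitness(data, mask, k)
    add = next((j for j in ranking if not mask[j]), None)
    drop = next((j for j in reversed(ranking) if mask[j]), None)
    for j in (add, drop):
        if j is None:
            continue
        cand = best[:]
        cand[j] = 1 - cand[j]
        f = fitness(data, cand, k)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit

def memetic_select(data, k, pop_size=8, generations=10, seed=0):
    """GA over feature masks with local refinement; returns (mask, score)."""
    rng = random.Random(seed)
    n = len(data[0])
    ranking = variance_ranking(data)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    scored = [local_search(data, m, ranking, k) for m in pop]
    for _ in range(generations):
        scored.sort(key=lambda mf: mf[1], reverse=True)
        elite = scored[:pop_size // 2]           # elitism: the best half survives
        parents = [m for m, _ in elite]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < 0.1 else bit for bit in child]  # mutation
            children.append(local_search(data, child, ranking, k))
        scored = elite + children
    return max(scored, key=lambda mf: mf[1])

def toy_data(seed=1):
    """Two clusters: features 0-1 are informative, features 2-4 are noise."""
    rng = random.Random(seed)
    data = []
    for i in range(20):
        center = 0.0 if i < 10 else 5.0
        informative = [center + rng.gauss(0, 0.3) for _ in range(2)]
        noise = [rng.gauss(0, 1.0) for _ in range(3)]
        data.append(informative + noise)
    return data

if __name__ == "__main__":
    mask, score = memetic_select(toy_data(), k=2)
    print("selected mask:", mask, "score:", round(score, 3))
```

On real text data the rows would be document term vectors (for example TF-IDF weights), and the internal score could be swapped for whichever internal or external measure is being reported; the toy data here only exists to make the sketch runnable.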

Published In

ICMLC '17: Proceedings of the 9th International Conference on Machine Learning and Computing
February 2017
545 pages
ISBN: 9781450348171
DOI: 10.1145/3055635
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Feature selection
  2. genetic
  3. memetic
  4. optimization
  5. wrapper

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2017

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1
Reflects downloads up to 07 Jan 2025

Cited By

  • (2020) Unsupervised Text Feature Selection Using Memetic Dichotomous Differential Evolution. Algorithms 13:6 (131). DOI: 10.3390/a13060131. Online publication date: 26-May-2020
  • (2020) An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem. Current Medical Imaging (formerly Current Medical Imaging Reviews) 16:4 (296-306). DOI: 10.2174/1573405614666180903112541. Online publication date: 7-May-2020
  • (2019) An Enhanced Method for Topic Modeling using Concept-Latent. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) (485-490). DOI: 10.1109/COMITCon.2019.8862179. Online publication date: Feb-2019
  • (2019) Memetic Algorithms for Business Analytics and Data Science: A Brief Survey. Business and Consumer Analytics: New Ideas (545-608). DOI: 10.1007/978-3-030-06222-4_13. Online publication date: 31-May-2019
  • (2017) Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection. Multi-disciplinary Trends in Artificial Intelligence (281-289). DOI: 10.1007/978-3-319-69456-6_23. Online publication date: 19-Oct-2017
