Abstract
In this paper, we propose a global-to-local searching-based Binary Particle Swarm Optimization (GSBPSO) algorithm built on Binary Particle Swarm Optimization (BPSO). GSBPSO gives the particle swarm strong global search capability in the early stage of the run and strong local search capability in the late stage. For text clustering, we first use document frequency for coarse feature selection, then apply GSBPSO to reselect features and further reduce feature redundancy, and finally cluster the texts with the Spherical K-means (SKM) algorithm. Simulation experiments with the GSBPSO- and SKM-based Chinese text clustering algorithm on the Chinese text dataset from Fudan University show that GSBPSO can compress the high-dimensional, sparse text feature matrix with a compression ratio of 47%. Clustering the text matrices before and after feature selection shows that the F-value and NMI of the clustering algorithm improve to varying degrees on the dataset after GSBPSO feature reselection.
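The document-frequency coarse selection step can be illustrated with a minimal Python sketch. It assumes each document has already been segmented into word tokens (Chinese text needs a word segmenter first, which the abstract does not specify), and the thresholds `min_df` and `max_df_ratio` are hypothetical defaults rather than values from the paper: terms that are too rare or too common across the corpus are dropped before the swarm search.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count a term once per document
    return df


def df_coarse_select(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms whose document frequency lies in [min_df, max_df_ratio * N].

    min_df and max_df_ratio are hypothetical thresholds for illustration.
    """
    upper = max_df_ratio * len(docs)
    df = document_frequency(docs)
    return sorted(t for t, c in df.items() if min_df <= c <= upper)


# Toy corpus of pre-tokenized documents (hypothetical data).
docs = [
    ["the", "stock", "market", "rise"],
    ["the", "stock", "fund"],
    ["the", "football", "match", "market"],
    ["the", "football", "match"],
]
print(df_coarse_select(docs))
```

Here "the" appears in every document and is removed as too common, while "rise" and "fund" appear only once and are removed as too rare.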
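The abstract does not give GSBPSO's update equations, so the following is only a sketch of the general idea under stated assumptions. It uses the classic BPSO rule (each bit becomes 1 with probability sigmoid(v)) and emulates a global-to-local schedule with a random bit-flip rate that decays linearly over the iterations: frequent flips early favor global exploration, and few flips late favor local search around the personal and global bests. The flip schedule, the parameter names, and the toy `matches` fitness are hypothetical; in the paper's pipeline each bit would presumably mark whether a coarsely selected feature is kept, and the fitness would score the resulting feature subset.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gs_bpso(fitness, n_bits, n_particles=20, n_iter=100, c1=2.0, c2=2.0,
            v_max=4.0, p_flip_start=0.2, p_flip_end=0.0, seed=0):
    """Maximize `fitness(mask)` over binary masks of length `n_bits`.

    Hypothetical global-to-local schedule: a random bit-flip probability that
    decays linearly from p_flip_start to p_flip_end over the iterations.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for t in range(n_iter):
        # Early iterations: many random flips (global search);
        # late iterations: few flips (local search near pbest/gbest).
        p_flip = p_flip_start + (p_flip_end - p_flip_start) * t / max(1, n_iter - 1)
        for i in range(n_particles):
            for d in range(n_bits):
                v = (vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # velocity clamping
                # Classic BPSO position rule: bit = 1 with probability sigmoid(v).
                bit = 1 if rng.random() < sigmoid(vel[i][d]) else 0
                if rng.random() < p_flip:  # exploratory mutation
                    bit = 1 - bit
                pos[i][d] = bit
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy usage: recover a hidden 12-bit mask (fitness = number of matching bits).
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def matches(mask):
    return sum(a == b for a, b in zip(mask, target))


best, best_fit = gs_bpso(matches, n_bits=len(target), seed=1)
print(best, best_fit)
```

A decaying flip rate is one simple way to shift from exploration to exploitation; schedules on the inertia weight or the velocity bound are common alternatives, and the paper may use any of these or something else.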
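Spherical K-means clusters unit-length vectors by cosine similarity: each document is assigned to the centroid with the largest dot product, and each centroid is recomputed as the normalized sum of its members. A dependency-free sketch follows; the random initialization, the empty-cluster handling, and the stop-when-labels-repeat rule are assumptions, not details from the paper.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v[:]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster the rows of X by cosine similarity; returns (labels, centroids)."""
    rng = random.Random(seed)
    X = [normalize(x) for x in X]
    centroids = [X[i][:] for i in rng.sample(range(len(X)), k)]
    labels = [-1] * len(X)
    for _ in range(n_iter):
        # Assignment step: on the unit sphere, cosine similarity is a dot product.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: the normalized sum (equivalently, mean) of the members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Toy term-weight vectors (hypothetical data): two obvious topic groups.
X = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels, centroids = spherical_kmeans(X, k=2)
print(labels)
```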
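The evaluation metrics can be computed from two label lists. This sketch uses the geometric-mean normalization for NMI, I(U;V)/sqrt(H(U)H(V)), and the common class-weighted clustering F-measure (for each true class, the best F1 over predicted clusters, weighted by class size); the paper may use a different variant of either, so treat these definitions, and the degenerate-case convention in `nmi`, as assumptions.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information with sqrt(H(U) * H(V)) normalization."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    cuv = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(n * c / (cu[u] * cv[v])) for (u, v), c in cuv.items())
    hu = -sum(c / n * math.log(c / n) for c in cu.values())
    hv = -sum(c / n * math.log(c / n) for c in cv.values())
    if hu == 0 or hv == 0:
        # Convention (assumed): identical single-cluster labelings score 1.
        return 1.0 if hu == hv else 0.0
    return mi / math.sqrt(hu * hv)


def cluster_f_measure(labels_true, labels_pred):
    """Class-size-weighted best F1 of each true class against any cluster."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    cuv = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for u, nu in cu.items():
        best = 0.0
        for v, nv in cv.items():
            nuv = cuv.get((u, v), 0)
            if nuv:
                p, r = nuv / nv, nuv / nu  # precision and recall
                best = max(best, 2 * p * r / (p + r))
        total += nu / n * best
    return total


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]), cluster_f_measure([0, 0, 1, 1], [0, 1, 0, 1]))
```

Both scores are invariant to how cluster IDs are numbered, which is why a permuted but otherwise perfect clustering scores 1.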
Acknowledgments
This work is supported by the Natural Science Foundation of Guangdong Province of China under Grant No. 2020A1515010784 and by the Key-Area Research and Development Program of Guangdong Province under Grant No. 2019B020219003.
Copyright information
© 2022 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, K. et al. (2022). Chinese Text Clustering Algorithm Based on Multi-agent Optimization System. In: Li, K., Liu, Y., Wang, W. (eds) Exploration of Novel Intelligent Optimization Algorithms. ISICA 2021. Communications in Computer and Information Science, vol 1590. Springer, Singapore. https://doi.org/10.1007/978-981-19-4109-2_28
DOI: https://doi.org/10.1007/978-981-19-4109-2_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4108-5
Online ISBN: 978-981-19-4109-2
eBook Packages: Computer Science, Computer Science (R0)