Abstract
The growth of micro-blogging, which generates short texts on a large scale, has made communication more convenient. At the same time, discovering topics in short texts has become a difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), model short texts poorly because they suffer from severe data sparsity when applied to them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is employed to process short texts (micro-blog data) in order to alleviate the sparsity problem. We further integrate the K-means clustering algorithm into BTM for topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that our method can discover topics effectively.
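The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how K-means is combined with BTM, so the sketch assumes that clustering runs on per-document topic proportions. The posts, the topic vectors, and the function names `extract_biterms` and `kmeans` are hypothetical and chosen only for illustration. The first part shows biterms, the unordered word pairs co-occurring in one short text that BTM models instead of per-document word counts. The second part is plain Lloyd's K-means.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's K-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]  # do not mutate the input
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical, already tokenized micro-blog posts (illustrative data only).
posts = [
    ["earthquake", "rescue", "team"],
    ["rescue", "team", "arrives"],
]

# Corpus-level biterm counts: pooling pairs over all posts is what lets BTM
# sidestep the sparsity of individual short documents.
biterm_counts = Counter(b for post in posts for b in extract_biterms(post))

# Assumed input for clustering: per-document topic proportions as a trained
# topic model might produce them. These numbers are made up.
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(doc_topic, initial_centroids=[[1.0, 0.0], [0.0, 1.0]])
```

In this toy corpus the pair ("rescue", "team") occurs in both posts, so it is counted twice even though each post is only three words long. The initial centroids are passed in explicitly to keep the sketch deterministic; a real pipeline would need its own initialization strategy and choice of the number of clusters.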
Additional information
The article is published in the original.
About this article
Cite this article
Li, W., Feng, Y., Li, D. et al. Micro-blog topic detection method based on BTM topic model and K-means clustering algorithm. Aut. Control Comp. Sci. 50, 271–277 (2016). https://doi.org/10.3103/S0146411616040040
DOI: https://doi.org/10.3103/S0146411616040040