Abstract
Textual data frequently occurs as an unlabeled document collection, so it is useful to sort such a collection into clusters of related documents. Moreover, text has several aspects that no single representation can capture. Multi-view clustering addresses this by integrating different representations, called “views”, and exploiting their complementary characteristics. Existing methods, however, use a single representation mode for all views, based on term frequencies. Such a representation loses valuable information and fails to capture the semantic aspects of text. To overcome these issues, we propose a new multi-view text clustering method that exploits different text representations to improve clustering quality. Experimental results show that the proposed method outperforms the compared methods in clustering quality.
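The abstract does not specify how the views are built or combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes three views: term frequencies, character trigrams, and averaged word vectors drawn from a small hypothetical embedding table (`TOY_EMBEDDINGS`, standing in for pretrained embeddings). It also assumes a co-association (evidence-accumulation) consensus, a standard cluster-ensemble technique. Each view is clustered independently by linking documents whose cosine similarity reaches a threshold (0.3 here, an arbitrary choice). Two documents end up together when a majority of views agree. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors standing in for pretrained embeddings
# (e.g. word2vec); a real pipeline would load learned vectors instead.
TOY_EMBEDDINGS = {
    "cat": (1.0, 0.0), "purrs": (0.9, 0.1),
    "kitten": (1.0, 0.0), "purring": (0.9, 0.1),
    "stocks": (0.0, 1.0), "stock": (0.0, 1.0),
    "fell": (0.1, 0.9), "shares": (0.0, 1.0), "dropped": (0.1, 0.9),
}


def term_frequency_view(doc):
    """View 1: bag-of-words term counts."""
    return Counter(doc.lower().split())


def char_trigram_view(doc):
    """View 2: character trigrams taken inside each word."""
    return Counter(w[i:i + 3] for w in doc.lower().split()
                   for i in range(len(w) - 2))


def embedding_view(doc):
    """View 3: average of the (toy) word vectors, stored as a sparse dict."""
    vecs = [TOY_EMBEDDINGS[w] for w in doc.lower().split() if w in TOY_EMBEDDINGS]
    if not vecs:
        return {}
    return {d: sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def components(n, linked):
    """Connected components (union-find) of a graph on n nodes."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if linked(i, j):
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def cluster_view(vectors, threshold=0.3):
    """Per-view clustering: link documents whose cosine >= threshold."""
    return components(len(vectors),
                      lambda i, j: cosine(vectors[i], vectors[j]) >= threshold)


def multi_view_ensemble(docs, views, threshold=0.3):
    """Cluster each view, then merge the partitions by majority co-association."""
    partitions = [cluster_view([view(d) for d in docs], threshold) for view in views]

    def co_association(i, j):
        # Fraction of views that put documents i and j in the same cluster.
        return sum(p[i] == p[j] for p in partitions) / len(partitions)

    labels = components(len(docs), lambda i, j: co_association(i, j) > 0.5)
    groups = {}
    for doc_id, label in enumerate(labels):
        groups.setdefault(label, []).append(doc_id)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = ["cat purrs", "kitten purring", "stocks fell", "stock shares dropped"]
    print(multi_view_ensemble(docs, [term_frequency_view]))
    print(multi_view_ensemble(docs, [term_frequency_view, char_trigram_view, embedding_view]))
```

On these four toy documents, the term-frequency view alone returns four singletons (`[[0], [1], [2], [3]]`) because the related documents share no exact terms. Combining the three views recovers the two topics (`[[0, 1], [2, 3]]`). This mirrors, on a toy scale, the abstract's point that a purely term-frequency representation misses information that other representations capture.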
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fraj, M., Ben Hajkacem, M.A., Essoussi, N. (2019). Ensemble Method for Multi-view Text Clustering. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol. 11683. Springer, Cham. https://doi.org/10.1007/978-3-030-28377-3_18
DOI: https://doi.org/10.1007/978-3-030-28377-3_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28376-6
Online ISBN: 978-3-030-28377-3
eBook Packages: Computer Science, Computer Science (R0)