Abstract
Short text representation (STR) has attracted increasing interest with the rapid growth of Web and social media data in short text form. In this paper, we present a new method that uses an improved semantic feature space mapping to represent short texts effectively. First, terms are semantically clustered based on statistical analysis and word2vec, and the semantic feature space is represented by the cluster centers. Next, the context information of terms is integrated with the semantic feature space, and three improved similarity calculation methods are established on this basis. The text mapping matrix is then constructed for short text representation learning. Experiments on both Chinese and English test collections show that the proposed method reflects the semantic information of short texts well and represents them reasonably and effectively.
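The pipeline outlined in the abstract (cluster terms, take cluster centers as the semantic feature space, build a term-to-feature mapping matrix, then map a short text through it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 2-dimensional toy embeddings stand in for word2vec vectors, the clusters are given by hand rather than derived from statistical analysis, plain cosine similarity replaces the three improved similarity measures, and the context-integration step is omitted. The function names are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Assumption: toy 2-d embeddings standing in for word2vec term vectors.
EMBEDDINGS = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "bank": [0.2, 0.8],
}

# Assumption: hand-written term clusters; the paper obtains them by
# semantic clustering based on statistical analysis and word2vec.
CLUSTERS = [["goal", "match"], ["stock", "bank"]]

def cluster_centers(clusters, embeddings):
    """One center per cluster; the centers span the semantic feature space."""
    return [centroid([embeddings[t] for t in c]) for c in clusters]

def mapping_matrix(embeddings, centers):
    """Map each term to its similarity with every cluster center."""
    return {t: [cosine(v, c) for c in centers] for t, v in embeddings.items()}

def represent(tokens, matrix, dim):
    """Represent a short text as the mean mapped row of its known terms."""
    rows = [matrix[t] for t in tokens if t in matrix]
    return centroid(rows) if rows else [0.0] * dim

centers = cluster_centers(CLUSTERS, EMBEDDINGS)
matrix = mapping_matrix(EMBEDDINGS, centers)
sports_text = represent(["goal", "match", "unknown"], matrix, len(centers))
finance_text = represent(["stock", "bank"], matrix, len(centers))
```

In this sketch each short text becomes a vector with one entry per cluster, so `sports_text` has its larger weight on the first feature and `finance_text` on the second. Out-of-vocabulary tokens such as "unknown" are skipped, which is a design choice of this sketch and not something the abstract specifies.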
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61663004, No. kx201705) and Guangxi Key Lab of Multi-source Information Mining and Security (No. MIMS18-08).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tuo, T., Ma, H., Liu, H., Wei, J. (2019). Effectively Representing Short Text via the Improved Semantic Feature Space Mapping. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_27
DOI: https://doi.org/10.1007/978-3-030-26142-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science, Computer Science (R0)