Abstract
In Chinese, a word is usually composed of several characters, and its semantic meaning is related both to its component characters and to its contexts. Previous studies have shown that modeling a word's characters benefits word-embedding learning; however, they ignore the external context characters. In this paper, we propose a novel Chinese word-embedding model that considers both the internal characters of a word and the characters of its external context words. In this way, otherwise isolated characters become more related to one another, and character embeddings carry more semantic information, which improves the effectiveness of the resulting Chinese word embeddings. Experimental results show that our model outperforms other word-embedding methods on word-relatedness computation, analogical reasoning, and text classification, and that it is empirically robust to both the proportion of character modeling and the corpus size.
This work was supported by NSFC (No. 61632019) and the 863 Program of China (No. 2015AA015403).
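As a minimal sketch of the idea the abstract describes (not the paper's actual formulation), the Python snippet below builds a CBOW-style context vector in which each context word's embedding is mixed with the mean embedding of that word's own characters, so external context characters also contribute to predicting the target word. All identifiers, dimensions, and the equal 0.5/0.5 mixing weights are illustrative assumptions.

    # Sketch: augmenting context-word vectors with context-character
    # embeddings (assumed mixing weights, toy vocabulary sizes).
    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_words, n_chars = 50, 1000, 500
    W = rng.normal(scale=0.1, size=(n_words, dim))  # word embeddings
    C = rng.normal(scale=0.1, size=(n_chars, dim))  # character embeddings

    def context_vector(context_ids, word_chars):
        """Average the context-word vectors, each mixed with the mean
        embedding of that word's characters (the external characters)."""
        vecs = []
        for w in context_ids:
            char_part = C[word_chars[w]].mean(axis=0)
            vecs.append(0.5 * W[w] + 0.5 * char_part)
        return np.mean(vecs, axis=0)

    # Toy example: word ids 3 and 7 form the context window; each word id
    # maps to the ids of its component characters (assumed mapping).
    word_chars = {3: [10, 11], 7: [12]}
    h = context_vector([3, 7], word_chars)
    score = float(W[42] @ h)  # dot-product score for candidate target word 42
    print(h.shape, score)

In the paper's full setting, the internal characters of the target word would presumably be modeled jointly as well; this sketch shows only the context-side (external) character augmentation.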
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Zhang, X., Liu, S., Li, Y., Liang, W. (2017). Joining External Context Characters to Improve Chinese Word Embedding. In: Cong, F., Leung, A., Wei, Q. (eds.) Advances in Neural Networks - ISNN 2017. Lecture Notes in Computer Science, vol. 10262. Springer, Cham. https://doi.org/10.1007/978-3-319-59081-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59080-6
Online ISBN: 978-3-319-59081-3