Abstract
In Chinese, a word is usually composed of several characters, and its semantic meaning is related both to its component characters and to its contexts. Previous studies have shown that modeling a word's characters benefits word-embedding learning; however, they ignore the external context characters. In this paper, we propose a novel Chinese word-embedding model that considers both the internal characters of a word and the characters of its external context words. In this way, otherwise isolated characters become more related to one another, and character embeddings carry more semantic information, which improves the effectiveness of the resulting Chinese word embeddings. Experimental results show that our model outperforms other word-embedding methods on word-relatedness computation, analogical reasoning, and text classification, and that it is empirically robust to both the proportion of character modeling and the corpus size.
This work was supported by NSFC (No. 61632019) and the 863 Program of China (No. 2015AA015403).
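As a minimal sketch of the idea the abstract describes (not the paper's actual formulation), the Python snippet below builds a CBOW-style context vector in which each context word's embedding is mixed with the mean embedding of that word's own characters, so external context characters also contribute to predicting the target word. All identifiers, dimensions, and the equal 0.5/0.5 mixing weights are illustrative assumptions.

    # Sketch: augmenting context-word vectors with context-character
    # embeddings (assumed mixing weights, toy vocabulary sizes).
    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_words, n_chars = 50, 1000, 500
    W = rng.normal(scale=0.1, size=(n_words, dim))  # word embeddings
    C = rng.normal(scale=0.1, size=(n_chars, dim))  # character embeddings

    def context_vector(context_ids, word_chars):
        """Average the context-word vectors, each mixed with the mean
        embedding of that word's characters (the external characters)."""
        vecs = []
        for w in context_ids:
            char_part = C[word_chars[w]].mean(axis=0)
            vecs.append(0.5 * W[w] + 0.5 * char_part)
        return np.mean(vecs, axis=0)

    # Toy example: word ids 3 and 7 form the context window; each word id
    # maps to the ids of its component characters (assumed mapping).
    word_chars = {3: [10, 11], 7: [12]}
    h = context_vector([3, 7], word_chars)
    score = float(W[42] @ h)  # dot-product score for candidate target word 42
    print(h.shape, score)

In the paper's full setting, the internal characters of the target word would presumably be modeled jointly as well; this sketch shows only the context-side (external) character augmentation.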
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Zhang, X., Liu, S., Li, Y., Liang, W. (2017). Joining External Context Characters to Improve Chinese Word Embedding. In: Cong, F., Leung, A., Wei, Q. (eds.) Advances in Neural Networks - ISNN 2017. Lecture Notes in Computer Science, vol. 10262. Springer, Cham. https://doi.org/10.1007/978-3-319-59081-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59080-6
Online ISBN: 978-3-319-59081-3