Abstract
Bilingual word embeddings (BWEs) have proven useful in many cross-lingual natural language processing tasks. Previous approaches often require bilingual texts or dictionaries, which are scarce resources. As a result, the explicit semantic information they exploit, such as monolingual word co-occurrences and cross-lingual semantic equivalences, is often insufficient for BWE learning, which limits the quality of the learned word representations. To overcome this problem, we study how to exploit implicit semantic constraints to learn better BWEs. Concretely, we first discover implicit monolingual word-level semantic equivalences by pivoting through their translations in the other language. Then, we perform BWE learning under various semantic constraints. Experimental results on machine translation and cross-lingual document classification demonstrate the effectiveness of our model.
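The pivoting step can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the version below is an assumption: it uses a common pivot formulation in which two words of the same language are scored by summing, over shared translations, the product of the forward and backward translation probabilities. The function name `pivot_equivalences`, the `threshold` parameter, and the toy probability tables are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pivot_equivalences(src_to_tgt, tgt_to_src, threshold=0.1):
    """Score same-language word pairs by pivoting through translations.

    Assumed inputs (hypothetical, not from the paper):
      src_to_tgt[w][t] -- translation probability p(t | w)
      tgt_to_src[t][v] -- back-translation probability p(v | t)
    The score for (w, v) is sum over t of p(t | w) * p(v | t).
    """
    result = {}
    for w, translations in src_to_tgt.items():
        acc = defaultdict(float)
        for t, p_t_given_w in translations.items():
            for v, p_v_given_t in tgt_to_src.get(t, {}).items():
                if v != w:  # skip the trivial pair (w, w)
                    acc[v] += p_t_given_w * p_v_given_t
        # Keep only pairs whose pivot score reaches the threshold.
        kept = {v: s for v, s in acc.items() if s >= threshold}
        if kept:
            result[w] = kept
    return result


# Toy English-French lexical tables, invented for illustration only.
src_to_tgt = {
    "big": {"grand": 0.8, "gros": 0.2},
    "large": {"grand": 0.6, "vaste": 0.4},
    "cat": {"chat": 1.0},
}
tgt_to_src = {
    "grand": {"big": 0.5, "large": 0.5},
    "gros": {"big": 0.7, "fat": 0.3},
    "vaste": {"large": 1.0},
    "chat": {"cat": 1.0},
}

equivalences = pivot_equivalences(src_to_tgt, tgt_to_src)
print(equivalences)
```

In this toy setting, "big" and "large" are linked because they share the translation "grand", while the weak link from "big" to "fat" falls below the threshold. How such pairs then enter the BWE objective as constraints is not specified in the abstract and is left out of the sketch.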
Acknowledgements
We would like to thank all the reviewers for their constructive and helpful suggestions on this paper.
Additional information
The authors were supported by the National Natural Science Foundation of China (Nos. 61672440 and 61573294), the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49), and the Open Fund Project of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201742).
Cite this article
Su, J., Song, Z., Lu, Y. et al. Exploring Implicit Semantic Constraints for Bilingual Word Embeddings. Neural Process Lett 48, 1073–1088 (2018). https://doi.org/10.1007/s11063-017-9762-8
DOI: https://doi.org/10.1007/s11063-017-9762-8