Abstract
Recently, machine translation systems based on neural networks have reached state-of-the-art results for some language pairs (e.g., German–English). In this paper, we investigate the performance of neural machine translation for Chinese–Spanish, a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of words or characters and their corresponding bitmap fonts. Interpreting every word or character as a bitmap font generates more informed vector representations. The best results are obtained when using words plus their bitmap fonts, yielding an improvement over a competitive neural MT baseline system of almost six BLEU and five METEOR points, as well as a consistently better ranking in the human evaluation.
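As a rough illustration of combining a token with its bitmap, the sketch below flattens a small glyph bitmap and appends it to a token embedding. Everything in it is an assumption made for illustration: the abstract does not say how the two inputs are combined (plain concatenation is assumed here), and the helper names `flatten_bitmap` and `combine`, the hand-drawn 5x5 glyph, and the 3-dimensional embedding are hypothetical rather than taken from the paper.

```python
# Toy sketch: combine a learned token vector with a flattened glyph bitmap.
# All names, sizes, and values are hypothetical; the abstract does not
# specify how the two inputs are combined (concatenation is assumed here).


def flatten_bitmap(rows):
    """Turn a glyph drawn as rows of '#' and '.' into a flat list of 0.0/1.0."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all bitmap rows must have the same width")
    return [1.0 if pixel == "#" else 0.0 for row in rows for pixel in row]


def combine(embedding, bitmap_rows):
    """Concatenate a token embedding with its flattened bitmap."""
    return list(embedding) + flatten_bitmap(bitmap_rows)


# Hand-drawn 5x5 stand-in for a cross-shaped glyph (not a real font rendering).
GLYPH = [
    "..#..",
    "..#..",
    "#####",
    "..#..",
    "..#..",
]

toy_embedding = [0.12, -0.40, 0.88]  # hypothetical 3-dimensional embedding
source_vector = combine(toy_embedding, GLYPH)
print(len(source_vector))  # 3 embedding values + 25 pixel values
```

In a real system the bitmap would presumably be rendered from a font at a fixed resolution, and the resulting vector would feed the encoder in place of the plain embedding; those details are not given in the abstract.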
Acknowledgements
This work is supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund, through the postdoctoral senior grant Ramón y Cajal and the contract TEC2015-69266-P (MINECO/FEDER, UE).
Cite this article
Costa-jussà, M.R., Aldón, D. & Fonollosa, J.A.R. Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts. Machine Translation 31, 35–47 (2017). https://doi.org/10.1007/s10590-017-9196-0
DOI: https://doi.org/10.1007/s10590-017-9196-0