Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts

Machine Translation

Abstract

Recently, machine translation systems based on neural networks have reached state-of-the-art results for some language pairs (e.g., German–English). In this paper, we investigate the performance of neural machine translation for Chinese–Spanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of words or characters and their corresponding bitmap fonts. Interpreting every word or character as a bitmap font generates more informed vector representations. The best results are obtained when using words plus their bitmap fonts: an improvement over a competitive neural MT baseline system of almost six BLEU points and five METEOR points, with a coherently better ranking in the human evaluation.
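
As a rough illustration of the input combination described above, the following minimal sketch builds one vector per token from a word embedding plus a flattened glyph bitmap. Everything in it is an assumption made for illustration: the abstract does not say how the two representations are combined (plain concatenation is used here), the 5x5 bitmaps are hand-drawn stand-ins for rendered fonts, and the names `TOY_BITMAPS`, `EMBED_DIM`, `flatten_bitmap`, `make_embeddings` and `combined_input` are hypothetical.

```python
import random

# Hypothetical 5x5 bitmaps standing in for rendered glyphs (1 = ink, 0 = blank).
# A real system would rasterize each token with an actual font at a higher resolution.
TOY_BITMAPS = {
    "人": ["00100", "00100", "01010", "01010", "10001"],
    "大": ["00100", "11111", "00100", "01010", "10001"],
}

EMBED_DIM = 4  # assumed toy embedding size


def flatten_bitmap(rows):
    """Turn a list of pixel rows into a flat list of floats."""
    return [float(px) for row in rows for px in row]


def make_embeddings(vocab, dim, seed=0):
    """Randomly initialised embeddings; in practice these would be learned."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for tok in vocab}


def combined_input(token, embeddings, bitmaps):
    """Concatenate the token embedding with its flattened bitmap (an assumption)."""
    return embeddings[token] + flatten_bitmap(bitmaps[token])


embeddings = make_embeddings(TOY_BITMAPS, EMBED_DIM)
vec = combined_input("人", embeddings, TOY_BITMAPS)
print(len(vec))  # 4 embedding values + 25 pixel values = 29
```

In this sketch, visually similar glyphs such as 人 and 大 share most of their pixel values, which is one way the bitmap part of the vector could carry graphical information that a purely learned embedding does not start with.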

Acknowledgements

This work is supported by the Spanish Ministerio de Economía y Competitividad and European Regional Development Fund, through the postdoctoral senior grant Ramón y Cajal and the contract TEC2015-69266-P (MINECO/FEDER, UE).

Author information

Corresponding author

Correspondence to Marta R. Costa-jussà.

Cite this article

Costa-jussà, M.R., Aldón, D. & Fonollosa, J.A.R. Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts. Machine Translation 31, 35–47 (2017). https://doi.org/10.1007/s10590-017-9196-0

  • DOI: https://doi.org/10.1007/s10590-017-9196-0
