Open access

Chinese-Catalan: A Neural Machine Translation Approach Based on Pivoting and Attention Mechanisms

Published: 22 April 2019

Abstract

This article addresses machine translation from Chinese to Catalan using neural pivot strategies trained without any direct parallel data. Catalan is linguistically very close to Spanish, which motivates the use of Spanish as the pivot language. For the neural architecture, we use the Transformer model, the current state of the art, which is based solely on attention mechanisms. Additionally, this work provides new resources to the community: a human-developed gold standard of 4,000 sentences between Catalan and Chinese and all the other United Nations official languages (Arabic, English, French, Russian, and Spanish). Results show that the standard pseudo-corpus (synthetic) pivot approach performs better than the cascade approach.
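The two pivot strategies named in the abstract can be contrasted with a minimal sketch. Everything below is an illustrative assumption and not the authors' system: the word-by-word lookup "translators", the toy lexicons ZH_ES and ES_CA, and the helper names (make_word_translator, cascade, build_pseudo_corpus, train_direct) are hypothetical stand-ins for trained Transformer models. The sketch only shows where the pivot language enters each pipeline: at inference time for cascade, and at data-preparation time for the pseudo-corpus approach. It does not reproduce the reported quality difference.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def make_word_translator(table: Dict[str, str]) -> Translator:
    """Toy stand-in for a trained NMT system: word-by-word lookup."""
    def translate(sentence: str) -> str:
        # Unknown tokens are copied through unchanged.
        return " ".join(table.get(tok, tok) for tok in sentence.split())
    return translate


# Hypothetical toy lexicons standing in for trained models.
ZH_ES = {"我": "yo", "喝": "bebo", "水": "agua"}
ES_CA = {"yo": "jo", "bebo": "bec", "agua": "aigua"}

zh_to_es = make_word_translator(ZH_ES)
es_to_ca = make_word_translator(ES_CA)


def cascade(zh_sentence: str) -> str:
    """Cascade: chain two systems at inference time (zh -> es -> ca)."""
    return es_to_ca(zh_to_es(zh_sentence))


def build_pseudo_corpus(zh_es_pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pseudo-corpus: translate the Spanish side of a zh-es corpus into
    Catalan once, producing synthetic zh-ca training pairs."""
    return [(zh, es_to_ca(es)) for zh, es in zh_es_pairs]


def train_direct(pairs: List[Tuple[str, str]]) -> Translator:
    """Toy 'training' of a direct zh -> ca system on the synthetic pairs.
    Assumes a monotone one-to-one token alignment, which real data lacks."""
    table: Dict[str, str] = {}
    for zh, ca in pairs:
        for z, c in zip(zh.split(), ca.split()):
            table[z] = c
    return make_word_translator(table)


if __name__ == "__main__":
    print(cascade("我 喝 水"))
    synthetic = build_pseudo_corpus([("我 喝 水", "yo bebo agua")])
    direct = train_direct(synthetic)
    print(direct("我 喝 水"))
```

In this sketch the cascade calls two systems for every input sentence, whereas the pseudo-corpus route uses the Spanish-to-Catalan system only once, to build training data, after which a single direct Chinese-to-Catalan system handles translation.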

Cited By

  • (2020) Efficient Low-Resource Neural Machine Translation with Reread and Feedback Mechanism. ACM Transactions on Asian and Low-Resource Language Information Processing 19:3 (1-13). DOI: 10.1145/3365244. Online publication date: 9-Jan-2020

    Information & Contributors

    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
    December 2019
    305 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3327969
    This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 April 2019
    Accepted: 01 February 2019
    Revised: 01 December 2018
    Received: 01 June 2018
    Published in TALLIP Volume 18, Issue 4

    Author Tags

    1. Chinese-Catalan
    2. Neural machine translation
    3. pivot approaches
    4. transformer

    Qualifiers

    • Note
    • Research
    • Refereed
