POS Tag-enhanced Coarse-to-fine Attention for Neural Machine Translation

Published: 22 April 2019

Abstract

Although neural machine translation (NMT) has a certain capability to implicitly learn the semantics of sentences, we explore and show that Part-of-Speech (POS) tags can be effectively and explicitly incorporated into the attention mechanism of NMT to yield further improvements. In this article, we propose an NMT model with a tag-enhanced attention mechanism, in which NMT and POS tagging are jointly modeled via multi-task learning. Besides following common practice and enriching encoder annotations with predicted source POS tags, we exploit predicted target POS tags to refine the attention model in a coarse-to-fine manner. Specifically, we first perform a coarse attention operation solely on the source annotations and the target hidden state, and use the resulting context vector to update the target hidden state, from which a target POS tag is predicted. Then, we perform a fine attention operation that extends the coarse one by further exploiting the predicted target POS tags. Finally, we predict the target word using both the context vector from fine attention and the predicted target POS tags. Experimental results and further analyses on Chinese-English and Japanese-English translation tasks demonstrate the superiority of our model over conventional NMT models. We release our code at https://github.com/middlekisser/PEA-NMT.git.
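The coarse-to-fine decoding step outlined in the abstract can be sketched roughly as follows. This is a simplified, illustrative NumPy sketch under invented assumptions (a single decoding step, greedy tag prediction, plain dot-product attention, and hypothetical parameter names such as `W_update`, `W_tag`, `E_tag`, and `W_fine`), not the authors' actual implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(annotations, query):
    # Dot-product attention over source positions:
    # annotations has shape (T, d), query has shape (d,).
    scores = annotations @ query          # (T,)
    weights = softmax(scores)             # attention distribution
    return weights @ annotations          # context vector, shape (d,)

def coarse_to_fine_step(annotations, hidden, W_update, W_tag, E_tag, W_fine):
    # 1) Coarse attention uses only source annotations and the
    #    current target hidden state.
    coarse_ctx = attend(annotations, hidden)
    # 2) The coarse context vector updates the target hidden state.
    h_tilde = np.tanh(W_update @ np.concatenate([hidden, coarse_ctx]))
    # 3) A target POS tag is predicted from the updated state
    #    (greedy argmax here for illustration).
    tag = int(softmax(W_tag @ h_tilde).argmax())
    # 4) Fine attention extends the coarse query with the predicted
    #    tag's embedding, yielding a refined context vector.
    fine_query = np.tanh(W_fine @ np.concatenate([h_tilde, E_tag[tag]]))
    fine_ctx = attend(annotations, fine_query)
    # 5) Word prediction would then use fine_ctx together with the tag.
    return fine_ctx, tag
```

In the full model these parameters would be trained jointly with the translation and tagging objectives under multi-task learning; here they are placeholders whose only purpose is to show how the predicted target tag feeds back into the fine attention query.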


Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
December 2019, 305 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3327969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2019
Accepted: 01 February 2019
Revised: 01 February 2019
Received: 01 June 2018
Published in TALLIP Volume 18, Issue 4


Author Tags

  1. Neural machine translation
  2. POS tags
  3. attention model

Qualifiers

  • Short-paper
  • Research
  • Refereed


Cited By

  • (2023) "Part-of-Speech Tags Guide Low-Resource Machine Translation." Electronics 12(16): 3401. DOI: 10.3390/electronics12163401. Online: 10 Aug 2023.
  • (2022) "Moment matching training for neural machine translation: An empirical study." Journal of Intelligent & Fuzzy Systems 43(3): 2633–2645. DOI: 10.3233/JIFS-213240. Online: 21 Jul 2022.
  • (2022) "Chinese-English Contrastive Translation System Based on Lagrangian Search Mathematical Algorithm Model." Applied Mathematics and Nonlinear Sciences. DOI: 10.2478/amns.2022.2.0122. Online: 15 Jul 2022.
  • (2022) "LSTM-Based Attentional Embedding for English Machine Translation." Scientific Programming 2022. DOI: 10.1155/2022/3909726. Online: 1 Jan 2022.
  • (2022) "Split-word Architecture in Recurrent Neural Networks POS-Tagging." 2022 International Joint Conference on Neural Networks (IJCNN), 1–7. DOI: 10.1109/IJCNN55064.2022.9892466. Online: 18 Jul 2022.
  • (2022) "Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging." Journal of King Saud University - Computer and Information Sciences 34(9): 7324–7334. DOI: 10.1016/j.jksuci.2021.08.025. Online: Oct 2022.
  • (2022) "Improving neural machine translation with POS-tag features for low-resource language pairs." Heliyon 8(8): e10375. DOI: 10.1016/j.heliyon.2022.e10375. Online: Aug 2022.
  • (2022) "Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach." Annals of Data Science 11(1): 347–378. DOI: 10.1007/s40745-022-00434-4. Online: 16 Aug 2022.
  • (2021) "POS-Tagging based Neural Machine Translation System for European Languages using Transformers." WSEAS Transactions on Information Science and Applications 18: 26–33. DOI: 10.37394/23209.2021.18.5. Online: 24 May 2021.
  • (2021) "Research on Intelligent English Translation Method Based on the Improved Attention Mechanism Model." Scientific Programming 2021. DOI: 10.1155/2021/9667255. Online: 23 Nov 2021.
