Abstract
Although neural machine translation (NMT) yields promising translation performance, it suffers from over- and under-translation issues [31], which have become a research focus in NMT. Existing studies mainly rely on dominant automatic evaluation metrics, such as BLEU, to assess overall translation quality in terms of both adequacy and fluency; such metrics, however, cannot accurately measure how well NMT systems cope with these two issues. In this paper, we propose two quantitative metrics, Otem and Utem, to automatically evaluate system performance in terms of over- and under-translation, respectively. Both metrics are based on the proportion of mismatched n-grams between the gold reference and the system translation. We validate both metrics by comparing their scores with human evaluations, where Pearson correlation coefficients reveal a strong correlation. Moreover, in-depth analyses of various translation systems reveal some inconsistency between BLEU and our proposed metrics, highlighting the necessity and significance of the latter.
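To make the mismatched n-gram idea concrete, below is a minimal Python sketch that computes two mismatch proportions between a candidate translation and a reference: the fraction of candidate n-grams occurring more often than in the reference (a proxy for over-translation) and the fraction of reference n-grams missing from the candidate (a proxy for under-translation). The function names are ours, and the sketch deliberately omits details of the full Otem/Utem definitions (e.g., combining multiple n-gram orders and length-based penalties); it is an illustration, not the paper's exact formula.

from collections import Counter

def ngram_counts(tokens, n):
    # Count every n-gram of order n in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mismatch_proportions(candidate, reference, n=1):
    # over: proportion of candidate n-grams exceeding their reference count
    # (over-translation proxy); under: proportion of reference n-grams not
    # covered by the candidate (under-translation proxy). Illustrative only.
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    over = sum(max(c - ref[g], 0) for g, c in cand.items())
    under = sum(max(r - cand[g], 0) for g, r in ref.items())
    return over / max(sum(cand.values()), 1), under / max(sum(ref.values()), 1)

# "the the cat" repeats "the" (over-translation) and drops "sat" (under-translation).
o, u = mismatch_proportions("the the cat".split(), "the cat sat".split())
print("over = %.2f, under = %.2f" % (o, u))  # over = 0.33, under = 0.33

Unlike BLEU, which folds such mismatches into a single quality score, keeping the two directions separate is what lets each phenomenon be measured on its own.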
References
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: ACL Workshop (2005)
Chen, B., Guo, H.: Representation based translation evaluation metrics. In: ACL (2015)
Chen, B., Kuhn, R.: AMBER: a modified BLEU, enhanced ranking metric. In: WMT (2011)
Chen, B., Kuhn, R., Foster, G.: Improving AMBER, an MT evaluation metric. In: WMT (2012)
Chiang, D.: Hierarchical phrase-based translation. Comput. Linguist. 33(2), 201–228 (2007)
Cohn, T., Hoang, C.D.V., Vymolova, E., Yao, K., Dyer, C., Haffari, G.: Incorporating structural alignment biases into an attentional neural translation model. In: NAACL (2016)
Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: HLT (2002)
Feng, S., Liu, S., Yang, N., Li, M., Zhou, M., Zhu, K.Q.: Improving attention modeling with implicit distortion and fertility for machine translation. In: COLING (2016)
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: ICML (2017)
Giménez, J., Màrquez, L.: Linguistic features for automatic evaluation of heterogenous MT systems. In: WMT (2007)
Gupta, R., Orasan, C., van Genabith, J.: ReVal: a simple and effective machine translation evaluation metric based on recurrent neural networks. In: EMNLP (2015)
Guzmán, F., Joty, S., Màrquez, L., Nakov, P.: Pairwise neural machine translation evaluation. In: ACL (2015)
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL (2003)
Leusch, G., Ueffing, N., Ney, H.: A novel string to string distance measure with applications to machine translation evaluation. In: MT Summit IX (2003)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: ACL Workshop (2004)
Liu, D., Gildea, D.: Syntactic features for evaluation of machine translation. In: ACL Workshop (2005)
Lo, C.-k., Tumuluru, A.K., Wu, D.: Fully automatic semantic MT evaluation. In: WMT (2012)
Lo, C.-k., Wu, D.: MEANT: an inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles. In: ACL (2011)
Lo, C.-k., Wu, D.: MEANT at WMT 2013: a tunable, accurate yet inexpensive semantic frame based MT evaluation metric. In: WMT (2013)
Malaviya, C., Ferreira, P., Martins, A.F.: Sparse and constrained attention for neural machine translation. arXiv preprint arXiv:1805.08241 (2018)
Mehay, D.N., Brew, C.: BLEUÂTRE: flattening syntactic dependencies for MT evaluation. In: MT Summit (2007)
Nießen, S., Och, F.J., Leusch, G., Ney, H.: An evaluation tool for machine translation: fast evaluation for MT research. In: LREC (2000)
Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Popović, M., Ney, H.: Towards automatic error analysis of machine translation output. Comput. Linguist. 37(4), 657–688 (2011)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: ACL (2016)
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: AMTA (2006)
Sundermeyer, M., Alkhouli, T., Wuebker, J., Ney, H.: Translation modeling with bidirectional recurrent neural networks. In: EMNLP (2014)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NIPS (2014)
Owczarzak, K., van Genabith, J., Way, A.: Dependency-based automatic evaluation for machine translation. In: SSST (2007)
Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: ACL (2016)
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Yang, Z., Hu, Z., Deng, Y., Dyer, C., Smola, A.: Neural machine translation with recurrent attention modeling. In: EACL (2017)
Yu, H., Wu, X., Xie, J., Jiang, W., Liu, Q., Lin, S.: RED: a reference dependency based MT evaluation metric. In: COLING (2014)
Zhang, B., Xiong, D., Su, J.: A GRU-gated attention model for neural machine translation. arXiv preprint arXiv:1704.08430 (2017)
Acknowledgement
The authors were supported by the Natural Science Foundation of China (Grant No. 61672440), the Fundamental Research Funds for the Central Universities (Grant No. ZK1024), the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49), and the Research Fund of the Provincial Key Laboratory for Computer Information Processing Technology at Soochow University (Grant No. KJS1520). We also thank the reviewers for their insightful comments.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, J., Zhang, B., Qin, Y., Zhang, X., Lin, Q., Su, J. (2018). Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2018. Lecture Notes in Computer Science, vol. 11108. Springer, Cham. https://doi.org/10.1007/978-3-319-99495-6_25
DOI: https://doi.org/10.1007/978-3-319-99495-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99494-9
Online ISBN: 978-3-319-99495-6