Abstract
Statutory sentences are generally difficult to read because they are long and use complicated expressions. This difficulty is one reason for the low quality of statistical machine translation (SMT) of such sentences. Multi-word expressions (MWEs) also complicate statutory sentences and lengthen them. We therefore proposed a method that uses MWEs to improve SMT of statutory sentences. In our method, we extracted monolingual MWEs from a parallel corpus, automatically acquired their translation equivalents based on the Dice coefficient, and integrated the resulting bilingual MWEs into an SMT system using the single-tokenization strategy. In our experiments, the SMT system using the proposed method achieved significantly better translation quality. Although automatic acquisition of translation equivalents using the Dice coefficient is not perfect, the best system’s score was close to that of a system that used bilingual MWEs whose equivalents were translated by hand.
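The two technical steps named in the abstract, Dice-based equivalent acquisition and single-tokenization, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes co-occurrence is counted at the sentence-pair level, uses the standard Dice coefficient 2·f(x,y) / (f(x) + f(y)), and joins MWE tokens with an underscore. The function names, the toy corpus, and the romanized target tokens are hypothetical and serve only as illustration.

```python
def dice(cooc, freq_src, freq_tgt):
    """Standard Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    total = freq_src + freq_tgt
    return 2.0 * cooc / total if total else 0.0


def contains(tokens, phrase):
    """True if `phrase` occurs in `tokens` as a contiguous subsequence."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase
                         for i in range(len(tokens) - n + 1))


def acquire_equivalent(corpus, src_mwe, tgt_candidates):
    """Pick the target candidate with the highest Dice score.

    Assumption: frequencies are counted per sentence pair, not per occurrence.
    """
    src_freq = 0
    tgt_freq = [0] * len(tgt_candidates)
    cooc = [0] * len(tgt_candidates)
    for src_tokens, tgt_tokens in corpus:
        src_hit = contains(src_tokens, src_mwe)
        src_freq += src_hit
        for k, cand in enumerate(tgt_candidates):
            if contains(tgt_tokens, cand):
                tgt_freq[k] += 1
                cooc[k] += src_hit
    scores = [dice(cooc[k], src_freq, tgt_freq[k])
              for k in range(len(tgt_candidates))]
    best = max(range(len(tgt_candidates)), key=scores.__getitem__)
    return tgt_candidates[best], scores[best]


def single_tokenize(tokens, phrase, joiner="_"):
    """Replace each occurrence of `phrase` with one joined token."""
    out, i, n = [], 0, len(phrase)
    while i < len(tokens):
        if n and tokens[i:i + n] == phrase:
            out.append(joiner.join(phrase))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical toy parallel corpus (English source, romanized target tokens).
corpus = [
    (["in", "accordance", "with", "the", "law"], ["houritsu", "ni", "shitagai"]),
    (["in", "accordance", "with", "this", "act"], ["kono", "hou", "ni", "shitagai"]),
    (["the", "law", "applies"], ["houritsu", "wo", "tekiyou"]),
]
src_mwe = ["in", "accordance", "with"]
candidates = [["ni", "shitagai"], ["houritsu"]]

best, score = acquire_equivalent(corpus, src_mwe, candidates)
print(best, score)
print(single_tokenize(corpus[0][0], src_mwe))
```

On this toy data the candidate "ni shitagai" co-occurs with the source MWE in both sentence pairs where either appears, so it receives a Dice score of 1.0, while "houritsu" scores 0.5. After single-tokenization, the MWE is a single token ("in_accordance_with") and can be treated as one unit by the SMT system.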
Acknowledgements
This research was partly supported by the Japan Society for the Promotion of Science KAKENHI Grant-in-Aid for Scientific Research (S) No. 23220005, (A) No. 26240050 and (C) No. 15K00201.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sakamoto, S., Ogawa, Y., Nakamura, M., Ohno, T., Toyama, K. (2017). Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences. In: Otake, M., Kurahashi, S., Ota, Y., Satoh, K., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2015. Lecture Notes in Computer Science, vol 10091. Springer, Cham. https://doi.org/10.1007/978-3-319-50953-2_18
DOI: https://doi.org/10.1007/978-3-319-50953-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50952-5
Online ISBN: 978-3-319-50953-2
eBook Packages: Computer Science, Computer Science (R0)