DOI: 10.5555/1690219.1690258
research-article
Free access

Case markers and morphology: addressing the crux of the fluency problem in English-Hindi SMT

Published: 02 August 2009

Abstract

We report our work on accurately generating case markers and suffixes in English-to-Hindi SMT. Hindi is a relatively free word-order language and uses a comparatively rich set of case markers and morphological suffixes to represent meaning correctly. From our experience with large-scale English-Hindi MT, we are convinced that fluency and fidelity in the Hindi output improve substantially when accurate case markers and suffixes are produced. The central question is: what on the English side encodes the information carried by case markers and suffixes on the Hindi side? Our studies of correspondences between the two languages show that case markers and suffixes in Hindi are predominantly determined by the combination of suffixes and semantic relations on the English side. We therefore augment the aligned corpus of the two languages with the correspondence between English suffixes and semantic relations on one side and Hindi suffixes and case markers on the other. Our results on 400 test sentences, translated using an SMT system trained on around 13,000 parallel sentences, show that suffix + semantic relation → case marker/suffix is a very useful translation factor: it makes a significant difference to output quality, as indicated by subjective evaluation as well as BLEU scores.
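
As a rough illustration of what augmenting the source side with such factors might look like, the sketch below annotates each English token with a suffix and a semantic relation and joins them with `|`, the separator commonly used for factored corpora. The `EnglishToken` class, the `to_factored` helper, the factor order, and the relation labels (`det`, `agt`, `root`, `obj`) are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnglishToken:
    """One source-side token with its (hypothetical) factors."""
    lemma: str     # base form of the word
    suffix: str    # English suffix, or "-" when the token has none
    relation: str  # semantic relation label, or "-" when none applies


def to_factored(tokens):
    """Join each token's factors with '|' and separate tokens with spaces."""
    return " ".join(f"{t.lemma}|{t.suffix}|{t.relation}" for t in tokens)


# Hand-written analysis of "the boy played games". In a real pipeline the
# suffixes and relations would come from a morphological analyser and a
# parser; the labels here are illustrative only.
sentence = [
    EnglishToken("the", "-", "det"),
    EnglishToken("boy", "-", "agt"),
    EnglishToken("play", "ed", "root"),
    EnglishToken("game", "s", "obj"),
]

print(to_factored(sentence))
```

With the source side written this way, a factored system can condition the choice of Hindi case marker or suffix on the English suffix and relation rather than on the surface word alone.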

Information & Contributors

Information

Published In

ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
August 2009
595 pages
ISBN: 9781932432466
  • General Chair: Keh-Yih Su

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 52
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 20 Jan 2025

Cited By

  • (2017) Role of Morphology Injection in SMT. ACM Transactions on Asian and Low-Resource Language Information Processing 17:1 (1-31). DOI: 10.1145/3129208. Online publication date: 15-Sep-2017
  • (2014) Word Prediction System for Text Entry in Hindi. ACM Transactions on Asian Language Information Processing 13:2 (1-29). DOI: 10.1145/2617590. Online publication date: 1-Jun-2014
  • (2013) No free lunch in factored phrase-based machine translation. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2 (210-223). DOI: 10.1007/978-3-642-37256-8_18. Online publication date: 24-Mar-2013
  • (2012) Probes in a taxonomy of factored phrase-based models. Proceedings of the Seventh Workshop on Statistical Machine Translation (253-260). DOI: 10.5555/2393015.2393050. Online publication date: 7-Jun-2012
  • (2011) A word reordering model for improved machine translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (486-496). DOI: 10.5555/2145432.2145489. Online publication date: 27-Jul-2011
  • (2011) An exponential translation model for target language morphology. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (230-238). DOI: 10.5555/2002472.2002502. Online publication date: 19-Jun-2011
  • (2011) Combining morpheme-based machine translation with post-processing morpheme prediction. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (32-42). DOI: 10.5555/2002472.2002477. Online publication date: 19-Jun-2011
  • (2010) Using TectoMT as a preprocessing tool for phrase-based statistical machine translation. Proceedings of the 13th international conference on Text, speech and dialogue (216-223). DOI: 10.5555/1887176.1887206. Online publication date: 6-Sep-2010
