DOI: 10.5555/1813809.1813957
Article

Overview of Morpho Challenge 2008

Published: 17 September 2008

Abstract

This paper gives an overview of the Morpho Challenge 2008 competition and its results. The goal of the challenge was to evaluate unsupervised algorithms that provide morpheme analyses for words in different languages. For morphologically complex languages such as Finnish, Turkish and Arabic, morpheme analysis is particularly important for the lexical modeling of words in speech recognition, information retrieval and machine translation. The evaluation in the Morpho Challenge competitions consisted of both a linguistic and an application-oriented performance analysis. In addition to the Finnish, Turkish, German and English evaluations performed in Morpho Challenge 2007, this year's competition added an evaluation for Arabic. The results of the 2008 linguistic evaluation show that, although precision and recall vary substantially between the tasks for different languages, the best methods seem to deal quite well with all the languages involved. The results of the information retrieval evaluation indicate that morpheme analysis has a significant effect in all the tested languages (Finnish, English and German). The best unsupervised, language-independent morpheme analysis methods can also rival the best language-dependent word normalization methods. Morpho Challenge was part of the EU Network of Excellence PASCAL Challenge Program and was organized in collaboration with CLEF.
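The abstract reports precision and recall for the linguistic evaluation without spelling out how they are computed. As a rough illustration only, the sketch that follows scores a proposed morpheme analysis against a gold standard by asking, for every pair of words, whether the two words share at least one morpheme label. This exhaustive, binary pair comparison is a simplification assumed here; it is not the official Morpho Challenge evaluation script, and the function names (`sharing_pairs`, `pair_precision_recall`) and the toy English analyses are hypothetical.

```python
from itertools import combinations

def sharing_pairs(analyses):
    """Return the set of unordered word pairs (w1 < w2) that share
    at least one morpheme label. Simplification: each pair counts once,
    regardless of how many morphemes the two words share."""
    pairs = set()
    for w1, w2 in combinations(sorted(analyses), 2):
        if set(analyses[w1]) & set(analyses[w2]):
            pairs.add((w1, w2))
    return pairs

def pair_precision_recall(proposed, gold):
    """Precision, recall and F-measure of the proposed analysis, measured
    on morpheme-sharing word pairs (hypothetical, simplified scheme)."""
    pred = sharing_pairs(proposed)
    ref = sharing_pairs(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy data (hypothetical): gold-standard labels vs. an unsupervised segmentation.
gold = {
    "walked": ["walk", "+PAST"],
    "walking": ["walk", "+PCP1"],
    "talked": ["talk", "+PAST"],
    "talks": ["talk", "+3SG"],
}
proposed = {
    "walked": ["walk", "ed"],
    "walking": ["walk", "ing"],
    "talked": ["talk", "ed"],
    "talks": ["talks"],  # left unsegmented by the hypothetical method
}

if __name__ == "__main__":
    p, r, f = pair_precision_recall(proposed, gold)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

On the toy data the proposed analysis misses the talked/talks pair, because it leaves "talks" unsegmented, but it predicts no spurious pairs, so precision is 1.0, recall is about 0.667 and the F-measure is about 0.8.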



Published In

CLEF'08: Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
September 2008
1002 pages
ISBN: 3642044468

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. machine learning
  2. morphological analysis

Cited By

  • (2016) Text Stemming. ACM Computing Surveys 49(3), pp. 1-46. DOI: 10.1145/2975608. Online publication date: 16-Sep-2016.
  • (2014) CLEF 15th Birthday. ACM SIGIR Forum 48(2), pp. 31-55. DOI: 10.1145/2701583.2701587. Online publication date: 23-Dec-2014.
  • (2010) EMMA. Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1029-1037. DOI: 10.5555/1873781.1873897. Online publication date: 23-Aug-2010.
  • (2010) Morpho Challenge competition 2005-2010. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pp. 87-95. DOI: 10.5555/1870478.1870489. Online publication date: 15-Jul-2010.
  • (2009) Addressing morphological variation in alphabetic languages. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 75-82. DOI: 10.1145/1571941.1571957. Online publication date: 19-Jul-2009.
  • (2008) Using unsupervised paradigm acquisition for prefixes. Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access, pp. 983-990. DOI: 10.5555/1813809.1813960. Online publication date: 17-Sep-2008.
  • (2008) Allomorfessor. Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access, pp. 975-982. DOI: 10.5555/1813809.1813959. Online publication date: 17-Sep-2008.
  • (2008) A probabilistic model for guessing base forms of new words by analogy. Proceedings of the 9th international conference on Computational linguistics and intelligent text processing, pp. 106-116. DOI: 10.5555/1787578.1787591. Online publication date: 17-Feb-2008.
  • (2008) Don't have a stemmer? Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 813-814. DOI: 10.1145/1390334.1390518. Online publication date: 20-Jul-2008.
