DOI: 10.1145/1141277.1141523
Article

Light stemming approaches for the French, Portuguese, German and Hungarian languages

Published: 23 April 2006

Abstract

This paper describes and evaluates several general stemming approaches for French, Portuguese (Brazilian), German and Hungarian. Using the CLEF test collections, we show that light stemmers perform well for French, Portuguese and Hungarian, and reasonably well for German. We also test whether the differences in mean average precision (MAP) among the stemming approaches are statistically significant; in some cases they are.
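To make the idea of a light stemmer concrete, here is a minimal Python sketch for French that removes only a few inflectional endings (plural -s, -x, -aux, -eaux and a final feminine -e) and leaves derivational suffixes alone. The rule list, the function name `light_stem_fr` and the minimum stem length of three characters are assumptions chosen for illustration; they are not the rules evaluated in the paper.

```python
# Illustrative light stemmer for French.
# HYPOTHETICAL rules chosen for this sketch; not the stemmer evaluated in the paper.
# It removes only inflectional endings (number, gender) and never touches
# derivational suffixes.

MIN_STEM = 3  # assumed minimum number of characters that must remain

# Plural rules, tried in order; the first applicable rule wins.
PLURAL_RULES = [
    ("eaux", "eau"),  # bateaux -> bateau
    ("aux", "al"),    # chevaux -> cheval
    ("s", ""),        # maisons -> maison
    ("x", ""),        # jeux -> jeu
]


def light_stem_fr(word: str) -> str:
    """Return a lowercased, lightly stemmed form of a French word."""
    w = word.lower()
    # Step 1: remove at most one plural ending, keeping at least MIN_STEM chars.
    for suffix, replacement in PLURAL_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            w = w[: -len(suffix)] + replacement
            break
    # Step 2: remove a final feminine -e (grande -> grand).
    if w.endswith("e") and len(w) - 1 >= MIN_STEM:
        w = w[:-1]
    return w


if __name__ == "__main__":
    for token in ["chevaux", "bateaux", "Maisons", "grandes", "jeux", "bus"]:
        print(token, "->", light_stem_fr(token))
```

With these rules, "table" and "tables" both map to "tabl" and therefore share one index term, while "national" and "nation" stay distinct because the sketch never strips derivational suffixes. The minimum stem length keeps short words such as "bus" intact.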

      Published In

      SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
      April 2006
      1967 pages
      ISBN:1595931082
      DOI:10.1145/1141277
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. german
      2. hungarian
      3. natural language processing
      4. portuguese
      5. stemmer
      6. stemming for french

      Qualifiers

      • Article

      Conference

      SAC06

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 15 Feb 2025.

      Cited By

      • (2024) Building a relevance feedback corpus for legal information retrieval in the real-case scenario of the Brazilian Chamber of Deputies. Language Resources and Evaluation. DOI: 10.1007/s10579-024-09767-3. Online publication date: 18-Aug-2024.
      • (2023) Building a text retrieval system for the Sanskrit language: Exploring indexing, stemming, and searching issues. Computer Speech & Language, 81 (101518). DOI: 10.1016/j.csl.2023.101518. Online publication date: Jun-2023.
      • (2023) Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach. Information Retrieval Journal, 26:1-2. DOI: 10.1007/s10791-023-09429-y. Online publication date: 14-Dec-2023.
      • (2023) Working with Text Data. Applied Statistical Learning, pp. 97-117. DOI: 10.1007/978-3-031-33390-3_6. Online publication date: 30-Jul-2023.
      • (2022) Neural Network Guided Fast and Efficient Query-Based Stemming by Predicting Term Co-occurrence Statistics. SN Computer Science, 3:3. DOI: 10.1007/s42979-022-01081-5. Online publication date: 24-Mar-2022.
      • (2022) Ulysses-RFSQ: A Novel Method to Improve Legal Information Retrieval Based on Relevance Feedback. Intelligent Systems, pp. 77-91. DOI: 10.1007/978-3-031-21686-2_6. Online publication date: 28-Nov-2022.
      • (2021) Deep Sentiment Analysis: A Case Study on Stemmed Turkish Twitter Data. IEEE Access, 9, pp. 56836-56854. DOI: 10.1109/ACCESS.2021.3071393. Online publication date: 2021.
      • (2019) A novel unsupervised corpus-based stemming technique using lexicon and corpus statistics. Knowledge-Based Systems. DOI: 10.1016/j.knosys.2019.05.025. Online publication date: May-2019.
      • (2018) Text Preprocessing. Information Visualization Techniques in the Social Sciences and Humanities, pp. 86-104. DOI: 10.4018/978-1-5225-4990-1.ch006. Online publication date: 2018.
      • (2018) Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach. Journal of Medical Internet Research, 20:3 (e85). DOI: 10.2196/jmir.9222. Online publication date: 14-Mar-2018.
