Abstract
Statistical stemmers are important components of Information Retrieval (IR) systems, especially for text search over languages with few linguistic resources. Research on stemmers has produced relevant results in recent years, particularly in 2011, when three language-independent stemmers were published in relevant venues. In this paper, we describe our efforts to reproduce these three stemmers. We also release the code as open source, together with an extended version of the Terrier system that integrates the developed stemmers.
References
Di Nunzio, G.M., Ferro, N., Mandl, T., Peters, C.: CLEF 2007: ad hoc track overview. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 13–32. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85760-0_2
Dietz, F., Petras, V.: A component-level analysis of an academic search test collection. In: Jones, G.J.F., et al. (eds.) CLEF 2017. LNCS, vol. 10456, pp. 29–42. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65813-1_3
Dolamic, L., Savoy, J.: Indexing and stemming approaches for the Czech language. Inf. Process. Manage. 45(6), 714–720 (2009)
Ferro, N., Silvello, G.: CLEF 15th birthday: what can we learn from ad hoc retrieval? In: Kanoulas, E., Lupu, M., Clough, P., Sanderson, M., Hall, M., Hanbury, A., Toms, E. (eds.) CLEF 2014. LNCS, vol. 8685, pp. 31–43. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11382-1_4
Krovetz, R.: Viewing morphology as an inference process. In: Proceedings of 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1993), pp. 191–202. ACM Press (1993)
Lovins, J.B.: Development of a stemming algorithm. Mech. Transl. Comput. Linguist. 11(1/2), 22–31 (1968)
Macdonald, C., McCreadie, R., Santos, R.L.T., Ounis, I.: From puppy to maturity: experiences in developing terrier. In: Proceedings of OSIR at SIGIR, pp. 60–63 (2012)
Paik, J.H., Mitra, M., Parui, S.K., Järvelin, K.: GRAS: an effective and efficient stemming algorithm for information retrieval. ACM Trans. Inf. Syst. 29(4), 19 (2011)
Paik, J.H., Pal, D., Parui, S.K.: A novel corpus-based stemming algorithm using co-occurrence statistics. In: Proceedings of 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), pp. 863–872. ACM Press (2011)
Paik, J.H., Parui, S.K.: A fast corpus-based stemmer. ACM Trans. Asian Lang. Inf. Process. 10(2), 1–16 (2011)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Savoy, J.: Searching strategies for the Hungarian language. Inf. Process. Manage. 44(1), 310–324 (2008)
Singh, J., Gupta, V.: Text stemming: approaches, applications, and challenges. ACM Comput. Surv. (CSUR) 49(3), 45:1–45:46 (2016)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Silvello, G. et al. (2018). Statistical Stemmers: A Reproducibility Study. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_29
DOI: https://doi.org/10.1007/978-3-319-76941-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)