Abstract
This paper addresses the problem of identifying collection-dependent stop-words in order to reduce the size of inverted files. We present four methods for automatically recognising stop-words, analyse the trade-off between efficiency and effectiveness, and compare them with a previous pruning approach. Our experiments show that, in some situations, stop-word pruning is competitive with other inverted-file reduction techniques.
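A minimal sketch of term-level static pruning follows. It assumes the simplest possible criterion: the k terms with the lowest inverse document frequency (IDF = log(N/df)) are treated as collection-dependent stop-words, and their entire posting lists are dropped from the inverted file. This is only an illustration, not necessarily one of the four methods evaluated in the paper. The function names (`build_index`, `select_stopwords`, `prune`, `num_postings`) and the parameter `k` are hypothetical, and index size is measured in postings rather than in compressed bytes.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted file: term -> sorted list of (doc_id, term_frequency)."""
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(p.items()) for term, p in postings.items()}


def idf(df, n_docs):
    """Inverse document frequency; it is 0 for a term that occurs in every document."""
    return math.log(n_docs / df)


def select_stopwords(index, n_docs, k):
    """Hypothetical criterion: the k terms with the lowest IDF (highest document
    frequency), ties broken alphabetically so the result is deterministic."""
    ranked = sorted(index, key=lambda t: (idf(len(index[t]), n_docs), t))
    return set(ranked[:k])


def prune(index, stopwords):
    """Static term pruning: drop the entire posting list of each stop-word."""
    return {term: p for term, p in index.items() if term not in stopwords}


def num_postings(index):
    """Index size measured as the total number of postings."""
    return sum(len(p) for p in index.values())


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat and the dog",
    ]
    index = build_index(docs)
    stop = select_stopwords(index, len(docs), k=1)
    pruned = prune(index, stop)
    print(stop, num_postings(index), num_postings(pruned))  # {'the'} 14 11
```

Because whole posting lists are removed, the index shrinks by the sum of the pruned terms' document frequencies. In a real compressed inverted file the byte savings will differ, since the long lists of very frequent terms have small document gaps and usually compress well. The effectiveness cost is that queries containing a pruned term lose that term's contribution to the ranking.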
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Blanco, R., Barreiro, Á. (2007). Static Pruning of Terms in Inverted Files. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_9
DOI: https://doi.org/10.1007/978-3-540-71496-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)