Component Ranking and Automatic Query Refinement for XML Retrieval

  • Conference paper
Advances in XML Information Retrieval (INEX 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3493)

Abstract

Queries over XML documents challenge search engines to return the most relevant XML components that satisfy the query concepts. In previous work we described a component ranking algorithm that performed relatively well in INEX’03. In this paper we improve that algorithm by introducing a document pivot that compensates for missing term statistics in small components. With the new algorithm we achieved improvements of 30% to 50% in Mean Average Precision over the previous algorithm. We then describe a general mechanism for applying known Query Refinement algorithms from traditional IR on top of this component ranking algorithm, and demonstrate one such algorithm that achieved top results in INEX’04.
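
As a rough illustration of the document-pivot idea, the sketch below blends each component's own score with the score of its containing document, so that a small component with sparse term statistics can borrow evidence from its document. The abstract does not give the formula: the linear interpolation, the names `pivoted_score` and `rank_components`, and the `doc_pivot` weight are all assumptions made for illustration.

```python
def pivoted_score(component_score: float, document_score: float,
                  doc_pivot: float = 0.5) -> float:
    """Blend a component's score with its containing document's score.

    Assumed form (not taken from the paper): a linear interpolation
    controlled by doc_pivot in [0, 1]. doc_pivot = 0 ignores the document;
    doc_pivot = 1 ranks components purely by their document's score.
    """
    if not 0.0 <= doc_pivot <= 1.0:
        raise ValueError("doc_pivot must be in [0, 1]")
    return doc_pivot * document_score + (1.0 - doc_pivot) * component_score


def rank_components(components, doc_scores, doc_pivot=0.5):
    """Rank components by pivoted score, highest first.

    components: iterable of (component_id, doc_id, component_score)
    doc_scores: mapping from doc_id to the document-level score
    """
    scored = [
        (component_id, pivoted_score(score, doc_scores[doc_id], doc_pivot))
        for component_id, doc_id, score in components
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `doc_pivot=0` this hypothetical ranking reduces to ranking by component score alone, which makes it easy to compare the two behaviours on the same input.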

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mass, Y., Mandelbrod, M. (2005). Component Ranking and Automatic Query Refinement for XML Retrieval. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_6

  • DOI: https://doi.org/10.1007/11424550_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26166-7

  • Online ISBN: 978-3-540-32053-1

  • eBook Packages: Computer Science, Computer Science (R0)
