TopX 2.0 at the INEX 2008 Efficiency Track

Martin Theobald, Mohammed AbuJarour & Ralf Schenkel

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5631)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

For the INEX 2008 Efficiency Track, we only just managed to finish and evaluate our brand-new TopX 2.0 prototype in time. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to a compressed, object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. Using a top-15, focused (i.e., non-overlapping) retrieval mode, TopX 2.0 processes the entire batch of 568 Efficiency Track topics in less than 51 seconds for their content-and-structure (CAS) version and in less than 29 seconds for their content-only (CO) version, an average of merely 89 ms per CAS query and 49 ms per CO query.
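The focused (non-overlapping) retrieval mode mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each candidate XML element carries a positional interval within its document, such that an ancestor's interval contains those of its descendants, and it applies a simple greedy rule: visit elements in descending score order and skip any element that overlaps one already selected. This illustrates the general idea only and is not the TopX 2.0 implementation; the names `Element`, `overlaps`, and `focused_top_k` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """A candidate XML element (hypothetical representation).

    start/end form an inclusive positional interval inside the document,
    e.g. token offsets, so a parent's interval encloses its children's.
    """
    doc_id: int
    start: int
    end: int
    score: float


def overlaps(a: Element, b: Element) -> bool:
    """True if both elements lie in the same document and their intervals
    intersect, e.g. when one element is an ancestor of the other."""
    return a.doc_id == b.doc_id and a.start <= b.end and b.start <= a.end


def focused_top_k(candidates: List[Element], k: int) -> List[Element]:
    """Greedily select up to k best-scoring, mutually non-overlapping elements."""
    result: List[Element] = []
    # Highest score first; sorted() is stable, so ties keep input order.
    for cand in sorted(candidates, key=lambda e: e.score, reverse=True):
        if len(result) >= k:
            break
        # Keep the candidate only if it overlaps none of the chosen elements.
        if not any(overlaps(cand, kept) for kept in result):
            result.append(cand)
    return result
```

For example, if an article element (doc 1, interval 0 to 100) scores lower than two disjoint sections nested inside it, a top-2 focused query returns the two sections and suppresses the enclosing article. The greedy rule is chosen here for brevity; it favors whichever element in an overlapping group scores highest.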


Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Martin Theobald & Ralf Schenkel
Saarland University, Saarbrücken, Germany
Ralf Schenkel
Hasso Plattner Institute, Potsdam, Germany
Mohammed AbuJarour

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, 4001, Brisbane, Qld, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

Theobald, M., AbuJarour, M., Schenkel, R. (2009). TopX 2.0 at the INEX 2008 Efficiency Track. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_23

DOI: https://doi.org/10.1007/978-3-642-03761-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)
