Abstract
For the INEX Efficiency Track 2008, we finished and evaluated our brand-new TopX 2.0 prototype just in time. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to a compressed object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for the content-only (CO) version, using a top-15, focused (i.e., non-overlapping) retrieval mode. This corresponds to an average of merely 89 ms per CAS query and 49 ms per CO query.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Theobald, M., AbuJarour, M., Schenkel, R. (2009). TopX 2.0 at the INEX 2008 Efficiency Track. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_23
DOI: https://doi.org/10.1007/978-3-642-03761-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science (R0)