DOI: 10.5555/1083592.1083631

The TEXTURE benchmark: measuring performance of text queries on a relational DBMS

Published: 30 August 2005

Abstract

We introduce a benchmark called TEXTURE (TEXT Under RElations) to measure the relative strengths and weaknesses of combining text processing with a relational workload in an RDBMS. While the well-known TREC benchmarks focus on quality, we focus on efficiency. TEXTURE is a micro-benchmark for query workloads, and considers two central text support issues that previous benchmarks did not: (1) queries with relevance ranking, rather than those that just compute all answers, and (2) a richer mix of text and relational processing, reflecting the trend toward seamless integration. In developing this benchmark, we had to address the problem of generating large text collections that reflected the (performance) characteristics of a given "seed" collection; this is essential for a controlled study of specific data characteristics and their effects on performance. In addition to presenting the benchmark, with performance numbers for three commercial DBMSs, we present and validate a synthetic generator for populating text fields.
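The abstract describes generating large text collections that preserve the performance characteristics of a "seed" collection. The sketch below illustrates one simple way to approach that idea. It is not the generator presented in the paper: it assumes a unigram model, in which word frequencies and document lengths are sampled from the seed collection, and the names `build_unigram_model` and `generate_docs` are hypothetical.

```python
import random
from collections import Counter


def build_unigram_model(seed_docs):
    """Collect word frequencies and document lengths from the seed collection."""
    counts = Counter(word for doc in seed_docs for word in doc.split())
    words = sorted(counts)                    # fixed order keeps runs reproducible
    weights = [counts[w] for w in words]      # empirical term frequencies
    lengths = [len(doc.split()) for doc in seed_docs]
    return words, weights, lengths


def generate_docs(seed_docs, n_docs, rng_seed=0):
    """Generate n_docs synthetic documents (assumed unigram model, not the paper's)."""
    rng = random.Random(rng_seed)
    words, weights, lengths = build_unigram_model(seed_docs)
    docs = []
    for _ in range(n_docs):
        length = rng.choice(lengths)          # reuse an observed document length
        sampled = rng.choices(words, weights=weights, k=length)
        docs.append(" ".join(sampled))
    return docs


if __name__ == "__main__":
    seed = ["the cat sat on the mat", "the dog barked"]
    for doc in generate_docs(seed, 3, rng_seed=1):
        print(doc)
```

Because words are drawn in proportion to their seed frequencies, a skewed term distribution in the seed stays skewed in the output, which is the kind of property that affects index size and query cost. A unigram model ignores word co-occurrence, so it would need to be validated against the seed before being used for measurements.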

Cited By

  • (2013) A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics. Proceedings of the 16th International Conference on Extending Database Technology, pages 613-624. DOI: 10.1145/2452376.2452448. Online publication date: 18-Mar-2013
  • (2009) Benchmarking Fulltext Search Performance of RDF Stores. Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications, pages 81-95. DOI: 10.1007/978-3-642-02121-3_10. Online publication date: 31-May-2009
  • (2008) Accessing speech documents on smartphones. Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services, pages 1-10. DOI: 10.4108/ICST.MOBIQUITOUS2008.3635. Online publication date: 21-Jul-2008

Published In

VLDB '05: Proceedings of the 31st international conference on Very large data bases
August 2005
1392 pages
ISBN: 1595931546

Publisher

VLDB Endowment

Qualifiers

  • Article

Conference

ICMI05
