[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1514894.1514913acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free access

A compositional framework for complex queries over uncertain data

Published: 23 March 2009 Publication History

Abstract

The ability to flexibly compose confidence computation with the operations of relational algebra is an important feature of probabilistic database query languages. Computing confidences is computationally hard, however, and has to be approximated in practice. In a compositional query language, even very small errors caused by approximation can lead to an entirely incorrect result: A selection operation on an approximated probability can incorrectly keep or drop a tuple even if the probability value has been approximated to a very narrow confidence interval.
In this paper, we study the query evaluation problem for compositional query languages for probabilistic databases with particular focus on providing overall result quality guarantees in the face of approximate intermediate results. We present a framework for evaluating compositional queries based on a new representation system that can capture uncertainty about probabilities. More specifically, we consider probability intervals instead of exact probabilities, interpreting tuples obtained by selection on approximate values as unreliable.
We study the complexity of query evaluation over our new model. We present efficient confidence computation algorithms which compute bounds that are close to tight for important classes. For deciding a selection predicate, we show that no efficient randomized algorithm exists unless BPP⊃NP. Still we are able to efficiently guess robust predicates with a good error bound. Putting all these pieces together in our framework, we evaluate queries using a decomposition into a relational algebra plan and an approximation plan. The latter allows to successively improve accuracy and error bounds, while the relational algebra plan only has to be executed once.

References

[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2]
Periklis Andritsos, Ariel Fuxman, and Renee J. Miller. Clean answers over dirty databases: A probabilistic approach. In Proc. ICDE, 2006.
[3]
Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. "Fast and Simple Relational Processing of Uncertain Data". In Proc. ICDE, 2008.
[4]
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. "Conditional Functional Dependencies for Data Cleaning". In Proc. ICDE, 2007.
[5]
Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. "Incorporating Constraints In Probabilistic XML". In Proc. PODS, 2008.
[6]
Nilesh Dalvi and Dan Suciu. "Efficient Query Evaluation on Probabilistic Databases". In Proc. VLDB, 2004.
[7]
Norbert Fuhr and Thomas Rölleke. "A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems". ACM Trans. Inf. Syst., 15(1), 1997.
[8]
Erich Grädel, Yuri Gurevich, and Colin Hirsch. "The Complexity of Query Reliability". In Proc. PODS, 1998.
[9]
Edward Hung, Lise Getoor, and V. S. Subrahmanian. "Probabilistic Interval XML". In Proc. ICDT, 2003.
[10]
T. Imielinski and W. Lipski. "Incomplete Information in Relational Databases". J. ACM, 31(4), 1984.
[11]
Abhay Jha, Vibhor Rastogi, and Dan Suciu. "Query Evaluation with Soft-Key Constraints". In Proc. PODS, 2008.
[12]
David S. Johnson. A catalog of complexity classes. Handbook of theoretical computer science (vol. A): algorithms and complexity, pages 67--161, 1990.
[13]
Richard M. Karp and Michael Luby. "Monte-Carlo Algorithms for Enumeration and Reliability Problems". In Proc. FOCS, 1983.
[14]
Christoph Koch. "Approximating Predicates and Expressive Queries on Probabilistic Data". In Proc. PODS, 2008.
[15]
Christoph Koch and Dan Olteanu. "Conditioning Probabilistic Databases". In Proc. VLDB, 2008.
[16]
Laks V. S. Lakshmanan, Nicola Leone, Robert B. Ross, and V. S. Subrahmanian. Probview: A flexible probabilistic database system. ACM Trans. Database Syst., 22(3):419--469, 1997.
[17]
Christopher Ré, Julie Letchner, Magdalena Balazinska, and Dan Suciu. "Event Queries on Correlated Probabilistic Streams". In Proc. SIGMOD, 2008.
[18]
Luca Trevisan. A note on approximate counting for k-dnf. In Proc. APPROX-RANDOM, 2004.
[19]
Leslie G. Valiant. "The Complexity of Enumeration and Reliability Problems". SIAM J. Comput., 8(3), 1979.
[20]
Jennifer Widom. "TRIO: A System for Managing Data, Uncertainty, and Lineage". In Charu Agarwal, editor, Managing and Mining Uncertain Data, 2008. To appear.
[21]
Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Query algebra operations for interval probabilities. In Proc. DEXA, 2003.
[22]
Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Databases for interval probabilities. Int. J. Intell. Syst., 19(9):789--815, 2004.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICDT '09: Proceedings of the 12th International Conference on Database Theory
March 2009
334 pages
ISBN:9781605584232
DOI:10.1145/1514894
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 March 2009

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

EDBT/ICDT '09
EDBT/ICDT '09: EDBT/ICDT '09 joint conference
March 23 - 25, 2009
St. Petersburg, Russia

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)32
  • Downloads (Last 6 weeks)5
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media