[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3452021.3458309acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Benchmarking Approximate Consistent Query Answering

Published: 20 June 2021 Publication History

Abstract

Consistent query answering (CQA) aims to deliver meaningful answers when queries are evaluated over inconsistent databases. Such answers must be certainly true in all repairs, which are consistent databases whose difference from the inconsistent one is somehow minimal. Although CQA provides a clean framework for querying inconsistent databases, it is arguably more informative to compute the percentage of repairs in which a candidate answer is true, instead of simply saying that is true in all repairs, or is false in at least one repair. It should not be surprising, though, that computing this percentage is computationally hard. On the other hand, for practically relevant settings such as conjunctive queries and primary keys, there are data-efficient randomized approximation schemes for approximating this percentage. Our goal is to perform a thorough experimental evaluation and comparison of those approximation schemes. Our analysis provides new insights on which technique is indicated depending on key characteristics of the input, and it further provides evidence that making approximate CQA as described above feasible in practice is not an unrealistic goal.

References

[1]
Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. 2008. Fast and Simple Relational Processing of Uncertain Data. In ICDE. 983--992.
[2]
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. 1999. Consistent Query Answers in Inconsistent Databases. In PODS. 68--79.
[3]
Patricia C. Arocena, Boris Glavic, Giansalvatore Mecca, René e J. Miller, Paolo Papotti, and Donatello Santoro. 2015. Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms. PVLDB, Vol. 9, 2 (2015), 36--47.
[4]
Michael Benedikt, George Konstantinidis, Giansalvatore Mecca, Boris Motik, Paolo Papotti, Donatello Santoro, and Efthymia Tsamoura. 2017. Benchmarking the Chase. In PODS. 37--52.
[5]
George Beskales, Ihab F. Ilyas, and Lukasz Golab. 2010. Sampling the Repairs of Functional Dependency Violations under Hard Constraints. PVLDB, Vol. 3, 1 (2010), 197--207.
[6]
Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. 2005. A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification. In SIGMOD. 143--154.
[7]
Marco Calautti, Marco Console, and Andreas Pieris. 2019. Counting Database Repairs under Primary Keys Revisited. In PODS. 104--118.
[8]
Marco Calautti, Leonid Libkin, and Andreas Pieris. 2018. An Operational Approach to Consistent Query Answering. In PODS. 239--251.
[9]
Paul Dagum, Richard M. Karp, Michael Luby, and Sheldon M. Ross. 2000. An Optimal Algorithm for Monte Carlo Estimation. SIAM J. Comput., Vol. 29, 5 (2000), 1484--1496.
[10]
Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and Nan Tang. 2013. NADEEF: A commodity data cleaning system. In SIGMOD. 541--552.
[11]
Nilesh N. Dalvi and Dan Suciu. 2007. Management of probabilistic data: foundations and challenges. In PODS. 1--12.
[12]
Akhil A. Dixit and Phokion G. Kolaitis. 2019. A SAT-Based System for Consistent Query Answering. In SAT. 117--135.
[13]
Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2008. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst., Vol. 33, 2 (2008), 6:1--6:48.
[14]
Ariel Fuxman, Elham Fazli, and René e J. Miller. 2005. ConQuer: Efficient Management of Inconsistent Databases. In SIGMOD. 155--166.
[15]
Richard M. Karp and Michael Luby. 1983. Monte-Carlo Algorithms for Enumeration and Reliability Problems. In FOCS . 56--64.
[16]
Richard M. Karp, Michael Luby, and Neal Madras. 1989. Monte-Carlo Approximation Algorithms for Enumeration Problems. J. Algorithms, Vol. 10, 3 (1989), 429--448.
[17]
Phokion G. Kolaitis, Enela Pema, and Wang-Chiew Tan. 2013. Efficient Querying of Inconsistent Databases with Binary Integer Programming. PVLDB, Vol. 6, 6 (2013), 397--408.
[18]
Paraschos Koutris and Dan Suciu. 2014. A Dichotomy on the Complexity of Consistent Query Answering for Atoms with Simple Keys. In ICDT . 165--176.
[19]
Paraschos Koutris and Jef Wijsen. 2015. The Data Complexity of Consistent Query Answering for Self-Join-Free Conjunctive Queries Under Primary Key Constraints. In PODS. 17--29.
[20]
Paraschos Koutris and Jef Wijsen. 2019. Consistent Query Answering for Primary Keys in Logspace. In 22nd International Conference on Database Theory, ICDT 2019, March 26--28, 2019, Lisbon, Portugal . 23:1--23:19.
[21]
Paraschos Koutris and Jef Wijsen. 2020. First-Order Rewritability in Consistent Query Answering with Respect to Multiple Keys. In PODS. 113--129.
[22]
Dany Maslowski and Jef Wijsen. 2013. A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci., Vol. 79, 6 (2013), 958--983.
[23]
Dany Maslowski and Jef Wijsen. 2014. Counting Database Repairs that Satisfy Conjunctive Queries with Self-Joins. In ICDT . 155--164.
[24]
Makoto Matsumoto and Takuji Nishimura. 1998. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Trans. Model. Comput. Simul., Vol. 8, 1 (1998), 3--30.
[25]
Kuldeep S. Meel, Aditya A. Shrotri, and Moshe Y. Vardi. 2019. Not all FPRASs are equal: Demystifying FPRASs for DNF-counting. Constraints, Vol. 24, 3--4 (2019), 211--233.
[26]
Balder ten Cate, Gaë lle Fontaine, and Phokion G. Kolaitis. 2015. On the Data Complexity of Consistent Query Answering. Theory Comput. Syst., Vol. 57, 4 (2015), 843--891.
[27]
Vijay V. Vazirani. 2003. Approximation Algorithms .

Cited By

View all
  • (2024)Computing Range Consistent Answers to Aggregation Queries via RewritingProceedings of the ACM on Management of Data10.1145/36958362:5(1-19)Online publication date: 7-Nov-2024
  • (2024)Combined Approximations for Uniform Operational Consistent Query AnsweringProceedings of the ACM on Management of Data10.1145/36516002:2(1-16)Online publication date: 14-May-2024
  • (2024)Consistent Query Answering for Primary Keys on Rooted Tree QueriesProceedings of the ACM on Management of Data10.1145/36511392:2(1-26)Online publication date: 14-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2021
440 pages
ISBN:9781450383813
DOI:10.1145/3452021
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. conjunctive queries
  2. consistent query answering
  3. efficient approximations
  4. inconsistent data
  5. primary keys

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)24
  • Downloads (Last 6 weeks)2
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Computing Range Consistent Answers to Aggregation Queries via RewritingProceedings of the ACM on Management of Data10.1145/36958362:5(1-19)Online publication date: 7-Nov-2024
  • (2024)Combined Approximations for Uniform Operational Consistent Query AnsweringProceedings of the ACM on Management of Data10.1145/36516002:2(1-16)Online publication date: 14-May-2024
  • (2024)Consistent Query Answering for Primary Keys on Rooted Tree QueriesProceedings of the ACM on Management of Data10.1145/36511392:2(1-26)Online publication date: 14-May-2024
  • (2023)LinCQA: Faster Consistent Query Answering with Linear Time GuaranteesProceedings of the ACM on Management of Data10.1145/35887181:1(1-25)Online publication date: 30-May-2023
  • (2023)Querying Data Exchange Settings Beyond Positive QueriesTheory and Practice of Logic Programming10.1017/S147106842300033924:2(250-278)Online publication date: 15-Aug-2023
  • (2022)DEGAIN: Generative-Adversarial-Network-Based Missing Data ImputationInformation10.3390/info1312057513:12(575)Online publication date: 12-Dec-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media