[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1807167.1807202acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Consistent query answers in inconsistent probabilistic databases

Published: 06 June 2010 Publication History

Abstract

Efficient and effective manipulation of probabilistic data has become increasingly important recently due to many real applications that involve the data uncertainty. This is especially crucial when probabilistic data collected from different sources disagree with each other and incur inconsistencies. In order to accommodate such inconsistencies and enable consistent query answering (CQA), in this paper, we propose the all-possible-repair semantics in the context of inconsistent probabilistic databases, which formalize the repairs on the database as repair worlds via a graph representation. In turn, the CQA problem can be converted into one in the so-called repaired possible worlds (w.r.t. both repair worlds and possible worlds). We investigate a series of consistent queries in inconsistent probabilistic databases, including consistent range queries, join, and top-k queries, which, however, need to deal with an exponential number of the repaired possible worlds at high cost. To tackle the efficiency problem of CQA, in this paper, we propose efficient approaches for retrieving consistent query answers, including effective pruning methods to filter out false positives. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our approaches.

References

[1]
D. J. Abadi, S. Madden, and W. Lindner. REED: Robust, efficient filtering and event detection in sensor networks. In VLDB, 2005.
[2]
P. Andritsos, A. Fuxman, and R. Miller. Clean answers over dirty databases: A probabilistic approach. In ICDE, 2006.
[3]
L. Antova, C. Koch, and D. Olteanu. MayBMS: Managing incomplete information with probabilistic world-set decompositions. In ICDE, 2007.
[4]
M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In PODS, 1999.
[5]
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD, 1990.
[6]
O. Benjelloun, A. Das Sarma, A. Y. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, 2006.
[7]
G. Beskales, M. A. Soliman, I. F. Ilyas, and S. Ben-David. Modeling and querying possible repairs in duplicate detection. PVLDB, 2(1), 2009.
[8]
P. Bohannon, W. Fan, M. Flaster, and R. Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD, 2005.
[9]
J. Boulos, N. N. Dalvi, B. Mandhani, S. Mathur, C. Ré, and D. Suciu. Mystiq: a system for finding more answers by using probabilities. In SIGMOD, 2005.
[10]
R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In SIGMOD, 2003.
[11]
R. Cheng, S. Singh, and S. Prabhakar. U-DBMS: A database system for managing constantly-evolving data. In VLDB, 2005.
[12]
J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1/2), 2005.
[13]
G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. In VLDB, 2007.
[14]
G. Cormode, F. Li, and K. Yi. Semantics of ranking queries for probabilistic data and expected ranks. In ICDE, 2009.
[15]
N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4), 2007.
[16]
X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: The role of source dependence. PVLDB, 2(1), 2009.
[17]
X. L. Dong, L. Berti-Equille, and D. Srivastava. Truth discovery and copying detection in a dynamic world. PVLDB, 2(1), 2009.
[18]
W. Fan. Dependencies revisited for improving data quality. In PODS, 2008.
[19]
I. Fellegi and D. Holt. A systematic approach to automatic edit and imputation. J. American Statistical Association, 71(353), 1976.
[20]
A. Fuxman, E. Fazli, and R. Miller. ConQuer: efficient management of inconsistent databases. In SIGMOD, 2005.
[21]
S. Greco and C. Molinaro. Approximate probabilistic query answering over inconsistent databases. In ER, 2008.
[22]
M. Hernandez, M. A. Hernandez, S. Stolfo, and U. Fayyad. Real-world data is dirty: Data cleansing and the merge/purge problem. In Data Min. Knowl. Discov., 1998.
[23]
M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, 2008.
[24]
Y. W. Huang, N. Jing, and E. A. Rundensteiner. Spatial joins using R-trees: breadth-first traversal with global optimizations. In VLDB, 1997.
[25]
R. Jampani, F. Xu, M. Wu, L. L. Perez, C. Jermaine, and P. J. Haas. Mcdb: a monte carlo approach to managing uncertain data. In SIGMOD, 2008.
[26]
R. Kumar, K. Punera, T. Suel, and S. Vassilvitskii. Top-k aggregation using intersections of ranked inputs. In WSDM, 2009.
[27]
J. Li, B. Saha, and A. Deshpande. A unified approach to ranking in probabilistic databases. PVLDB, 2(1), 2009.
[28]
L. Mo, Y. He, Y. Liu, J. Zhao, S. Tang, X.-Y. Li, and G. Dai. Canopy closure estimates with greenorbs: Sustainable sensing in the forest. In ACM Sensys, 2009. http://greenorbs.org.
[29]
T. Neumann, M. Bender, S. Michel, R. Schenkel, P. Triantafillou, and G. Weikum. Distributed top-k aggregation queries at large. Distributed and Parallel Databases, 26(1), 2009.
[30]
R. A. Pottinger and P. A. Bernstein. Merging models based on given correspondences. In VLDB, 2003.
[31]
C. Re, N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.
[32]
F. Sadri. Reliability of answers to queries in relational databases. TKDE, 3(2), 1991.
[33]
A. D. Sarma, M. Theobald, and J. Widom. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In ICDE, 2008.
[34]
M. A. Soliman, I. F. Ilyas, and K. C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.
[35]
Y. Tao, V. Hristidis, D. Papadias, and Y. Papakonstantinou. Branch-and-bound processing of ranked queries. Inf. Syst., 32(3), 2007.
[36]
Y. Theodoridis and T. Sellis. A model for the prediction of R-tree performance. In PODS, 1996.
[37]
D. Z. Wang, E. Michelakis, M. Garofalakis, and J. Hellerstein. Bayestore: Managing large, uncertain data repositories with probabilistic graphical models. In VLDB, 2008.
[38]
J. Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3), 2005.
[39]
W. E. Winkler. Methods for evaluating and creating data quality. Inf. Syst., 29(7), 2004.
[40]
K. Yi, F. Li, G. Kollios, and D. Srivastava. Efficient processing of top-k queries in uncertain databases. In ICDE, 2008.

Cited By

View all
  • (2023)Query-Guided Resolution in Uncertain DatabasesProceedings of the ACM on Management of Data10.1145/35893251:2(1-27)Online publication date: 20-Jun-2023
  • (2023)An epistemic approach to model uncertainty in data-graphsInternational Journal of Approximate Reasoning10.1016/j.ijar.2023.108948160(108948)Online publication date: Sep-2023
  • (2023)A further step for efficient corrections of inconsistent probabilistic data setsInternational Journal of Approximate Reasoning10.1016/j.ijar.2023.108942159:COnline publication date: 26-Jul-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. all-possible-repairs semantics
  2. consistent query answering
  3. inconsistent probabilistic database
  4. repair world
  5. repaired possible world

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10
Sponsor:
SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 10, 2010
Indiana, Indianapolis, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Query-Guided Resolution in Uncertain DatabasesProceedings of the ACM on Management of Data10.1145/35893251:2(1-27)Online publication date: 20-Jun-2023
  • (2023)An epistemic approach to model uncertainty in data-graphsInternational Journal of Approximate Reasoning10.1016/j.ijar.2023.108948160(108948)Online publication date: Sep-2023
  • (2023)A further step for efficient corrections of inconsistent probabilistic data setsInternational Journal of Approximate Reasoning10.1016/j.ijar.2023.108942159:COnline publication date: 26-Jul-2023
  • (2023)Distributed probabilistic top-k dominating queries over uncertain databasesKnowledge and Information Systems10.1007/s10115-023-01917-365:11(4939-4965)Online publication date: 1-Jul-2023
  • (2022)ActivePDBProceedings of the VLDB Endowment10.14778/3554821.355486315:12(3638-3641)Online publication date: 1-Aug-2022
  • (2022)Data Dependencies Extended for Variety and Veracity: A Family TreeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.304644334:10(4717-4736)Online publication date: 1-Oct-2022
  • (2022)Efficient Processing of Group Planning Queries Over Spatial-Social NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.300415334:5(2135-2147)Online publication date: 1-May-2022
  • (2022)Database Repairing and Consistent Query Answering10.1007/978-3-031-01883-1Online publication date: 2-Mar-2022
  • (2021)Modeling and Querying Similar Trajectory in Inconsistent Spatial DataDatabase Systems for Advanced Applications. DASFAA 2021 International Workshops10.1007/978-3-030-73216-5_5(57-67)Online publication date: 11-Apr-2021
  • (2020)Topic-based community search over spatial-social networksProceedings of the VLDB Endowment10.14778/3407790.340781213:12(2104-2117)Online publication date: 14-Sep-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media