[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1183614.1183719acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Efficient join processing over uncertain data

Published: 06 November 2006 Publication History

Abstract

In many applications data values are inherently uncertain. This includes moving-objects, sensors and biological databases. There has been recent interest in the development of database management systems that can handle uncertain data. Some proposals for such systems include attribute values that are uncertain. In particular, an attribute value can be modeled as a range of possible values, associated with a probability density function. Previous efforts for this type of data have only addressed simple queries such as range and nearest-neighbor queries. Queries that join multiple relations have not been addressed in earlier work despite the significance of joins in databases. In this paper we address join queries over uncertain data. We propose a semantics for the join operation, define probabilistic operators over uncertain data, and propose join algorithms that provide efficient execution of probabilistic joins. The paper focuses on an important class of joins termed probabilistic threshold joins that avoid some of the semantic complexities of dealing with uncertain data. For this class of joins we develop three sets of optimization techniques: item-level, page-level, and index-level pruning. These techniques facilitate pruning with little space and time overhead, and are easily adapted to most join algorithms. We verify the performance of these techniques experimentally.

References

[1]
R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. SIGMOD, 2003.
[2]
R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. Vitter. Efficient indexing methods for probabilistic threshold queries over uncertain data. In Proc. VLDB, 2004.
[3]
R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter. Efficient join processing over uncertain data. Technical Report CSD TR#05-004, Dept. of CS, Purdue University, 2005.
[4]
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proc. VLDB, 2004.
[5]
A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In Proc. VLDB, 2004.
[6]
D. Pfoser and C. Jensen. Capturing the uncertainty of moving-objects representations. In Proc. SSDBM, 1999.
[7]
J. Enderle, M. Hampel, and T. Seidl. Joining interval data in relational databases. In Proc. SIGMOD, 2004.
[8]
H. Gunadhi and A. Segev. Query processing algorithms for temporal intersection joins. In Proc. ICDE, 1991.
[9]
E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In ICDE, 2003.
[10]
The Lowell Database Research Self-Assessment Meeting. Lowell Massachusetts. May 2003.
[11]
T. Mitchell. Machine Learning. McGraw Hill, 1997.
[12]
A. Nierman and H. V. Jagadish. ProTDB: Probabilistic Data in XML. In VLDB, 2002.
[13]
M. Soo, R. Snodgrass, and C. Jensen. Efficient evaluation of the valid-time natural join. In Proc. ICDE, 1994.
[14]
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In Proc. CIDR, 2005.
[15]
O. Wolfson, P. Sistla, S. Chamberlain, and Y. Yesha. Updating and querying databases that track mobile units. Distributed and Parallel Databases, 7(3), 1999.
[16]
A. Yazici, A. Soysal, B. Buckles, and F. Petry. Uncertainty in a nested relational database model. Elsevier Data and Knowledge Engineering, 30, 1999.
[17]
D. Zhang, V. Tsotras, and B. Seeger. Efficient temporal join processing using indicies. In Proc. ICDE, 2002.

Cited By

View all
  • (2022)Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic ApproachIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309140033:1(231-248)Online publication date: 1-Jan-2022
  • (2021)Efficiently Answering Durability Prediction QueriesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457305(591-604)Online publication date: 9-Jun-2021
  • (2021)Analysis of Associative Protected Spatio-Temporal Data2021 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon52004.2021.9537478(531-536)Online publication date: 5-Sep-2021
  • Show More Cited By

Index Terms

  1. Efficient join processing over uncertain data

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
    November 2006
    916 pages
    ISBN:1595934332
    DOI:10.1145/1183614
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 November 2006

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. imprecise data
    2. joins
    3. uncertainty management

    Qualifiers

    • Article

    Conference

    CIKM06
    CIKM06: Conference on Information and Knowledge Management
    November 6 - 11, 2006
    Virginia, Arlington, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)11
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 16 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic ApproachIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309140033:1(231-248)Online publication date: 1-Jan-2022
    • (2021)Efficiently Answering Durability Prediction QueriesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457305(591-604)Online publication date: 9-Jun-2021
    • (2021)Analysis of Associative Protected Spatio-Temporal Data2021 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon52004.2021.9537478(531-536)Online publication date: 5-Sep-2021
    • (2021)In-Memory Interval JoinsThe VLDB Journal10.1007/s00778-020-00639-0Online publication date: 8-Apr-2021
    • (2020)A survey of uncertain data managementFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-017-7063-z14:1(162-190)Online publication date: 1-Feb-2020
    • (2019)Scalable temporal clique enumerationProceedings of the 16th International Symposium on Spatial and Temporal Databases10.1145/3340964.3340987(120-129)Online publication date: 19-Aug-2019
    • (2019)Incorporating Probabilistic Optimizations for Resource Provisioning of Data Processing WorkflowsProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337847(1-10)Online publication date: 5-Aug-2019
    • (2018)Indexing Uncertain DataEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_80740(1900-1907)Online publication date: 7-Dec-2018
    • (2018)Probabilistic Spatial QueriesEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_276(2847-2852)Online publication date: 7-Dec-2018
    • (2018)Data Uncertainty Management in Sensor NetworksEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_115(863-869)Online publication date: 7-Dec-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media