[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1109/ICDE.2009.75guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Published: 29 March 2009 Publication History

Abstract

When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N logN)—no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.

Cited By

View all
  • (2021)TQELProceedings of the VLDB Endowment10.14778/3476249.347630914:11(2642-2654)Online publication date: 27-Oct-2021
  • (2021)Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable SizeProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457299(1317-1330)Online publication date: 9-Jun-2021
  • (2021)Top-K Deep Video AnalyticsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452786(1037-1050)Online publication date: 9-Jun-2021
  • Show More Cited By

Index Terms

  1. Semantics of Ranking Queries for Probabilistic Data and Expected Ranks
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Guide Proceedings
      ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering
      March 2009
      1772 pages
      ISBN:9780769535456

      Publisher

      IEEE Computer Society

      United States

      Publication History

      Published: 29 March 2009

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 13 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2021)TQELProceedings of the VLDB Endowment10.14778/3476249.347630914:11(2642-2654)Online publication date: 27-Oct-2021
      • (2021)Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable SizeProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457299(1317-1330)Online publication date: 9-Jun-2021
      • (2021)Top-K Deep Video AnalyticsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452786(1037-1050)Online publication date: 9-Jun-2021
      • (2019)Representative Query Answers on Uncertain DataProceedings of the 16th International Symposium on Spatial and Temporal Databases10.1145/3340964.3340974(140-149)Online publication date: 19-Aug-2019
      • (2019)Region-Based Approximation of Probability Distributions (for Visibility Between Imprecise Points Among Obstacles)Algorithmica10.1007/s00453-019-00551-281:7(2682-2715)Online publication date: 1-Jul-2019
      • (2018)Exact processing of uncertain top-k queries in multi-criteria settingsProceedings of the VLDB Endowment10.14778/3204028.320403111:8(866-879)Online publication date: 1-Apr-2018
      • (2018)Range-max queries on uncertain dataJournal of Computer and System Sciences10.1016/j.jcss.2017.09.00694:C(118-134)Online publication date: 1-Jun-2018
      • (2017)Evaluating flexible criteria on uncertain dataFuzzy Sets and Systems10.1016/j.fss.2017.07.012328:C(122-140)Online publication date: 1-Dec-2017
      • (2017)Nearest-Neighbor Searching Under Uncertainty IDiscrete & Computational Geometry10.1007/s00454-017-9903-x58:3(705-745)Online publication date: 1-Oct-2017
      • (2016)Range-Max Queries on Uncertain DataProceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/2902251.2902281(465-476)Online publication date: 15-Jun-2016
      • Show More Cited By

      View Options

      View options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media