Abstract
Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.
Similar content being viewed by others
References
Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying sets of possible worlds. In: SIGMOD (1987)
Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., Widom, J.: Trio: a system for data, uncertainty, and lineage. In: VLDB (2006)
Antova, L., Koch, C., Olteanu, D.: \({10^{10^6}}\) worlds and beyond: Efficient representation and processing of incomplete information. In: ICDE (2007)
Barbara D., Garcia-Molina H., Porter D.: The management of probabilistic data. IEEE TKDE 4(5), 487–502 (1992)
Bekales, G., Soliman, M.A., Ilyas, I.F.: Efficient search for the top-k probable nearest neighbors in uncertain databases In: VLDB (2008)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)
Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using r-trees. In: SIGMOD (1993)
Chan, C., Jagadish, H., Tan, K.-L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: SIGMOD (2006)
Cheng, R., Chen, J., Mokbel, M.F., Chow, C.-Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)
Cheng, R., Xia, Y., Prabhakar, S., Shah, R., Vitter, J.S.: Efficient indexing methods for probabilistic threshold queries over uncertain data. In: VLDB (2004)
Cormen T.H., Leiserson C.E., Rivest R.L., Stein C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)
Dalvi, N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)
Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)
Dubhashi, D., Panconesi, A.: Concentration of measure for the analysis of randomised algorithms, p. 12. http://citeseer.ist.psu.edu/old/dubhashi98concentration.html (1998)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Fagin R., Lotem A., Naor M.: Optimal aggregation algorithms for middleware. JCSS 66, 614–656 (2003)
Gilks W., Richardson S., Spiegelhalter D.: Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1996)
Goldreich, O.: Randomized Methods in Computation, Lecture 2. http://www.wisdom.weizmann.ac.il/~oded/rnd.html (2001)
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: A probabilistic threshold approach. In: SIGMOD (2008)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD (2008)
Imielinski T., Lipski W.: Incomplete information in relational databases. JACM 31(4), 761–791 (1984)
Kalos M.H., Whitlock P.A.: Monte Carlo Methods. Wiley Interscience, London (1986)
Kriegel, H.P., Kunath, P., Pfeifle, M., Renz, M.: Probabilistic similarity join on uncertain data. In: DASFAA (2006)
Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)
Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: KDD (2005)
Lakshmanan L.V.S., Leone N., Ross R., Subrahmanian V.S.: Probview: a flexible probabilistic database system. ACM TODS 22(3), 419–469 (1997)
Ngai, W.K., Kao, B., Cheng, C.K.C.R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: ICDM (2006)
Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient olap operations in spatial data warehouses. In: SSTD (2001)
Papadias, D., Mamoulis, N., Theodoridis, Y.: Processing and optimization of multiway spatial joins using R-trees. In: PODS (1999)
Papadias D., Tao Y., Greg F., Seeger B.: Progressive skyline computation in database systems. ACM TODS 30(1), 41–82 (2003)
Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skyline on uncertain data. In: VLDB (2007)
Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)
Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE (2005)
Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)
Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)
Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In: ICDE (2007)
Tan, K.-L., Eng, P.-K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB (2001)
Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB (2005)
Yi K., Li F., Kollios G., Srivastava D.: Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng. 20(12), 1669–1682 (2008)
Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: VLDB (2007)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, W., Lin, X., Zhang, Y. et al. Threshold-based probabilistic top-k dominating queries. The VLDB Journal 19, 283–305 (2010). https://doi.org/10.1007/s00778-009-0162-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-009-0162-1