Abstract
In many applications including multicriteria decision making, top-k dominating query is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attribute subset, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low cost to return top-k dominating result on massive data efficiently. Through analysis, it is found that TDEP can be divided into two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in round-robin fashion and maintains the candidates until the stop condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores of tuples with the obtained candidates only. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained significantly. The extensive experimental results, conducted on synthetic and real-life data sets, show the significant performance advantage of TDEP over the existing algorithms.
Similar content being viewed by others
Notes
In this paper, we consider that the tuples are in general position [24].
In the following section, we will see that the practical probability is even much higher.
References
Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506
Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719
Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113
Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656
Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51
Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New York
Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28
Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428
Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628
Han X, Li J, Yang D (2011) Supporting early pruning in top-k query processing on massive data. Inf Process Lett 111(11):524–532
Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535
Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384
Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183
Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11
Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853
Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286
Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671
Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19
Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456
Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82
Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86
Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116
Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909
Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606
Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310
Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948
Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242
Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494
Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718
Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305
Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge
Acknowledgments
We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, 60831160525, 61272046, and 60903016, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, 2013BAH17F00.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Han, X., Li, J. & Gao, H. TDEP: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43, 689–718 (2015). https://doi.org/10.1007/s10115-013-0728-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-013-0728-5