[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

TDEP: efficiently processing top-k dominating query on massive data

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

In many applications including multicriteria decision making, top-k dominating query is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attribute subset, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low cost to return top-k dominating result on massive data efficiently. Through analysis, it is found that TDEP can be divided into two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in round-robin fashion and maintains the candidates until the stop condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores of tuples with the obtained candidates only. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained significantly. The extensive experimental results, conducted on synthetic and real-life data sets, show the significant performance advantage of TDEP over the existing algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. In this paper, we consider that the tuples are in general position [24].

  2. In the following section, we will see that the practical probability is even much higher.

  3. http://basketballreference.com/.

References

  1. Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506

  2. Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426

    Article  MATH  Google Scholar 

  3. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430

  4. Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719

  5. Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113

  6. Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656

    Article  MATH  MathSciNet  Google Scholar 

  7. Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51

    Article  Google Scholar 

  8. Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New York

    MATH  Google Scholar 

  9. Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28

    Article  Google Scholar 

  10. Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428

  11. Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628

  12. Han X, Li J, Yang D (2011) Supporting early pruning in top-k query processing on massive data. Inf Process Lett 111(11):524–532

    Article  MATH  MathSciNet  Google Scholar 

  13. Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535

    Article  Google Scholar 

  14. Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384

    Article  Google Scholar 

  15. Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183

    Article  Google Scholar 

  16. Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11

    Article  Google Scholar 

  17. Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853

    Article  Google Scholar 

  18. Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286

  19. Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671

  20. Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19

    Article  Google Scholar 

  21. Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456

    Article  Google Scholar 

  22. Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82

    Article  Google Scholar 

  23. Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86

    Article  Google Scholar 

  24. Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116

  25. Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909

  26. Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606

    Article  Google Scholar 

  27. Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310

  28. Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948

    Article  Google Scholar 

  29. Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242

    Article  Google Scholar 

  30. Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494

  31. Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718

    Article  Google Scholar 

  32. Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305

    Article  Google Scholar 

  33. Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge

    Google Scholar 

Download references

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, 60831160525, 61272046, and 60903016, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, 2013BAH17F00.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xixian Han.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Han, X., Li, J. & Gao, H. TDEP: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43, 689–718 (2015). https://doi.org/10.1007/s10115-013-0728-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-013-0728-5

Keywords

Navigation