[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Hierarchisches gruppenbasiertes Sampling

  • Original Article
  • Published:
Informatik - Forschung und Entwicklung

Abstract

In Zeiten wachsender Datenbankgrößen ist es unumgänglich, Anfragen näherungsweise auszuwerten um schnelle Antworten zu erhalten. Dieser Artikel stellt verschiedene Methoden vor, dieses Ziel zu erreichen, und wendet sich anschließend dem Sampling zu, welches mit Hilfe einer Stichprobe schnell zu adäquaten Ergebnissen führt. Enthalten Datenbankanfragen Verbund- oder Gruppierungsoperationen, so sinkt die Genauigkeit vieler Sampling-Verfahren sehr stark; insbesondere werden vor allem kleine Gruppen nicht erkannt. Dieser Artikel befasst sich mit hierarchischen gruppenbasiertem Sampling, welches Sampling, Gruppierung und Verbundoperationen kombiniert.

Abstract

In times of increasing database sizes it is crucial to process queries approximately in order to obtain answers quickly. This article introduces several methods for achieving this goal and afterwards focuses on sampling, yielding appropriate results by using only a subset of the actual data. If database queries contain join or group-by operations, the accuracy of many sampling methods drops significantly; especially small groups are not recognized. This article is concerned with hierarchical group-based sampling, which combines sampling, grouping and joins.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Similar content being viewed by others

Literatur

  1. Acharya S, Gibbons PB, Poosala V (2000) Congressional Samples for Approximate Answering of Group-By Queries. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp 487–498

  2. Acharya S, Gibbons PB, Poosala V, Ramaswamy S (1999) Join synopses for approximate query answering. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp 275–286

  3. Babcock B, Chaudhuri S, Das G (2003) Dynamic sample selection for approximate query processing. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp 539–550

  4. Barbará D, DuMouchel W, Faloutsos C, Haas PJ, Hellerstein JM, Ioannidis YE, Jagadish HV, Johnson T, Ng RT, Poosala V, Ross KA, Sevcik KC (1997) The New Jersey Data Reduction Report. IEEE Data Eng Bull 20(4):3–45

  5. Chakrabarti K, Garofalakis MN, Rastogi R, Shim K (2000) Approximate Query Processing Using Wavelets. In: Proceedings of 26th International Conference on Very Large Data Bases, VLDB 2000, pp 111–122

  6. Chaudhuri S, Das G, Datar M, Motwani R, Narasayya VR (2001) Overcoming Limitations of Sampling for Aggregation Queries. In: 17th International Conference on Data Engineering (ICDE’ 01), pp 534–544, April

  7. Chaudhuri S, Das G, Srivastava U (2004) Effective use of block-level sampling in statistics estimation. In: Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM Press, pp 287–298

  8. Chaudhuri S, Motwani R, Narasayya V (1999) On Random Sampling over Joins. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp 263–274

  9. Ganti V, Lee M-L, Ramakrishnan R (2000) ICICLES: Self-Tuning Samples for Approximate Query Answering. In: The VLDB Journal, pp 176–187

  10. Gibbons PB, Matias Y (1998) New sampling-based summary statistics for improving approximate query answers. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. ACM Press, pp 331–342

  11. Gryz J, Guo J, Liu L, Zuzarte C (2004) Query sampling in DB2 Universal Database. In: Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM Press, pp 839–843

  12. Haas PJ, Hellerstein JM (1999) Ripple Joins for Online Aggregation. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp 287–298

  13. Haas PJ, König C (2004) A bi-level Bernoulli scheme for database sampling. In: Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM Press, pp 275–286

  14. Hellerstein JM, Haas PJ, Wang HJ (1997) Online Aggregation. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pp 171–182

  15. Ioannidis YE, Poosala V (1995) Histogram-Based Solutions to Diverse Database Estimation Problems. Data Engineering Bulletin 18(3):10–18

  16. Jermaine C (2003) Robust Estimation With Sampling and Approximate Pre-Aggregation. In: Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, September 9–12, 2003, Berlin, Germany, Los Altos, CA 94022, USA, 2003. Morgan Kaufmann Publishers, pp 886–897

  17. Jermaine C, Pol A, Arumugam S (2004) Online maintenance of very large random samples. In: Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM Press, pp 299–310

  18. Olken F (1993) Random Sampling from Databases. Thesis LBL-32883, Information and Computing Sciences Division, Lawrence Berkeley National Laboratory, Mailstop 50B-3238, 1 Cyclotron Road, Berkeley, California 94720, USA

  19. Poosala V, Ganti V, Ioannidis YE (1999) Approximate Query Answering using Histograms. IEEE Data Eng Bull 22(4):5–14

    Google Scholar 

  20. Transaction Processing Performace Council (1998) TPC-D Benchmark Version 2.1, February. http://www.tpc.org

  21. Vitter JS (1985) Random Sampling with a Reservoir. ACM Transactions on Mathematical Software 11(1):37–57, March

    Google Scholar 

  22. Vitter JS, Wang M (1999) Approximate computation of multidimensional aggregates of sparse data using wavelets. In: Proceedings of the 1999 ACM SIGMOD international conference on Management of data. ACM Press, pp 193–204

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wolfgang Lehner.

Additional information

CR Subject Classification

H.2.4,H.4.2,H.3.3

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gemulla, R., Berthold, H. & Lehner, W. Hierarchisches gruppenbasiertes Sampling. Informatik Forsch. Entw. 20, 45–56 (2005). https://doi.org/10.1007/s00450-004-0175-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00450-004-0175-3

Keywords

Navigation