On Futuristic Query Processing in Data Streams

  • Conference paper
Advances in Database Technology - EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)


Abstract

Recent advances in hardware technology have resulted in the ability to collect and process large amounts of data. In many cases, the collection of the data is a continuous process over time. Such continuous collections of data are referred to as data streams. One of the interesting problems in data stream mining is that of predictive query processing. This is useful for a variety of data mining applications which require us to estimate the future behavior of the data stream. In this paper, we will discuss the problem from the point of view of predictive summarization. In predictive summarization, we would like to store statistical characteristics of the data stream which are useful for estimation of queries representing the behavior of the stream in the future. The example utilized for this paper is the case of selectivity estimation of range queries. For this purpose, we propose a technique which utilizes a local predictive approach in conjunction with a careful choice of storing and summarizing particular statistical characteristics of the data. We use this summarization technique to estimate the future selectivity of range queries, though the results can be utilized to estimate a variety of futuristic queries. We test the results on a variety of data sets and illustrate the effectiveness of the approach.
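To make the idea of predictive summarization concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a one-dimensional stream, keeps only per-tick counts, sums, and sums of squares for a recent window, extrapolates the mean with a locally fitted linear trend, and assumes the future values are roughly Gaussian. The class name `PredictiveSummary`, the `window` parameter, and the Gaussian model are all illustrative assumptions.

```python
import math
from collections import deque


class PredictiveSummary:
    """Hypothetical sketch: additive statistics per time tick over a recent window."""

    def __init__(self, window=5):
        # Each entry is (tick, count, sum, sum of squares); old ticks fall off.
        self.ticks = deque(maxlen=window)

    def add_tick(self, t, values):
        """Summarize the values that arrived during tick t; raw values are discarded."""
        if not values:
            return  # nothing to summarize for an empty tick
        n = len(values)
        s1 = sum(values)
        s2 = sum(v * v for v in values)
        self.ticks.append((t, n, s1, s2))

    def predict_mean(self, t_future):
        """Extrapolate the per-tick mean with a least-squares line over the window."""
        if not self.ticks:
            raise ValueError("no data summarized yet")
        ts = [t for t, _, _, _ in self.ticks]
        means = [s1 / n for _, n, s1, _ in self.ticks]
        t_bar = sum(ts) / len(ts)
        m_bar = sum(means) / len(means)
        denom = sum((t - t_bar) ** 2 for t in ts)
        if denom == 0:
            return m_bar  # a single tick gives no trend information
        slope = sum((t - t_bar) * (m - m_bar) for t, m in zip(ts, means)) / denom
        return m_bar + slope * (t_future - t_bar)

    def pooled_std(self):
        """Within-tick standard deviation pooled over the window (assumed stable)."""
        n_total = sum(n for _, n, _, _ in self.ticks)
        ss = sum(s2 - s1 * s1 / n for _, n, s1, s2 in self.ticks)
        return math.sqrt(max(ss / n_total, 0.0))

    def selectivity(self, lo, hi, t_future):
        """Estimated fraction of points in [lo, hi] at tick t_future (Gaussian assumption)."""
        mu = self.predict_mean(t_future)
        sigma = self.pooled_std()
        if sigma == 0:
            return 1.0 if lo <= mu <= hi else 0.0

        def cdf(x):
            return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

        return cdf(hi) - cdf(lo)


# Usage: a stream whose mean drifts upward by one unit per tick.
summary = PredictiveSummary(window=5)
for t in range(5):
    summary.add_tick(t, [t - 1, t, t + 1])

future_mean = summary.predict_mean(6)
future_sel = summary.selectivity(5, 7, 6)
```

The design point this illustrates is that the stored statistics are additive and small, so they can be maintained in one pass, while the prediction uses only recent ticks so that it follows local trends in the stream.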




© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aggarwal, C.C. (2006). On Futuristic Query Processing in Data Streams. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_6


  • DOI: https://doi.org/10.1007/11687238_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
