Abstract
Recent advances in hardware technology have made it possible to collect and process large amounts of data. In many cases, data are collected continuously over time; such continuous collections of data are referred to as data streams. One interesting problem in data stream mining is predictive query processing, which is useful for a variety of data mining applications that need to estimate the future behavior of the data stream. In this paper, we discuss the problem from the point of view of predictive summarization. In predictive summarization, we store statistical characteristics of the data stream that are useful for estimating queries about the future behavior of the stream. The example used in this paper is selectivity estimation of range queries. For this purpose, we propose a technique that combines a local predictive approach with a careful choice of which statistical characteristics of the data to store and summarize. We use this summarization technique to estimate the future selectivity of range queries, though the results can be used to estimate a variety of futuristic queries. We test the technique on a variety of data sets and illustrate its effectiveness.
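The abstract does not specify which statistics are stored or which predictive model is used, so the following is only a minimal sketch of the general idea of predictive summarization, under our own assumptions: a one-dimensional numeric stream, fixed equi-width buckets, per-time-window bucket counts, and a per-bucket least-squares linear trend standing in for the "local predictive approach". The class `PredictiveHistogram` and its methods are hypothetical illustrations, not the paper's technique.

```python
from collections import deque


class PredictiveHistogram:
    """Hypothetical sketch of a predictive summary for range-query selectivity.

    Assumptions (not from the paper): equi-width buckets over [lo, hi),
    counts kept per time window, and a least-squares linear trend per bucket
    used to extrapolate counts into future windows.
    """

    def __init__(self, lo, hi, num_buckets, history=8):
        self.lo, self.hi, self.k = lo, hi, num_buckets
        self.width = (hi - lo) / num_buckets
        self.windows = deque(maxlen=history)  # bucket counts of past windows
        self.current = [0] * num_buckets      # counts for the open window

    def add(self, x):
        """Count one stream value in the currently open window."""
        if self.lo <= x < self.hi:
            idx = int((x - self.lo) / self.width)
            self.current[min(idx, self.k - 1)] += 1  # guard float rounding

    def close_window(self):
        """Freeze the open window's counts and start a new window."""
        self.windows.append(self.current)
        self.current = [0] * self.k

    def _predict_bucket(self, idx, horizon):
        """Extrapolate one bucket's count `horizon` windows past the last one."""
        ys = [w[idx] for w in self.windows]
        n = len(ys)
        if n == 0:
            return 0.0
        if n == 1:
            return float(ys[0])
        mx = (n - 1) / 2
        my = sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in range(n))
        sxy = sum((x - mx) * (y - my) for x, y in zip(range(n), ys))
        slope = sxy / sxx
        pred = my + slope * (n - 1 + horizon - mx)
        return max(pred, 0.0)  # counts cannot be negative

    def predict_selectivity(self, low, high, horizon=1):
        """Estimated fraction of points in [low, high) in a future window,
        assuming values are spread uniformly within each bucket."""
        preds = [self._predict_bucket(i, horizon) for i in range(self.k)]
        total = sum(preds)
        if total == 0:
            return 0.0
        covered = 0.0
        for i, p in enumerate(preds):
            left = self.lo + i * self.width
            right = left + self.width
            overlap = max(0.0, min(high, right) - max(low, left))
            covered += p * overlap / self.width
        return covered / total


if __name__ == "__main__":
    h = PredictiveHistogram(0.0, 10.0, 2)
    # Mass drifts from the left bucket to the right bucket over three windows.
    for left_count, right_count in [(10, 0), (8, 2), (6, 4)]:
        for _ in range(left_count):
            h.add(1.0)
        for _ in range(right_count):
            h.add(7.0)
        h.close_window()
    print(round(h.predict_selectivity(5.0, 10.0), 3))  # prints 0.6
```

The point of the example is that the estimate reflects the trend rather than the current snapshot: the most recent window has 40% of its points in [5, 10), but the extrapolated next window has 60%. Any real system would need a more robust local model and a multidimensional summary; the linear trend here is only a placeholder.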
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aggarwal, C.C. (2006). On Futuristic Query Processing in Data Streams. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_6
DOI: https://doi.org/10.1007/11687238_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science, Computer Science (R0)