Abstract
Data warehouses are increasingly fed with data produced by large numbers of distributed sensors in many application domains: medicine, the military, road traffic, weather forecasting, and utilities such as electric power suppliers. Such data is widely distributed and produced continuously as data streams. The rate at which data is collected at each sensor node affects communication resources, bandwidth, and/or the computational load at the central server. In this paper, we propose a generic tool for summarizing distributed data streams in which the amount of data collected from each sensor adapts to the characteristics of the data. Experiments on real electric power consumption data are reported and show the efficiency of the proposed approach.
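The abstract does not spell out how the collected amount adapts to each sensor, so the following is only a rough illustrative sketch and not the authors' method. It assumes a fixed total budget of readings per time window, measures each sensor's variability as the total absolute change between consecutive readings, and splits the budget in proportion to that score. The names `variability`, `allocate_budget`, `downsample`, and the `minimum` parameter are hypothetical.

```python
from typing import Dict, List


def variability(series: List[float]) -> float:
    """Total absolute change between consecutive readings (assumed score)."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


def allocate_budget(
    streams: Dict[str, List[float]], budget: int, minimum: int = 2
) -> Dict[str, int]:
    """Split `budget` readings across sensors in proportion to variability.

    Every sensor gets at least `minimum` readings; the remainder is shared
    according to each sensor's variability score (rounded down).
    """
    if budget < minimum * len(streams):
        raise ValueError("budget too small for the per-sensor minimum")
    scores = {name: variability(s) for name, s in streams.items()}
    total = sum(scores.values())
    spare = budget - minimum * len(streams)
    alloc = {}
    for name, series in streams.items():
        # Fall back to an even split when no sensor varies at all.
        share = scores[name] / total if total > 0 else 1 / len(streams)
        alloc[name] = min(len(series), minimum + int(spare * share))
    return alloc


def downsample(series: List[float], k: int) -> List[float]:
    """Keep k regularly spaced readings, including the first and the last."""
    if k >= len(series):
        return list(series)
    if k == 1:
        return [series[0]]
    step = (len(series) - 1) / (k - 1)
    return [series[round(i * step)] for i in range(k)]


if __name__ == "__main__":
    window = {
        "flat": [5.0] * 8,                                  # constant load curve
        "busy": [0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0],  # fluctuating load curve
    }
    plan = allocate_budget(window, budget=8)
    for name, k in plan.items():
        print(name, k, downsample(window[name], k))
```

In this toy window the constant sensor keeps only the minimum number of readings while the fluctuating one receives the rest of the budget, which is the general behaviour the abstract describes: sensors whose data varies more are summarized less aggressively.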
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chiky, R., Hébrail, G. (2008). Summarizing Distributed Data Streams for Storage in Data Warehouses. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_7
DOI: https://doi.org/10.1007/978-3-540-85836-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)