Abstract
Existing methods for clustering uncertain data streams over sliding windows do not handle categorical attributes, even though uncertain mixed data are ubiquitous. This paper investigates the problem of clustering heterogeneous data streams pervaded by uncertainty over sliding windows, which we call SWHU-Clustering. A Heterogeneous Uncertain Temporal Cluster Feature (HUTCF) is introduced to monitor the distribution statistics of mixed data points. Based on this structure, the Exponential Histogram of Heterogeneous Uncertain Cluster Feature (EHHUCF) is presented as a collection of HUTCFs. This structure can help to handle in-cluster evolution and to detect temporal changes in the cluster distribution. Our approach has several advantages over existing methods: 1) its execution efficiency is higher because its design avoids the effects of old data on the final results; 2) k-NN is incorporated into the clustering process to reduce the complexity of the algorithm; 3) memory consumption can be managed efficiently by limiting the number of HUTCFs in each EHHUCF. Simulations on real databases show the feasibility of SWHU-Clustering, and a comparison with the UMicro algorithm shows its effectiveness.
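The abstract does not give the definitions of HUTCF or EHHUCF, so the following is only a minimal sketch of the general idea it describes: summaries of mixed (numeric plus categorical) uncertain points kept in an exponential histogram over a sliding window, with a cap on the number of summaries. Every name and field here (`ClusterFeature`, `ExponentialHistogram`, `max_per_size`, the use of expected values and probability-weighted category counts) is an assumption made for illustration, not the authors' structure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    """Hypothetical additive summary of uncertain mixed-type points."""
    n: int             # number of points summarized
    num_sum: list      # per-dimension sum of expected numeric values
    num_sq_sum: list   # per-dimension sum of squared expected values
    cat_counts: list   # per-attribute Counter of probability-weighted counts
    t_latest: int      # timestamp of the newest summarized point

    @classmethod
    def from_point(cls, numeric, categorical, t):
        # numeric: expected values; categorical: one {value: probability} dict
        # per attribute (an assumed encoding of the uncertainty).
        return cls(1, list(numeric), [x * x for x in numeric],
                   [Counter(d) for d in categorical], t)

    def merge(self, other):
        # All fields are additive, so two summaries combine in one pass.
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.num_sum, other.num_sum)],
            [a + b for a, b in zip(self.num_sq_sum, other.num_sq_sum)],
            [a + b for a, b in zip(self.cat_counts, other.cat_counts)],
            max(self.t_latest, other.t_latest),
        )


class ExponentialHistogram:
    """Buckets of ClusterFeature, oldest first, with sizes 1, 2, 4, ..."""

    def __init__(self, window, max_per_size=2):
        self.window = window              # sliding-window length in time units
        self.max_per_size = max_per_size  # cap on buckets of equal size
        self.buckets = []

    def insert(self, numeric, categorical, t):
        self.buckets.append(ClusterFeature.from_point(numeric, categorical, t))
        # Drop buckets whose newest point has left the window. A merged bucket
        # may still hold a few older points; that is the usual approximation.
        self.buckets = [b for b in self.buckets
                        if b.t_latest > t - self.window]
        self._compact()

    def _compact(self):
        # While too many buckets share a size, merge the two oldest of that
        # size and check the next size up (the merge can cascade).
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(idx) <= self.max_per_size:
                break
            i, j = idx[0], idx[1]
            self.buckets[i] = self.buckets[i].merge(self.buckets[j])
            del self.buckets[j]
            size *= 2


eh = ExponentialHistogram(window=100, max_per_size=2)
for t in range(1, 11):
    eh.insert([float(t)], [{"a": 0.7, "b": 0.3}], t)
sizes = [b.n for b in eh.buckets]  # [4, 2, 2, 1, 1]
```

With ten insertions and at most two buckets per size, the histogram holds five buckets rather than ten summaries, which is the kind of memory bound the third advantage in the abstract refers to. Old data drop out by discarding whole buckets, so no per-point bookkeeping is needed.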
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hentech, H., Gouider, M.S., Farhat, A. (2013). Clustering Heterogeneous Data Streams with Uncertainty over Sliding Window. In: Cuzzocrea, A., Maabout, S. (eds) Model and Data Engineering. MEDI 2013. Lecture Notes in Computer Science, vol 8216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41366-7_14
DOI: https://doi.org/10.1007/978-3-642-41366-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41365-0
Online ISBN: 978-3-642-41366-7
eBook Packages: Computer Science (R0)