Abstract
Multimedia networks hold the promise of facilitating large-scale, real-time data processing in complex environments. Their foreseeable applications will help protect and monitor military, environmental, safety-critical, or domestic infrastructures and resources. Cloud infrastructures promise to provide high-performance and cost-effective solutions to large-scale data processing problems. This paper focuses on real-time outlier detection over distributed data streams and proposes KDEDisStrOut, a kernel density estimation (KDE) based outlier detection algorithm implemented in Storm. The paper first formalizes the problem of outlier detection using the kernel density estimation technique and incrementally updates the data transmitted between the child nodes and the coordinator node, which reduces the communication cost. It then adopts an exponential decay policy to keep pace with the transient and evolving nature of stream data, adaptively changing the weights of different data in the sliding window so that recent data carry more weight and the analysis better reflects the current state of the stream. Theoretical analysis and experiments on Storm with synthetic and real data show that KDEDisStrOut is efficient and effective compared with existing outlier detection algorithms and is better suited to data streams.
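The main ingredients named in the abstract (a KDE-based density score over a sliding window, exponentially decayed sample weights, and incremental child-to-coordinator updates) can be illustrated with a minimal Python sketch. It is not the authors' KDEDisStrOut implementation and does not use Storm: the one-dimensional Gaussian kernel, the fixed bandwidth, the decay form w = exp(-λ·age), the count-based window, the density threshold, and the class names `DecayedKDE`, `ChildNode`, and `Coordinator` are all assumptions made for illustration.

```python
import math
from collections import deque


class DecayedKDE:
    """Hypothetical sliding-window Gaussian KDE with exponentially decayed weights.

    Window size, decay rate and bandwidth are illustrative, not values from the paper.
    """

    def __init__(self, window_size=100, decay=0.1, bandwidth=1.0):
        self.window = deque(maxlen=window_size)  # (timestamp, value); oldest evicted first
        self.decay = decay                       # lambda in w = exp(-lambda * age)
        self.bandwidth = bandwidth               # kernel width h

    def add(self, t, x):
        self.window.append((t, x))

    def weight(self, t_now, t):
        # Older samples get exponentially smaller weights.
        return math.exp(-self.decay * (t_now - t))

    def density(self, x, t_now):
        """Weighted KDE: sum_i w_i * K((x - x_i) / h) / (h * sum_i w_i)."""
        h = self.bandwidth
        num, total = 0.0, 0.0
        for t, xi in self.window:
            w = self.weight(t_now, t)
            u = (x - xi) / h
            num += w * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
            total += w
        return num / (h * total) if total > 0 else 0.0

    def is_outlier(self, x, t_now, threshold):
        # A point is flagged when its estimated density falls below the threshold.
        return self.density(x, t_now) < threshold


class ChildNode:
    """Hypothetical child node that buffers points and ships only the new ones."""

    def __init__(self):
        self.pending = []  # points observed since the last flush

    def observe(self, t, x):
        self.pending.append((t, x))

    def flush(self):
        # Incremental update: return the delta and clear the local buffer.
        delta, self.pending = self.pending, []
        return delta


class Coordinator:
    """Hypothetical coordinator that merges child deltas into one global KDE."""

    def __init__(self, kde):
        self.kde = kde

    def receive(self, delta):
        for t, x in delta:
            self.kde.add(t, x)


# Usage: two children observe a smooth signal, then sync their deltas.
kde = DecayedKDE(window_size=50, decay=0.05, bandwidth=0.5)
coord = Coordinator(kde)
children = [ChildNode(), ChildNode()]
for t in range(40):
    children[t % 2].observe(t, math.sin(0.1 * t))
for child in children:
    coord.receive(child.flush())

print(coord.kde.is_outlier(0.5, 40, threshold=0.05))  # False: value lies inside the data
print(coord.kde.is_outlier(5.0, 40, threshold=0.05))  # True: value is far from all samples
```

Sending only the buffered delta, rather than the whole window, is one simple way to realize the communication saving the abstract describes; the paper's actual summary format may differ.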
Acknowledgments
This work is supported by the Key Projects in the National Science & Technology Pillar Program during the Twelfth Five-Year Plan Period under Grant No. 2015BAK07B03 and by the National “Twelfth Five-Year” Plan for Science & Technology Support under Grant No. 2013BAH18F02.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
This work was done while Zhigao Zheng was at the National Engineering Research Center for E-learning, Central China Normal University.
About this article
Cite this article
Zheng, Z., Jeong, HY., Huang, T. et al. KDE based outlier detection on distributed data streams in multimedia network. Multimed Tools Appl 76, 18027–18045 (2017). https://doi.org/10.1007/s11042-016-3681-y
DOI: https://doi.org/10.1007/s11042-016-3681-y