
Outlier Detection: Methods, Models, and Classification

Published: 12 June 2020

Abstract

Over the past decade, we have witnessed an enormous amount of research effort dedicated to designing outlier detection techniques that take into account efficiency, accuracy, high-dimensional data, and distributed environments, among other factors. In this article, we examine these characteristics and current solutions, as well as open challenges and future research directions for identifying new outlier detection strategies. We propose a taxonomy of recently designed outlier detection strategies and highlight their fundamental characteristics and properties. We also present several emerging outlier detection methods designed for high-dimensional data, data streams, big data, and minimally labeled data. Finally, we review their advantages and limitations and discuss new and open challenges.




Information

Published In

ACM Computing Surveys, Volume 53, Issue 3
May 2021
787 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3403423
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2020
Online AM: 07 May 2020
Accepted: 01 January 2020
Revised: 01 January 2020
Received: 01 August 2019
Published in CSUR Volume 53, Issue 3


Author Tags

  1. Outlier detection
  2. anomaly detection
  3. semi-supervised learning
  4. unsupervised learning

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • NSERC-DISCOVERY
  • Canada Research Chairs Program
  • NSERC-CREATE TRANSIT Funds
  • NSERC-SPG


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1,894
  • Downloads (last 6 weeks): 193
Reflects downloads up to 14 Dec 2024


Cited By

  • (2025) GBMOD: A granular-ball mean-shift outlier detector. Pattern Recognition 159 (111115). DOI: 10.1016/j.patcog.2024.111115. Online publication date: Mar-2025.
  • (2025) A measurement error prediction framework for smart meters in typical regions. Measurement 242 (116254). DOI: 10.1016/j.measurement.2024.116254. Online publication date: Jan-2025.
  • (2024) A Novel Framework for Image Matching and Stitching for Moving Car Inspection under Illumination Challenges. Sensors 24:4 (1083). DOI: 10.3390/s24041083. Online publication date: 7-Feb-2024.
  • (2024) Insider Threat Detection Model Enhancement Using Hybrid Algorithms between Unsupervised and Supervised Learning. Electronics 13:5 (973). DOI: 10.3390/electronics13050973. Online publication date: 3-Mar-2024.
  • (2024) Anomaly Detection Industrial Items. International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10:2 (729-734). DOI: 10.32628/CSEIT2410268. Online publication date: 25-Apr-2024.
  • (2024) OutliersLearn: Educational Outlier Package with Common Outlier Detection Algorithms. CRAN: Contributed Packages. DOI: 10.32614/CRAN.package.OutliersLearn. Online publication date: 5-Jun-2024.
  • (2024) Cracking black-box models: Revealing hidden machine learning techniques behind their predictions. Intelligent Data Analysis (1-21). DOI: 10.3233/IDA-230707. Online publication date: 20-Mar-2024.
  • (2024) Genetic association analysis in sugarcane (Saccharum spp.) for sucrose accumulation in humid environments in Colombia. BMC Plant Biology 24:1. DOI: 10.1186/s12870-024-05233-y. Online publication date: 18-Jun-2024.
  • (2024) Active Learning for Data Quality Control: A Survey. Journal of Data and Information Quality. DOI: 10.1145/3663369. Online publication date: 11-May-2024.
  • (2024) Enhancing Unsupervised Outlier Model Selection: A Study on IREOS Algorithms. ACM Transactions on Knowledge Discovery from Data 18:7 (1-25). DOI: 10.1145/3653719. Online publication date: 19-Jun-2024.
