survey

Outlier Detection: Methods, Models, and Classification

Authors:

Azzedine Boukerche,

Omar Alfandi

ACM Computing Surveys (CSUR), Volume 53, Issue 3

Article No.: 55, Pages 1 - 37

https://doi.org/10.1145/3381028

Published: 12 June 2020

Abstract

Over the past decade, we have witnessed an enormous amount of research effort dedicated to designing outlier detection techniques that account for efficiency, accuracy, high-dimensional data, and distributed environments, among other factors. In this article, we present and examine these characteristics, current solutions, open challenges, and future research directions in identifying new outlier detection strategies. We propose a taxonomy of recently designed outlier detection strategies, underlining their fundamental characteristics and properties. We also introduce several newly trending outlier detection methods designed for high-dimensional data, data streams, big data, and minimally labeled data. Last, we review their advantages and limitations and then discuss new and future challenges.

Information

Published In

ACM Computing Surveys Volume 53, Issue 3

May 2021

787 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3403423

Editor:
Albert Zomaya
University of Sydney, Australia

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2020

Online AM: 07 May 2020

Accepted: 01 January 2020

Revised: 01 January 2020

Received: 01 August 2019

Published in CSUR Volume 53, Issue 3

Qualifiers

Survey
Research
Refereed

Funding Sources

NSERC-DISCOVERY
Canada Research Chairs Program
NSERC-CREATE TRANSIT Funds
NSERC-SPG
