With advances in hardware technology for data collection and in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions: the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition of the field, as understood by data mining experts, statisticians and computer scientists. The book has been organized carefully, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit. Chapters typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial and network data; and key applications of these methods to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis.
Index Terms
- Outlier Analysis