Outlier Analysis
January 2013
Publisher: Springer Publishing Company, Incorporated
ISBN: 978-1-4614-6395-5
Published: 11 January 2013
Pages: 461
Abstract

With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions: the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists. The book has been organized carefully, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit. Chapters typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial and network data; and key applications of these methods to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis.

Cited By

  1. ACM
    Pang G, Shen C, Cao L and Hengel A (2021). Deep Learning for Anomaly Detection, ACM Computing Surveys, 54:2, (1-38), Online publication date: 31-Mar-2022.
  2. Campos D, Kieu T, Guo C, Huang F, Zheng K, Yang B and Jensen C (2022). Unsupervised time series outlier detection with diversity-driven convolutional ensembles, Proceedings of the VLDB Endowment, 15:3, (611-623), Online publication date: 1-Nov-2021.
  3. Yuan Z, Chen H, Li T, Yu Z, Sang B and Luo C (2021). Unsupervised attribute reduction for mixed data based on fuzzy rough sets, Information Sciences: an International Journal, 572:C, (67-87), Online publication date: 1-Sep-2021.
  4. ACM
    Pang G and Aggarwal C Toward Explainable Deep Anomaly Detection Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, (4056-4057)
  5. ACM
    Pang G, van den Hengel A, Shen C and Cao L Toward Deep Supervised Anomaly Detection Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, (1298-1308)
  6. Cong Z, Chu L, Yang Y and Pei J (2021). Comprehensible counterfactual explanation on Kolmogorov-Smirnov test, Proceedings of the VLDB Endowment, 14:9, (1583-1596), Online publication date: 1-May-2021.
  7. ACM
    Xu H, Wang Y, Jian S, Huang Z, Wang Y, Liu N and Li F Beyond Outlier Detection: Outlier Interpretation by Attention-Guided Triplet Deviation Network Proceedings of the Web Conference 2021, (1328-1339)
  8. ACM
    Pang G, Cao L and Aggarwal C Deep Learning for Anomaly Detection: Challenges, Methods, and Opportunities Proceedings of the 14th ACM International Conference on Web Search and Data Mining, (1127-1130)
  9. ACM
    Pang G and Cao L (2020). Heterogeneous Univariate Outlier Ensembles in Multidimensional Data, ACM Transactions on Knowledge Discovery from Data, 14:6, (1-27), Online publication date: 31-Dec-2021.
  10. ACM
    Wang T, Duan L, Dong G and Bao Z (2020). Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data, ACM Transactions on Knowledge Discovery from Data, 14:5, (1-26), Online publication date: 31-Oct-2020.
  11. ACM
    Visengeriyeva L and Abedjan Z (2020). Anatomy of Metadata for Data Curation, Journal of Data and Information Quality, 12:3, (1-30), Online publication date: 30-Sep-2020.
  12. Ribeiro R and Moniz N (2020). Imbalanced regression and extreme value prediction, Machine Learning, 109:9-10, (1803-1835), Online publication date: 1-Sep-2020.
  13. ACM
    Zhang H, Zheng W, Chen C, Gao K, Hu Y, Huang L and Xu W Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework Proceedings of The Web Conference 2020, (1389-1399)
  14. ACM
    Angiulli F (2020). CFOF, ACM Transactions on Knowledge Discovery from Data, 14:1, (1-53), Online publication date: 29-Feb-2020.
  15. ACM
    Thudumu S, Branch P, Jin J and Singh J Estimation of Locally Relevant Subspace in High-dimensional Data Proceedings of the Australasian Computer Science Week Multiconference, (1-6)
  16. Benmakrelouf S, St-Onge C, Kara N, Tout H, Edstrom C and Lemieux Y (2022). Abnormal behavior detection using resource level to service level metrics mapping in virtualized systems, Future Generation Computer Systems, 102:C, (680-700), Online publication date: 1-Jan-2020.
  17. Gopalan P, Sharan V and Wieder U PIDForest Proceedings of the 33rd International Conference on Neural Information Processing Systems, (15809-15819)
  18. Wei Z, Pei Q, Liu X and Ma L Efficient Privacy Preserving Cross-Datasets Collaborative Outlier Detection Cyberspace Safety and Security, (343-356)
  19. ACM
    Trittenbach H and Böhm K One-Class Active Learning for Outlier Detection with Multiple Subspaces Proceedings of the 28th ACM International Conference on Information and Knowledge Management, (811-820)
  20. Wilmet A, Viard T, Latapy M and Lamarche-Perrin R (2022). Outlier detection in IP traffic modelled as a link stream using the stability of degree distributions over time, Computer Networks: The International Journal of Computer and Telecommunications Networking, 161:C, (197-209), Online publication date: 9-Oct-2019.
  21. Khan S, Liew C, Yairi T and McWilliam R (2022). Unsupervised anomaly detection in unmanned aerial vehicles, Applied Soft Computing, 83:C, Online publication date: 1-Oct-2019.
  22. Kieu T, Yang B, Guo C and Jensen C Outlier detection for time series with recurrent autoencoder ensembles Proceedings of the 28th International Joint Conference on Artificial Intelligence, (2725-2732)
  23. Mamun M, Berger C and Hansson J (2019). Effects of measurements on correlations of software code metrics, Empirical Software Engineering, 24:4, (2764-2818), Online publication date: 1-Aug-2019.
  24. ACM
    Ilyas I and Chu X (2019). Data Cleaning, 10.1145/3310205, Online publication date: 9-Jul-2019.
  25. Sun A, Zhong Z, Jeong H and Yang Q (2019). Building complex event processing capability for intelligent environmental monitoring, Environmental Modelling & Software, 116:C, (1-6), Online publication date: 1-Jun-2019.
  26. Yahalom R, Steren A, Nameri Y, Roytman M, Porgador A and Elovici Y (2019). Improving the effectiveness of intrusion detection systems for hierarchical data, Knowledge-Based Systems, 168:C, (59-69), Online publication date: 15-Mar-2019.
  27. ACM
    Calikus E, Fan Y, Nowaczyk S and Sant'Anna A Interactive-COSMO Proceedings of the Workshop on Interactive Data Mining, (1-9)
  28. Xu H, Wang Y, Wu Z and Wang Y Embedding-based complex feature value coupling learning for detecting outliers in non-IID categorical data Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (5541-5548)
  29. Duraj A, Niewiadomski A and Szczepaniak P (2018). Detection of outlier information by the use of linguistic summaries based on classic and interval‐valued fuzzy sets, International Journal of Intelligent Systems, 34:3, (415-438), Online publication date: 21-Jan-2019.
  30. ACM
    Liu D, Cui W, Jin K, Guo Y and Qu H (2018). DeepTracker, ACM Transactions on Intelligent Systems and Technology, 10:1, (1-25), Online publication date: 16-Jan-2019.
  31. ACM
    Abuzaid F, Bailis P, Ding J, Gan E, Madden S, Narayanan D, Rong K and Suri S (2018). MacroBase, ACM Transactions on Database Systems, 43:4, (1-45), Online publication date: 16-Dec-2018.
  32. Tatbul N, Lee T, Zdonik S, Alam M and Gottschlich J Precision and recall for time series Proceedings of the 32nd International Conference on Neural Information Processing Systems, (1924-1934)
  33. ACM
    Xu H, Wang Y, Cheng L, Wang Y and Ma X Exploring a High-quality Outlying Feature Value Set for Noise-Resilient Outlier Detection in Categorical Data Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (17-26)
  34. ACM
    Abdallah Z, Gaber M, Srinivasan B and Krishnaswamy S (2018). Activity Recognition with Evolving Data Streams, ACM Computing Surveys, 51:4, (1-36), Online publication date: 6-Sep-2018.
  35. ACM
    Manzoor E, Lamba H and Akoglu L xStream Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (1963-1972)
  36. ACM
    Yu W, Cheng W, Aggarwal C, Zhang K, Chen H and Wang W NetWalk Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2672-2681)
  37. ACM
    Salehi M and Rashidi L (2018). A Survey on Anomaly detection in Evolving Data, ACM SIGKDD Explorations Newsletter, 20:1, (13-23), Online publication date: 29-May-2018.
  38. He J and Xiong N (2018). An effective information detection method for social big data, Multimedia Tools and Applications, 77:9, (11277-11305), Online publication date: 1-May-2018.
  39. Ahmed M (2018). Reservoir-based network traffic stream summarization for anomaly detection, Pattern Analysis & Applications, 21:2, (579-599), Online publication date: 1-May-2018.
  40. Pang G, Cao L, Chen L, Lian D and Liu H Sparse modeling-based sequential ensemble learning for effective outlier detection in high-dimensional numeric data Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, (3892-3899)
  41. Benkabou S, Benabdeslem K and Canitia B (2018). Unsupervised outlier detection for time series by entropy and dynamic time warping, Knowledge and Information Systems, 54:2, (463-486), Online publication date: 1-Feb-2018.
  42. Szczepaniak P, Duraj A and Gil D (2018). Case-Based Reasoning, Complexity, 2018, Online publication date: 1-Jan-2018.
  43. Forestiero A (2017). Bio-inspired algorithm for outliers detection, Multimedia Tools and Applications, 76:24, (25659-25677), Online publication date: 1-Dec-2017.
  44. Manco G, Ritacco E, Rullo P, Gallucci L, Astill W, Kimber D and Antonelli M (2017). Fault detection and explanation through big data analysis on sensor streams, Expert Systems with Applications: An International Journal, 87:C, (141-156), Online publication date: 30-Nov-2017.
  45. ACM
    Zhu M, Aggarwal C, Ma S, Zhang H and Huai J Outlier Detection in Sparse Data with Factorization Machines Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (817-826)
  46. Kulczycki P and Kruszewski D (2017). Identification of atypical elements by transforming task to supervised form with fuzzy and intuitionistic fuzzy evaluations, Applied Soft Computing, 60:C, (623-633), Online publication date: 1-Nov-2017.
  47. Nadeem F, Alghazzawi D, Mashat A, Fakeeh K, Almalaise A and Hagras H (2017). Modeling and predicting execution time of scientific workflows in the Grid using radial basis function neural network, Cluster Computing, 20:3, (2805-2819), Online publication date: 1-Sep-2017.
  48. Xu Z, Kersting K and Ritter L Stochastic online anomaly analysis for streaming time series Proceedings of the 26th International Joint Conference on Artificial Intelligence, (3189-3195)
  49. ACM
    Fu Y, Aggarwal C, Parthasarathy S, Turaga D and Xiong H REMIX Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (827-835)
  50. Demiralp Ç, Haas P, Parthasarathy S and Pedapati T (2017). Foresight, Proceedings of the VLDB Endowment, 10:12, (1937-1940), Online publication date: 1-Aug-2017.
  51. Altimira D, Mueller F, Clarke J, Lee G, Billinghurst M and Bartneck C (2017). Enhancing player engagement through game balancing in digitally augmented physical games, International Journal of Human-Computer Studies, 103:C, (35-47), Online publication date: 1-Jul-2017.
  52. ACM
    Jankov D, Sikdar S, Mukherjee R, Teymourian K and Jermaine C Real-time High Performance Anomaly Detection over Data Streams Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, (292-297)
  53. ACM
    Bailis P, Gan E, Madden S, Narayanan D, Rong K and Suri S MacroBase Proceedings of the 2017 ACM International Conference on Management of Data, (541-556)
  54. Theissler A (2017). Detecting known and unknown faults in automotive systems using ensemble-based anomaly detection, Knowledge-Based Systems, 123:C, (163-173), Online publication date: 1-May-2017.
  55. Rathore S and Kumar S (2017). A decision tree logic based recommendation system to select software fault prediction techniques, Computing, 99:3, (255-285), Online publication date: 1-Mar-2017.
  56. Conforti R, Rosa M and Hofstede A (2017). Filtering Out Infrequent Behavior from Business Process Event Logs, IEEE Transactions on Knowledge and Data Engineering, 29:2, (300-314), Online publication date: 1-Feb-2017.
  57. ACM
    Sharma M, Sarcar S, Sheet D and Biswas P Limitations with measuring performance of techniques for abnormality localization in surveillance video and how to overcome them? Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, (1-8)
  58. Forestiero A (2016). Self-organizing anomaly detection in data streams, Information Sciences: an International Journal, 373:C, (321-336), Online publication date: 10-Dec-2016.
  59. ACM
    Alrwais S, Yuan K, Alowaisheq E, Liao X, Oprea A, Wang X and Li Z Catching predators at watering holes Proceedings of the 32nd Annual Conference on Computer Security Applications, (153-166)
  60. Salehi M, Zhang X, Bezdek J and Leckie C Smart Sampling: A Novel Unsupervised Boosting Approach for Outlier Detection AI 2016: Advances in Artificial Intelligence, (469-481)
  61. Liang P and Wongthanavasu S (2016). Hybrid linear matrix factorization for topic-coherent terms clustering, Expert Systems with Applications: An International Journal, 62:C, (358-372), Online publication date: 15-Nov-2016.
  62. ACM
    Ray S and Wright A Detecting Anomalies in Alert Firing within Clinical Decision Support Systems using Anomaly/Outlier Detection Techniques Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, (185-190)
  63. Wessel M, Thies F and Benlian A (2016). The emergence and effects of fake social information, Decision Support Systems, 90:C, (75-85), Online publication date: 1-Oct-2016.
  64. ACM
    He S, Tan J and Chan S Towards area classification for large-scale fingerprint-based system Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, (232-243)
  65. Chu X and Ilyas I (2016). Qualitative data cleaning, Proceedings of the VLDB Endowment, 9:13, (1605-1608), Online publication date: 1-Sep-2016.
  66. Schneider M, Ertel W and Palm G Constant time expected similarity estimation for large-scale anomaly detection Proceedings of the Twenty-second European Conference on Artificial Intelligence, (12-20)
  67. Shen Y, Liu H, Wang Y, Chen Z and Sun G A novel isolation-based outlier detection method Proceedings of the 14th Pacific Rim International Conference on Trends in Artificial Intelligence, (446-456)
  68. ACM
    Sharma M, Sheet D and Biswas P Abnormality Detecting Deep Belief Network Proceedings of the International Conference on Advances in Information Communication Technology & Computing, (1-6)
  69. ACM
    Schroeder J, Berger C, Staron M, Herpel T and Knauss A Unveiling anomalies and their impact on software quality in model-based automotive software revisions with software metrics and domain experts Proceedings of the 25th International Symposium on Software Testing and Analysis, (154-164)
  70. Martin S and Quach T Interactive Visualization of Multivariate Time Series Data Proceedings, Part II, of the 10th International Conference on Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience - Volume 9744, (322-332)
  71. Campos G, Zimek A, Sander J, Campello R, Micenková B, Schubert E, Assent I and Houle M (2016). On the evaluation of unsupervised outlier detection, Data Mining and Knowledge Discovery, 30:4, (891-927), Online publication date: 1-Jul-2016.
  72. ACM
    Chu X, Ilyas I, Krishnan S and Wang J Data Cleaning Proceedings of the 2016 International Conference on Management of Data, (2201-2206)
  73. Bindu P and Thilagam P (2016). Mining social networks for anomalies, Journal of Network and Computer Applications, 68:C, (213-229), Online publication date: 1-Jun-2016.
  74. ACM
    Altimira D, Mueller F, Clarke J, Lee G, Billinghurst M and Bartneck C Digitally Augmenting Sports Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, (1681-1691)
  75. ACM
    Angiulli F and Fassetti F (2016). Toward Generalizing the Unification with Statistical Outliers, ACM Transactions on Knowledge Discovery from Data, 10:3, (1-26), Online publication date: 24-Feb-2016.
  76. Dutta J, Banerjee B and Reddy C (2016). RODS: Rarity based Outlier Detection in a Sparse Coding Framework, IEEE Transactions on Knowledge and Data Engineering, 28:2, (483-495), Online publication date: 1-Feb-2016.
  77. Faria E, Gonçalves I, Carvalho A and Gama J (2016). Novelty detection in data streams, Artificial Intelligence Review, 45:2, (235-269), Online publication date: 1-Feb-2016.
  78. Huang H and Kasiviswanathan S (2015). Streaming anomaly detection using randomized matrix sketching, Proceedings of the VLDB Endowment, 9:3, (192-203), Online publication date: 1-Nov-2015.
  79. ACM
    Bandyopadhyay S, Ukil A, Puri C, Pal A, Singh R and Bose T Demo Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, (469-470)
  80. ACM
    Jagadeesan L, Mc Bride A, Gurbani V and Yang J Cognitive Security Proceedings of the Principles, Systems and Applications on IP Telecommunications, (43-50)
  81. ACM
    Aggarwal C and Sathe S (2015). Theoretical Foundations and Algorithms for Outlier Ensembles, ACM SIGKDD Explorations Newsletter, 17:1, (24-47), Online publication date: 29-Sep-2015.
  82. Neuvirth H, Finkelstein Y, Hilbuch A, Nahum S, Alon D and Yom-Tov E Early detection of fraud storms in the cloud Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III, (53-67)
  83. ACM
    Dalmia A, Gupta M and Varma V Query-based Graph Cuboid Outlier Detection Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, (705-712)
  84. ACM
    Laptev N, Amizadeh S and Flint I Generic and Scalable Framework for Automated Time-series Anomaly Detection Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1939-1947)
  85. Ranshous S, Shen S, Koutra D, Harenberg S, Faloutsos C and Samatova N (2015). Anomaly detection in dynamic networks, WIREs Computational Statistics, 7:3, (223-247), Online publication date: 1-May-2015.
  86. ACM
    Thoring K, Mueller R and Badke-Schaub P Ethnographic Design Research With Wearable Cameras Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, (2049-2054)
  87. Krell M and Wöhrle H (2015). New one-class classifiers based on the origin separation approach, Pattern Recognition Letters, 53:C, (93-99), Online publication date: 1-Feb-2015.
  88. ACM
    Pham T, Nguyen Q and Nguyen X Generating artificial attack data for intrusion detection using machine learning Proceedings of the 5th Symposium on Information and Communication Technology, (286-291)
  89. ACM
    Tang G, Wu K, Pei J, Tang J and Lei J An Appliance-Driven Approach to Detection of Corrupted Load Curve Data Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, (1429-1438)
  90. ACM
    Günnemann S, Günnemann N and Faloutsos C Detecting anomalies in dynamic rating data Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, (841-850)
  91. ACM
    Sánchez P, Müller E, Irmler O and Böhm K Local context selection for outlier ranking in graphs with multiple numeric node attributes Proceedings of the 26th International Conference on Scientific and Statistical Database Management, (1-12)
  92. Cárdenas-Montes M Depth-Based Outlier Detection Algorithm Proceedings of the 9th International Conference on Hybrid Artificial Intelligence Systems - Volume 8480, (122-132)
  93. ACM
    Günnemann N, Günnemann S and Faloutsos C Robust multivariate autoregression for anomaly detection in dynamic product ratings Proceedings of the 23rd international conference on World wide web, (361-372)
  94. ACM
    Zimek A, Campello R and Sander J (2014). Ensembles for unsupervised outlier detection, ACM SIGKDD Explorations Newsletter, 15:1, (11-22), Online publication date: 17-Mar-2014.
  95. Sugiyama M and Borgwardt K Rapid distance-based outlier detection via sampling Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, (467-475)
  96. ACM
    Keller F, Müller E, Wixler A and Böhm K Flexible and adaptive subspace search for outlier analysis Proceedings of the 22nd ACM international conference on Information & Knowledge Management, (1381-1390)
  97. Pei J Some New Progress in Analyzing and Mining Uncertain and Probabilistic Data for Big Data Analytics Proceedings of the 14th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing - Volume 8170, (38-45)
  98. ACM
    Aggarwal C (2013). Outlier ensembles, ACM SIGKDD Explorations Newsletter, 14:2, (49-58), Online publication date: 30-Apr-2013.
  99. ACM
    Li Z, Sun C, Liu C, Chen X, Wang M and Liu Y Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies, ACM Transactions on Knowledge Discovery from Data, 0:0
  100. Li T, Chen L and Chen C Fuzzy clustering based traffic pattern identification 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1181-1187)
  101. Bezerra C, Costa B, Guedes L and Angelov P A comparative study of autonomous learning outlier detection methods applied to fault detection 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-7)
  102. Mukherjee G, Bhanot G, Raines K, Sastry S, Doniach S and Biehl M Predicting recurrence in clear cell Renal Cell Carcinoma: Analysis of TCGA data using outlier analysis and generalized matrix LVQ 2016 IEEE Congress on Evolutionary Computation (CEC), (656-661)
  103. Leitner L, Lagrange A and Endisch C End-of-line fault detection for combustion engines using one-class classification 2016 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), (207-213)
Contributors
  • IBM Thomas J. Watson Research Center

Reviews

Fernando Berzal

An outlier is “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” [1]. In data mining, they are usually called anomalies, although they are also referred to as abnormalities, discordants, or deviants in statistics. Their detection is especially difficult since they are often hard to distinguish from noise. Actually, some authors classify outliers into weak and strong outliers just to stress the difference between noise, the former, and anomalies, the latter. While noise identification and removal is an important problem in many application domains, such as signal processing, data mining efforts are typically focused on anomaly detection. Since their differentiation is merely semantic, many of the techniques proposed for one problem can be used for solving the other.

The different approaches that have been used for solving the problem from different perspectives have led to a wide gamut of techniques, which do not always employ a consistent terminology since they do not share the same background. Fortunately, Aggarwal has written a thorough monograph that surveys the broad field of outlier analysis (or anomaly detection, if you prefer). His volume seamlessly integrates the traditional content from statistics textbooks and the latest developments in data mining, providing a balanced review of the many algorithms that have been proposed in the literature.

After the usual introductory chapter, which provides a bird's-eye view of the field, the first half of the book covers the different techniques and models that have been devised for the detection of outliers. Starting with extreme value analysis, the branch of statistics that deals with extreme deviations from the median of probability distributions, the author delves into probabilistic models whose parameters can be learned by expectation maximization (EM) algorithms.
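As a rough illustration of the extreme value analysis mentioned above, the following Python sketch flags univariate values that lie far from the median. It is not taken from the book: the function names, the sample readings, and the use of the modified z-score (median and median absolute deviation, with the conventional 0.6745 scaling constant and 3.5 cutoff) are all assumptions chosen for this example.

```python
import statistics


def modified_z_scores(values):
    """Return the modified z-score of each value, based on median and MAD."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that is robust to outliers.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data are approximately normal.
    return [0.6745 * (v - med) / mad for v in values]


def flag_extremes(values, threshold=3.5):
    """Return the indices whose absolute modified z-score exceeds threshold."""
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical sensor readings with one suspicious value at the end.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 25.0]
scores = modified_z_scores(readings)
flagged = flag_extremes(readings)
print(flagged)
```

A median-based score is used here because, in a sample this small, a single extreme value inflates the mean and standard deviation enough that an ordinary z-score cutoff of 3 would miss it.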
Linear models, which analyze linear correlations, are then addressed, including linear regression and principal component analysis (PCA). Later, proximity-based outlier detection techniques are analyzed. These encompass both distance-based methods, such as those based on nearest neighbors, and density-based methods, whose origin can be traced to the density-based clustering techniques often used in data mining. In fact, outliers can be viewed as the byproduct of unsupervised clustering techniques: anomalies (or noise) are what is not included in the identified clusters.

Subspace clustering is another common data mining technique, designed for dealing with high-dimensional data, yet anomaly detection poses some specific challenges and requires something more than the blind adaptation of existing clustering techniques. Of course, Aggarwal delves into all the necessary details and describes some subspace outlier detection techniques. His in-depth survey of outlier detection techniques ends with a chapter on supervised outlier detection. From the perspective of supervised machine learning, outlier detection is just a classification problem, yet a highly unbalanced one with its own nuances in practice.

The survey of outlier detection models, described in terms of multidimensional numerical data in the first half of the book, is complemented by a review of anomaly detection techniques for different data types in the second half. Aggarwal devotes chapters to categorical, text, and mixed attribute data, as well as different situations where data values have dependencies and hence cannot be treated independently from one another. These situations range from time series and data streams, spatial and spatiotemporal data, to discrete sequences, graphs, and networks. Specific techniques for each of them are treated and a wealth of references is provided in the author's insightful bibliographic comments at the end of each chapter.
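To make the nearest-neighbor flavor of the distance-based methods mentioned above more concrete, here is a minimal Python sketch that scores each point by the distance to its k-th nearest neighbor, so that isolated points receive high scores. It is one common formulation and not necessarily the one presented in the book; the function name, the choice of k, and the sample points are assumptions made for this example.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 2-D data: a tight cluster plus one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(most_outlying, round(scores[most_outlying], 2))
```

This brute-force version compares every pair of points, which is fine for a handful of points but scales quadratically; larger data sets would typically call for an index structure or sampling.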
With the thoroughness that characterizes its surveys of techniques and adaptations to different data types, the book ends with a chapter on the applications of outlier analysis. Many application domains are mentioned, from quality control and fault detection to fraud detection and intrusion detection systems. The author provides a brief description of specific problems in each application domain, with discussions of how the different techniques covered in the previous chapters can be used to solve them in practice. His abundant bibliographic references can serve as a good starting point for those interested in particular applications.

Aggarwal has written a complete survey of the state of the art in anomaly detection. His writing style is not as dry as you might expect from a thorough academic survey of a similar scope, the details behind different outlier detection techniques are clearly explained, and his comments when comparing and contrasting different approaches are often insightful. His book provides a solid frame of reference for those interested in anomaly detection, both researchers and practitioners, no matter whether they are generalists or they are mostly focused on particular applications. All of them can benefit from the broad overview of the field, the nice introductions to many different techniques, and the annotated pointers for further reading that this book provides.

Online Computing Reviews Service
