Clustering
October 2009
Publisher: Wiley-IEEE Press
ISBN: 978-0-470-27680-8
Published: 24 October 2009
Pages: 358
Abstract

This is the first book to take a truly comprehensive look at clustering. It begins with an introduction to cluster analysis and goes on to explore: proximity measures; hierarchical clustering; partition clustering; neural network-based clustering; kernel-based clustering; sequential data clustering; large-scale data clustering; data visualization and high-dimensional data clustering; and cluster validation. The authors assume no previous background in clustering, and their generous inclusion of examples and references helps make the subject matter comprehensible for readers of varying levels and backgrounds.

Cited By

  1. Dorman K and Maitra R (2021). An efficient k‐modes algorithm for clustering categorical datasets, Statistical Analysis and Data Mining, 15:1, (83-97), Online publication date: 10-Jan-2022.
  2. He J, Li L and Wang X i-tStar: Interactive Trajectory Star Coordinates 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture, (2044-2047)
  3. Oliveira J, Rios R, de Almeida E, Sant'Anna C and Rios T Fuzzy Software Analyzer (FSA): A New Approach for Interpreting Source Code Versioning Repositories 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-6)
  4. Maciel T and Emmendorfer L Cognitive Consistency Models Applied to Data Clustering Artificial Intelligence and Soft Computing, (183-191)
  5. Godois L, Adamatti D and Emmendorfer L (2020). A multi-agent-based algorithm for data clustering, Progress in Artificial Intelligence, 9:4, (305-313), Online publication date: 1-Dec-2020.
  6. Guidotti A and Vanelli‐Coralli A (2019). Clustering strategies for multicast precoding in multibeam satellite systems, International Journal of Satellite Communications and Networking, 38:2, (85-104), Online publication date: 17-Feb-2020.
  7. Martarelli N and Nagano M Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm Intelligent Data Engineering and Automated Learning – IDEAL 2019, (20-27)
  8. Berry N and Maitra R (2019). TiK‐means, Statistical Analysis and Data Mining, 12:3, (223-233), Online publication date: 20-May-2019.
  9. Lithio A and Maitra R (2018). An efficient k‐means‐type algorithm for clustering datasets with incomplete records, Statistical Analysis and Data Mining, 11:6, (296-311), Online publication date: 16-Nov-2018.
  10. Sun J and Masouros C Drone Positioning for User Coverage Maximization 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), (318-322)
  11. Silva D, Giusti R, Keogh E and Batista G (2018). Speeding up similarity search under dynamic time warping by pruning unpromising alignments, Data Mining and Knowledge Discovery, 32:4, (988-1016), Online publication date: 1-Jul-2018.
  12. Durán-Rosal A, Gutiérrez P, Martínez-Estudillo F and Hervás-Martínez C (2018). Simultaneous optimisation of clustering quality and approximation error for time series segmentation, Information Sciences: an International Journal, 442:C, (186-201), Online publication date: 1-May-2018.
  13. Bijari K, Zare H, Veisi H and Bobarshad H (2018). Memory-enriched big bang-big crunch optimization algorithm for data clustering, Neural Computing and Applications, 29:6, (111-121), Online publication date: 1-Mar-2018.
  14. Fan J and Wang J (2018). A Two-Phase Fuzzy Clustering Algorithm Based on Neurodynamic Optimization With Its Application for PolSAR Image Segmentation, IEEE Transactions on Fuzzy Systems, 26:1, (72-83), Online publication date: 1-Feb-2018.
  15. Durán-Rosal A, Paz-Marín M, Gutiérrez P and Hervás-Martínez C (2017). Identifying Market Behaviours Using European Stock Index Time Series by a Hybrid Segmentation Algorithm, Neural Processing Letters, 46:3, (767-790), Online publication date: 1-Dec-2017.
  16. (2017). Robust semi-supervised clustering with polyhedral and circular uncertainty, Neurocomputing, 265:C, (4-27), Online publication date: 22-Nov-2017.
  17. Jiang W and Yin Z (2017). Indoor localization with a signal tree, Multimedia Tools and Applications, 76:19, (20317-20339), Online publication date: 1-Oct-2017.
  18. Ping Y, Tian Y, Guo C, Wang B and Yang Y (2017). FRSVC, Pattern Recognition, 69:C, (286-298), Online publication date: 1-Sep-2017.
  19. Śmieja M and Wiercioch M (2017). Constrained clustering with a complex cluster structure, Advances in Data Analysis and Classification, 11:3, (493-518), Online publication date: 1-Sep-2017.
  20. Cao F, Yu L, Huang J and Liang J (2017). k-mw-modes, Applied Soft Computing, 57:C, (605-614), Online publication date: 1-Aug-2017.
  21. Cruz N, Nedjah N and de Macedo Mourelle L (2017). Robust distributed spatial clustering for swarm robotic based systems, Applied Soft Computing, 57:C, (727-737), Online publication date: 1-Aug-2017.
  22. Benmammar B, Taleb M and Krief F (2017). Diffusing-CRN k-means, Wireless Networks, 23:6, (1849-1861), Online publication date: 1-Aug-2017.
  23. Cena A and Gagolewski M OWA-based linkage and the genie correction for hierarchical clustering 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-6)
  24. Libal U and Hasiewicz Z (2017). Risk upper bound for a NM-type multiresolution classification scheme of random signals by Daubechies wavelets, Engineering Applications of Artificial Intelligence, 62:C, (109-123), Online publication date: 1-Jun-2017.
  25. Spurek P (2017). General split gaussian Cross-Entropy clustering, Expert Systems with Applications: An International Journal, 68:C, (58-68), Online publication date: 1-Feb-2017.
  26. Campo D, Stegmayer G and Milone D (2016). A new index for clustering validation with overlapped clusters, Expert Systems with Applications: An International Journal, 64:C, (549-556), Online publication date: 1-Dec-2016.
  27. Martí L, García J, Berlanga A and Molina J (2016). MONEDA, Journal of Global Optimization, 66:4, (729-768), Online publication date: 1-Dec-2016.
  28. (2016). A population initialization method for evolutionary algorithms based on clustering and Cauchy deviates, Expert Systems with Applications: An International Journal, 60:C, (294-310), Online publication date: 30-Oct-2016.
  29. Amiri M, Bakhshandeh Amnieh H, Hasanipanah M and Mohammad Khanli L (2016). A new combination of artificial neural network and K-nearest neighbors models to predict blast-induced ground vibration and air-overpressure, Engineering with Computers, 32:4, (631-644), Online publication date: 1-Oct-2016.
  30. Yang Y, Chen D, Gu R, Gu Y and Yu S (2016). Consumers’ Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms, Computational Intelligence and Neuroscience, 2016, (12), Online publication date: 1-Aug-2016.
  31. Covões T, Hruschka E and Ghosh J (2016). Evolving gaussian mixture models with splitting and merging mutation operators, Evolutionary Computation, 24:2, (293-317), Online publication date: 1-Jun-2016.
  32. Tomasini C, Emmendorfer L, Borges E and Machado K A methodology for selecting the most suitable cluster validation internal indices Proceedings of the 31st Annual ACM Symposium on Applied Computing, (901-903)
  33. Campello R, Moulavi D, Zimek A and Sander J (2015). Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection, ACM Transactions on Knowledge Discovery from Data, 10:1, (1-51), Online publication date: 27-Jul-2015.
  34. Aielli G and Caporin M (2014). Variance clustering improved dynamic conditional correlation MGARCH estimators, Computational Statistics & Data Analysis, 76:C, (556-576), Online publication date: 1-Aug-2014.
  35. Hadian A and Shahrivari S (2014). High performance parallel k-means clustering for disk-resident datasets on multi-core CPUs, The Journal of Supercomputing, 69:2, (845-863), Online publication date: 1-Aug-2014.
  36. Barros R, Jaskowiak P, Cerri R and de Carvalho A (2014). A framework for bottom-up induction of oblique decision trees, Neurocomputing, 135:C, (3-12), Online publication date: 5-Jul-2014.
  37. Cruz-Ramírez M, Paz-Marín M, Pérez-Ortiz M and Hervás-Martínez C Time Series Segmentation and Statistical Characterisation of the Spanish Stock Market Ibex-35 Index Proceedings of the 9th International Conference on Hybrid Artificial Intelligence Systems - Volume 8480, (74-85)
  38. Pérez-Ortiz M, Gutiérrez P, Sánchez-Monedero J, Hervás-Martínez C, Nikolaou A, Dicaire I and Fernández-Navarro F Time Series Segmentation of Paleoclimate Tipping Points by an Evolutionary Algorithm Proceedings of the 9th International Conference on Hybrid Artificial Intelligence Systems - Volume 8480, (318-329)
  39. Kulczycki P and Lukasik S (2014). An algorithm for reducing the dimension and size of a sample for data exploration procedures, International Journal of Applied Mathematics and Computer Science, 24:1, (133-149), Online publication date: 1-Mar-2014.
  40. Cheng T, Li P, Zhu S and Torrieri D (2014). M-cluster and X-ray, Integrated Computer-Aided Engineering, 21:1, (19-34), Online publication date: 1-Jan-2014.
  41. Padua R, Santos F, Silva Conrado M, Carvalho V and Rezende S Subjective Evaluation of Labeling Methods for Association Rule Clustering Proceedings of the 12th Mexican International Conference on Advances in Soft Computing and Its Applications - Volume 8266, (289-300)
  42. Ríos S and Silva R (2013). A new dissimilarity measure for online social networks moderation, Web Intelligence and Agent Systems, 11:4, (351-364), Online publication date: 1-Oct-2013.
  43. Silva J, Faria E, Barros R, Hruschka E, Carvalho A and Gama J (2013). Data stream clustering, ACM Computing Surveys, 46:1, (1-31), Online publication date: 1-Oct-2013.
  44. Carvalho V, Santos F and Rezende S Metrics to Support the Evaluation of Association Rule Clustering Proceedings of the 15th International Conference on Data Warehousing and Knowledge Discovery - Volume 8057, (248-259)
  45. Aielli G and Caporin M (2013). Original article, Mathematics and Computers in Simulation, 94, (205-222), Online publication date: 1-Aug-2013.
  46. Jaskowiak P, Campello R and Costa Filho I (2013). Proximity Measures for Clustering Gene Expression Microarray Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10:4, (845-857), Online publication date: 1-Jul-2013.
  47. Nikolov V and Naydenov D Multifactor modelling system with cloud based pre-processing Proceedings of the 14th International Conference on Computer Systems and Technologies, (239-246)
  48. Yazdani H and Kwasnicka H Fuzzy classification method in credit risk Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I, (495-504)
  49. de Amorim R An empirical evaluation of different initializations on the number of k-means iterations Proceedings of the 11th Mexican international conference on Advances in Artificial Intelligence - Volume Part I, (15-26)
  50. Almeida S, Coelho F, Guimarães F and Braga A A general approach for adaptive kernels in semi-supervised clustering Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning, (508-515)
  51. Stegmayer G, Milone D, Kamenetzky L, Lopez M and Carrari F (2012). A Biologically Inspired Validity Measure for Comparison of Clustering Methods over Metabolic Data Sets, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9:3, (706-716), Online publication date: 1-May-2012.
  52. Ríos S, Silva R and Aguilera F A dissimilarity measure for automate moderation in online social networks Proceedings of the 4th International Workshop on Web Intelligence & Communities, (1-9)
  53. Gkoulalas-Divanis A and Loukides G (2012). Utility-guided Clustering-based Transaction Data Anonymization, Transactions on Data Privacy, 5:1, (223-251), Online publication date: 1-Apr-2012.
  54. Łukasik S and Kulczycki P An algorithm for sample and data dimensionality reduction using fast simulated annealing Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I, (152-161)
  55. Rojas D, Zambrano C, Varas M and Urrutia A A multi-level thresholding-based method to learn fuzzy membership functions from data warehouse Proceedings of the 16th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, (664-674)
  56. Bravo C and Weber R Semi-supervised constrained clustering with cluster outlier filtering Proceedings of the 16th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, (347-354)
  57. Phillips J, Raman P and Venkatasubramanian S Generating a diverse set of high-quality clusterings Proceedings of the 2nd International Conference on Discovering, Summarizing and Using Multiple Clusterings - Volume 772, (80-91)
  58. Clémençon S, Gaudel R and Jakubowicz J Clustering rankings in the fourier domain Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I, (343-358)
  59. Gkoulalas-Divanis A and Loukides G PCTA Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society, (1-10)
  60. Zhang K, Orgun M, Zhao Y and Nayak A The discovery of hierarchical cluster structures assisted by a visualization technique Proceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I, (703-711)
  61. Vega-Pons S and Ruiz-Shulcloper J Partition selection approach for hierarchical clustering based on clustering ensemble Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications, (525-532)
  62. Espinosa-Isidrón D and García-Reyes E A new dissimilarity measure for trajectories with applications in anomaly detection Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications, (193-201)
  63. Anderson D, Bezdek J, Popescu M and Keller J (2010). Comparing fuzzy, probabilistic, and possibilistic partitions, IEEE Transactions on Fuzzy Systems, 18:5, (906-918), Online publication date: 1-Oct-2010.
  64. Keuper M, Bensch R, Voigt K, Dovzhenko A, Palme K, Burkhardt H and Ronneberger O Semi-supervised learning of edge filters for volumetric image segmentation Proceedings of the 32nd DAGM conference on Pattern recognition, (462-471)
  65. Prada M, Domínguez M, Díaz I, Fuertes J, Reguera P, Morán A and Alonso S Application of SOM-based visualization maps for time-response analysis of industrial processes Proceedings of the 20th international conference on Artificial neural networks: Part II, (392-401)
  66. Vera P, Estévez P and Principe J Linear projection method based on information theoretic learning Proceedings of the 20th international conference on Artificial neural networks: Part III, (178-187)
  67. Tscherepanow M TopoART Proceedings of the 20th international conference on Artificial neural networks: Part III, (157-167)
  68. Costa J Clustering and visualizing SOM results Proceedings of the 11th international conference on Intelligent data engineering and automated learning, (334-343)
  69. Nikolov V Optimizations in time series clustering and prediction Proceedings of the 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing on International Conference on Computer Systems and Technologies, (528-533)
  70. Oszust M and Wysocki M Determining subunits for sign language recognition by evolutionary cluster-based segmentation of time series Proceedings of the 10th international conference on Artifical intelligence and soft computing: Part II, (189-196)
  71. Namkoong Y, Joo Y and Dankel D Feature subset-wise mixture model-based clustering via local search algorithm Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence, (135-146)
  72. Pereira J, Schmidt F, Contreras P, Murtagh F and Astudillo H Clustering and semantics preservation in cultural heritage information spaces Adaptivity, Personalization and Fusion of Heterogeneous Information, (100-105)
  73. Wunsch D ART properties of interest in engineering applications Proceedings of the 2009 international joint conference on Neural Networks, (3556-3559)
  74. Osoba O, Mitaim S and Kosko B Adaptive fuzzy priors for Bayesian inference Proceedings of the 2009 international joint conference on Neural Networks, (3283-3290)
  75. Xu R, du Plessis L, Damelin S, Sears M and Wunsch D Analysis of hyperspectral data with diffusion maps and fuzzy ART Proceedings of the 2009 international joint conference on Neural Networks, (2302-2309)
  76. Guimarães F, Wanner E and Takahashi R A quality metric for multi-objective optimization based on hierarchical clustering techniques Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (3292-3299)
  77. Suresh K, Kundu D, Ghosh S, Das S and Abraham A Automatic clustering with multi-objective differential evolution algorithms Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (2590-2597)
  78. Das S, Chowdhury A and Abraham A A bacterial evolutionary algorithm for automatic data clustering Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (2403-2410)
  79. He H, Tan Y and Fujimoto K Estimation of optimal cluster number for fuzzy clustering with combined fuzzy entropy index 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (697-703)
Contributors
  • GE Global Research
  • Missouri University of Science and Technology

Reviews

Aris Gkoulalas-Divanis

Clustering algorithms are important to a wide spectrum of scientific disciplines, spanning from computer science (CS) and engineering to medical, earth, and social sciences. Applications of clustering are numerous: speech recognition, organization of document collections, disease diagnosis and treatment, star and planet classification, analysis of social networks, and criminal physiology, to name just a few. As a result, it is not surprising that more than 12,000 scientific papers related to clustering have been published since 1996. This book provides a comprehensive and thorough presentation of this research area, describing some of the most important clustering algorithms proposed in research literature. The book is organized into 11 chapters that highlight the various aspects of the clustering process. Chapter 1 is a brief introduction that discusses cluster analysis in general, defines the notion of clusters, and presents some interesting clustering applications. In the second chapter, Xu and Wunsch shed light on the different proximity measures that have been established to quantify the similarity between data records. The definition of similarity between two data records is one of the most important factors in the clustering process, as it provides the basis for the identification of high-quality clusters. After discussing the basic properties that a proximity measure must satisfy, the authors present a collection of measures that are suitable for continuous, discrete, and mixed variables. Chapters 3 to 9 are dedicated to specific clustering algorithms, technologies, and theories that have been proposed to facilitate clustering in different data domains and application environments. The last section of these chapters is devoted to the presentation of real-world applications, where the corresponding approaches are commonly adopted. 
In particular, chapter 3 collects clustering algorithms that organize the data records into a hierarchical structure; each level of this structure corresponds to a clustering solution with a different number of clusters. The clustering hierarchy can be built either in a bottom-up (agglomerative algorithm) or in a top-down (divisive algorithm) fashion. After presenting the classical hierarchical clustering schemes, the authors concentrate on a set of recent hierarchical approaches that are more robust to noise and outliers. Chapter 4 presents a set of partitional clustering solutions, where the data records are directly partitioned into a prespecified number of clusters. Xu and Wunsch present in detail the popular k-means algorithm and its advancements, as well as some graph theory, fuzzy, and search technique clustering methodologies. The use of neural networks in clustering is highlighted in chapter 5, where the authors discuss existing clustering approaches that are suitable for either hard or soft competitive learning. Chapter 6 is dedicated to kernel-based clustering solutions that map a set of nonlinearly separable patterns into a higher dimensional feature space, where they are linearly separable. After presenting the theory behind kernel-based clustering approaches, Xu and Wunsch discuss nonlinear principal component analysis, squared error-based clustering, and support vector kernel-based clustering. Chapters 7 to 9 are devoted to more recent applications involving the clustering of sequential, large-scale, or high-dimensional data. Specifically, chapter 7 focuses on the clustering of sequential data, commonly encountered in the medical sciences. In this chapter, the authors present formulas to quantify sequence similarity, as well as three clustering algorithms that are suitable for sequential data. Following this, chapter 8 deals with the clustering of large-scale data, where the scalability of the clustering algorithm is a top priority. 
The existing methodologies are divided into six categories: random sampling, data condensation, density-based, grid-based, divide and conquer, and incremental learning. Then, in chapter 9, the authors present a set of methods for the clustering of high-dimensional data. As part of this chapter, both linear and nonlinear projection algorithms are investigated, along with projected and subspace clustering approaches. The role of data visualization is also emphasized. Chapter 10 presents metrics for the validation of the clustering results. The authors divide the existing metrics into three categories: external indices, internal indices, and relative indices. Finally, the last chapter of the book summarizes research challenges and presents trends in the area. The book targets researchers and graduate students in the clustering field. However, the book is easy to follow even for nonexperts, as it does not require significant background knowledge. On the positive side, the book covers a wide spectrum of real-world applications and provides rich references for further reading. On the negative side, although the book presents the workings of the algorithms with a reasonable degree of detail, it provides no specific examples of their operation. Furthermore, for some clustering algorithms, the authors do not discuss their bias toward particular cluster shapes or their robustness to outliers.

Online Computing Reviews Service
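
The review above notes that the book gives no specific examples of how the algorithms operate. As a minimal sketch of the k-means idea it mentions for chapter 4 (not taken from the book), the following assumes Euclidean distance, points given as tuples of floats, and initial centers supplied by the caller; the function name `kmeans` and its parameters are illustrative choices.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means. `centers` holds the caller's initial guesses (an assumption)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest current center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of the points assigned to it.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(labels, centers)
```

The result depends on the initial centers, which is one reason the number of clusters and the initialization are usually treated as inputs to be chosen with care.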

Raphael M. Malyankar

The classification of tangible and intangible entities according to measurements or estimations of their characteristics is an old problem in science. Clustering techniques are used in many fields, including social sciences, natural sciences ranging from astrophysics to biology, information sciences ranging from machine learning to information retrieval, and commercial market research. Accordingly, there are many varieties of approaches. The corpus of clustering literature is very large and sometimes confusing, by virtue of the number of publications and diversity of descriptions in the science, mathematics, and computer science literature. The importance of this problem and the need to improve performance means that research into clustering algorithms continues to be valuable. This book attempts to describe the basic concepts and algorithms, as well as the state of the art in this field. The book begins with two introductory chapters: the first contains an introduction to the fundamental concepts of clustering and examples of its use; the second presents considerations and types of distance metrics for different kinds of data. Chapters 3 and 4 describe well-known algorithms for hierarchical clustering and partitional clustering, supplemented by descriptions of more recent developments in these types of algorithms. Approaches based on neural networks (chapter 5) and kernel methods (chapter 6) are covered next. Sequential data, ranging from time series to DNA sequences, presents its own special problems and requirements for clustering algorithms; chapter 7 describes the issues and solutions for this category of data, including a discussion of sequence alignment. Large data volumes present their own problems for classic and recent algorithms, caused by performance requirements; chapter 8 describes several adaptations and techniques for working with large collections of data. Algorithms for high-dimensional data are covered in chapter 9. 
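
To make the sequence-similarity idea from chapter 7 concrete, here is a small sketch of edit (Levenshtein) distance, which equals the cost of an optimal global alignment when every substitution, insertion, and deletion costs 1. The unit costs and the function name `edit_distance` are assumptions made for illustration; the review does not say which scoring schemes the book uses.

```python
def edit_distance(a, b):
    """Minimum number of unit-cost substitutions, insertions, and deletions turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
                )
            )
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

A distance of this kind can serve as the proximity measure that a clustering algorithm for sequences operates on.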
Chapters 10 and 11 round out the book with a description of criteria for cluster validation (deciding whether the application of clustering methods detects an actually existing structure) and a discussion of general methods, requirements, and tradeoffs in the application of cluster analysis algorithms. Discussions of significant and interesting applications for each family of algorithms are included. Exercises in the theory are included, as are citations for the techniques described. The exposition is suitable for a graduate-level course or self-study by a professional. It is relatively easy to understand, given its subject matter and mathematical sophistication. Given the available space, the content and discussion are sometimes necessarily terse; the reader who is looking for details will need to refer to the primary literature cited. A degree of mathematical sophistication and a relevant background in statistics or related areas of engineering or computer science are necessary, preferably including an understanding of the preliminary concepts used in the text, such as neural networks and eigenvectors. The book covers a lot of ground in a relatively small number of pages, and should work well as a learning tool and reference.

Online Computing Reviews Service
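
As a rough illustration of what a cluster validation criterion can look like, the sketch below computes the Rand index, an external measure that compares a clustering against a reference labeling by counting the pairs of items on which the two agree. It is chosen here only as an example; the reviews do not state which indices the book covers, and the name `rand_index` is illustrative.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both labelings treat the same way (together or apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # with fewer than two items there is nothing to disagree on
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because only co-membership of pairs is compared, renaming the cluster labels does not change the score.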
