Categorical Data Clustering

Periklis Andritsos & Panayiotis Tsaparas

Abstract

In this chapter, we provide an overview of the categorical data clustering problem. We first present different techniques for the general cluster analysis problem, and then study how these techniques specialize to the case of non-numerical (categorical) data. We also present measures and techniques developed specifically for this domain.

Recommended Reading

Andritsos P, Tsaparas P, Miller RJ, Sevcik KC (2004) LIMBO: scalable clustering of categorical data. In: Proceedings of the 9th international conference on extending database technology (EDBT), Heraklion, 14–18 Mar 2004, pp 123–146
Barbarà D, Couto J, Li Y (2002) COOLCAT: an entropy-based algorithm for categorical clustering. In: Proceedings of the 11th international conference on information and knowledge management (CIKM), McLean, 4–9 Nov 2002, pp 582–589
Cover TM, Thomas JA (1991) Elements of information theory. Wiley, New York
Das G, Mannila H (2000) Context-based similarity measures for categorical databases. In: Proceedings of the 4th European conference on principles of data mining and knowledge discovery (PKDD), Lyon, 13–16 Sept 2000, pp 201–210
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172
Ganti V, Gehrke J, Ramakrishnan R (1999) CACTUS: clustering categorical data using summaries. In: Proceedings of the 5th international conference on knowledge discovery and data mining (KDD), San Diego, 15–18 Aug 1999, pp 73–83
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1)
Gibson D, Kleinberg JM, Raghavan P (1998) Clustering categorical data: an approach based on dynamical systems. In: Proceedings of the 24th international conference on very large data bases (VLDB), New York, 24–27 Aug 1998, pp 311–322
Gluck M, Corter J (1985) Information, uncertainty, and the utility of categories. In: Proceedings of the 7th annual conference of the Cognitive Science Society (COGSCI), Irvine, pp 283–287
Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, Sydney, 23–26 Mar 1999, pp 512–521
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs
Jarke M, Lenzerini M, Vassiliou Y, Vassiliadis P (1999) Fundamentals of data warehouses. Springer-Verlag, Berlin/Heidelberg
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304
Zaki MJ, Peters M, Assent I, Seidl T (2005) CLICKS: an effective algorithm for mining subspace clusters in categorical datasets. In: Proceedings of the 11th international conference on knowledge discovery and data mining (KDD), Chicago, 21–24 Aug 2005, pp 736–742

Author information

Authors and Affiliations

Faculty of Information, University of Toronto, Toronto, Canada
Periklis Andritsos
Department of Computer Science & Engineering, University of Ioannina, Ioannina, Greece
Panayiotis Tsaparas

Corresponding author

Correspondence to Periklis Andritsos.

Editor information

Editors and Affiliations

School of Computer Science & Engineering (CSE), University of New South Wales, Sydney, New South Wales, Australia
Claude Sammut
School of Computer Science & Software Engineering, Monash University, Melbourne, Victoria, Australia
Geoffrey I. Webb

About this entry

Cite this entry

Andritsos, P., Tsaparas, P. (2016). Categorical Data Clustering. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_35-1

DOI: https://doi.org/10.1007/978-1-4899-7502-7_35-1
Received: 20 November 2014
Accepted: 21 June 2016
Published: 12 August 2016
Publisher Name: Springer, Boston, MA
Online ISBN: 978-1-4899-7502-7
eBook Packages: Living Reference Computer Sciences; Reference Module Computer Science and Engineering
