Abstract
The scatter plot is a useful method for visualising clusters and outliers in continuous data. However, it cannot be applied directly to nominal data, because nominal values have no natural ordering and no natural ‘distance’ between them. One solution is to map the multi-dimensional nominal data to a numeric space and then draw a scatter plot of the data points using the first two principal components of that space. This paper reports a study of how such plots can be generated using three types of mapping: (a) Binary Input Mapping (BImap), (b) Attribute Value Frequency Mapping (AVFmap), and (c) BImap combined with AVFmap. The results show that the combined method draws on the complementary strengths of BImap and AVFmap to generate meaningful scatter plots for visualising categorical outliers, and that it achieves the highest information gain among the methods tested.
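The pipeline summarised above can be sketched in a few lines of Python. This is an illustrative sketch only, not the chapter's implementation: the abstract does not define the mappings, so the code assumes that BImap is a one-indicator-per-value (one-hot) encoding, that AVFmap replaces each value with its relative frequency within its attribute, and that the combined mapping simply concatenates the two. The function names (bimap, avfmap, first_two_pcs) and the toy data are hypothetical.

```python
import numpy as np

def bimap(data):
    """Assumed BImap: one binary indicator column per distinct attribute value."""
    blocks = []
    for j in range(data.shape[1]):
        column = data[:, j]
        values = np.unique(column)
        # Compare every row against every distinct value of this attribute.
        blocks.append((column[:, None] == values[None, :]).astype(float))
    return np.hstack(blocks)

def avfmap(data):
    """Assumed AVFmap: replace each value by its relative frequency in its attribute."""
    out = np.empty(data.shape, dtype=float)
    for j in range(data.shape[1]):
        _, inverse, counts = np.unique(
            data[:, j], return_inverse=True, return_counts=True
        )
        out[:, j] = counts[inverse.reshape(-1)] / data.shape[0]
    return out

def first_two_pcs(x):
    """Project the rows of x onto the first two principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical toy data: six records, three nominal attributes.
data = np.array([
    ["red",   "small", "round"],
    ["red",   "small", "round"],
    ["red",   "large", "round"],
    ["blue",  "small", "round"],
    ["red",   "small", "round"],
    ["green", "large", "square"],
])

# Assumed combination: concatenate the two numeric representations.
combined = np.hstack([bimap(data), avfmap(data)])

# Two coordinates per record, ready to pass to any scatter-plot routine.
coords = first_two_pcs(combined)
```

Under these assumptions, records made up of rare values (such as the last row of the toy data) receive low frequency scores, which is the kind of signal the combined mapping is meant to expose in the two-dimensional projection.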
Copyright information
© 2014 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Tan, S.C. (2014). Visualising Outliers in Nominal Data. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_28
DOI: https://doi.org/10.1007/978-94-007-7287-8_28
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7286-1
Online ISBN: 978-94-007-7287-8
eBook Packages: Computer Science, Computer Science (R0)