research-article

Using multiple attribute-based explanations of multidimensional projections to explore high-dimensional data

Published: 01 August 2021

Highlights

We provide additional examples of how the five explanatory views in [Van Driel et al] and [Da Silva et al] can be combined in a visual analytics fashion to find relevant insights in high-dimensional datasets that cannot be found using a single view.
We illustrate the above process on five non-synthetic datasets, and correlate the obtained insights with ground-truth information independently extracted by other researchers from three of these datasets. Of these datasets, only one was used in the earlier work; the others are new. The correlation of the obtained insights with the ground-truth information is also new.
We present a new explanatory method, variance ratio, for computing local dimensionality.
We discuss in detail the parameter settings and dependency on the used projection techniques of our proposed explanatory visualization.

Graphical abstract

The graphical abstract shows the first row of Figure 3 in the paper. From left to right, the images explain the wine quality dataset using dimension contribution, variance, and dimension correlation.

Abstract

Multidimensional projections (MPs) are effective methods for visualizing high-dimensional datasets to find structures in the data like groups of similar points and outliers. The insights obtained from MPs can be amplified by complementing these techniques with several so-called explanatory mechanisms. We present and discuss a set of six such mechanisms that explain MPs in terms of similar dimensions, local dimensionality, and dimension correlations. We implement our explanatory tools using an image-based approach, which is efficient to compute, scales well visually for large and dense MP scatterplots, and can handle any projection technique. We demonstrate how the provided explanatory views can be combined to augment each other’s value and thereby lead to refined insights in the data for several high-dimensional datasets, and how these insights correlate with known facts about the data under study.
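
To make the idea of explaining a projection "in terms of similar dimensions" concrete, the following is a minimal sketch and not the authors' implementation. It assumes one simple criterion: for each projected point, take its k nearest neighbors in the 2D scatterplot and report the dimension whose variance over that neighborhood is smallest relative to its variance over the whole dataset. The function names, the neighborhood size `k`, and the variance-based criterion are all assumptions made for illustration; the paper's actual mechanisms and its image-based rendering may differ.

```python
import math


def k_nearest(points_2d, i, k):
    """Indices of the k points closest to point i in the 2D projection (i included)."""
    xi, yi = points_2d[i]
    order = sorted(
        range(len(points_2d)),
        key=lambda j: (points_2d[j][0] - xi) ** 2 + (points_2d[j][1] - yi) ** 2,
    )
    return order[:k]


def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def most_similar_dimension(data, points_2d, k):
    """For each point, return the index of the dimension that varies least
    in its 2D neighborhood, relative to that dimension's global variance.
    (Hypothetical criterion, chosen only to illustrate the idea.)"""
    n_dims = len(data[0])
    global_var = [variance([row[d] for row in data]) for d in range(n_dims)]
    labels = []
    for i in range(len(data)):
        nbrs = k_nearest(points_2d, i, k)
        ratios = []
        for d in range(n_dims):
            local_var = variance([data[j][d] for j in nbrs])
            # A dimension that is constant everywhere explains nothing locally.
            ratios.append(local_var / global_var[d] if global_var[d] > 0 else math.inf)
        labels.append(min(range(n_dims), key=ratios.__getitem__))
    return labels


# Toy example: two groups, each tight in a different dimension.
data = [(0.0, 1.0), (0.0, 5.0), (0.0, 9.0), (1.0, 3.0), (5.0, 3.0), (9.0, 3.0)]
projection = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(most_similar_dimension(data, projection, k=3))  # [0, 0, 0, 1, 1, 1]
```

In a visual tool, each label would typically be mapped to a categorical color so that regions of the scatterplot are tinted by the dimension that best explains them.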

References

[1]
M. Greenacre, Biplots in practice, Fundacion BBVA, Bilbao, 2010.
[2]
J. Gower, S. Lubbe, N. Roux, Understanding biplots, Wiley, 2011.
[3]
B. Broeksema, T. Baudel, A. Telea, Visual analysis of multidimensional categorical datasets, Computer Graphics Forum 32 (8) (2013) 158–169.
[4]
D. Coimbra, R. Martins, T. Neves, A. Telea, F. Paulovich, Explaining three-dimensional dimensionality reduction plots, Information Visualization 15 (2) (2016) 154–172.
[5]
P. Pagliosa, F. Paulovich, R. Minghim, H. Levkowitz, L. Nonato, Projection inspector: Assessment and synthesis of multidimensional projections, Neurocomputing 150 (2015) 599–610.
[6]
P. Joia, D. Coimbra, J.A. Cuminato, F.V. Paulovich, L.G. Nonato, Local affine multidimensional projection, IEEE TVCG 17 (12) (2011) 2563–2571.
[7]
P. Rauber, R. da Silva, S. Feringa, M. Celebi, A. Falcao, A. Telea, Interactive image feature selection aided by dimensionality reduction, Proc. EuroVA, 2015, pp. 97–101.
[8]
M. Aupetit, Visualizing distortions and recovering topology in continuous projection techniques, Neurocomputing 70 (7-9) (2007) 1304–1330.
[9]
T. Schreck, T. von Landesberger, S. Bremm, Techniques for precision-based visual analysis of projected data, Information Visualization 9 (3) (2010) 181–193.
[10]
R. Martins, D. Coimbra, R. Minghim, A.C. Telea, Visual analysis of dimensionality reduction quality for parameterized projections, Computers & Graphics 41 (2014) 26–42.
[11]
R. da Silva, P. Rauber, R. Martins, R. Minghim, A. Telea, Attribute-based visual explanation of multidimensional projections, Proc. EuroVA, 2015, pp. 97–101.
[12]
D. van Driel, X. Zhai, Z. Tian, A. Telea, Enhanced attribute-based explanations of multidimensional projections, Proc. EuroVA, Eurographics, 2020.
[13]
J.B. Tenenbaum, V. De Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323.
[14]
V. De Silva, J.B. Tenenbaum, Sparse multidimensional scaling using landmark points, Tech. Rep., Stanford University, 2004.
[15]
L. van der Maaten, G.E. Hinton, Visualizing data using t-SNE, JMLR 9 (2008) 2579–2605.
[16]
L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv:1802.03426v2 [stat.ML], 2018.
[17]
L.G. Nonato, M. Aupetit, Multidimensional projection for visual analytics: Linking techniques with distortions, tasks, and layout enrichment, IEEE TVCG 25 (8) (2018) 2650–2673.
[18]
M. Espadoto, R. Martins, A. Kerren, N. Hirata, A. Telea, Towards a quantitative survey of dimension reduction techniques, IEEE TVCG (2019).
[19]
X. Geng, D. Zhan, Z. Zhou, Supervised nonlinear dimensionality reduction for visualization and classification, IEEE Trans Syst Man Cybern 35 (6) (2005) 1098–1107.
[20]
J. Venna, S. Kaski, Visualizing gene interaction graphs with local multidimensional scaling, Proc. ESANN, 2006, pp. 557–562.
[21]
F.V. Paulovich, L.G. Nonato, R. Minghim, H. Levkowitz, Least square projection: A fast high-precision multidimensional projection technique and its application to document mapping, IEEE TVCG 14 (3) (2008) 564–575.
[22]
M. Sips, B. Neubert, J. Lewis, P. Hanrahan, Selecting good views of high-dimensional data using class consistency, Comp Graph Forum 28 (3) (2009) 831–838.
[23]
J.A. Lee, M. Verleysen, Quality assessment of dimensionality reduction: Rank-based criteria, Neurocomputing 72 (7) (2009) 1431–1443.
[24]
W. Lueks, A. Gisbrecht, B. Hammer, Visualizing the quality of dimensionality reduction, Neurocomputing 112 (2013) 109–123.
[25]
S. Lespinats, M. Aupetit, CheckViz: Sanity check and topological clues for linear and non-linear mappings, Comp Graph Forum 30 (1) (2011) 113–125.
[26]
A. Tatu, P. Bak, E. Bertini, D. Keim, J. Schneidewind, Visual quality metrics and human perception: An initial study on 2D projections of large multidimensional data, Proc. AVI, ACM, 2010, pp. 49–56.
[27]
S. Oeltze, H. Doleisch, H. Hauser, Interactive visual analysis of perfusion data, IEEE TVCG 13 (6) (2007) 1392–1399.
[28]
K. Olsen, R. Korfhage, K. Sochats, Visualization of a document collection: the VIBE system, Inform Process Manag 29 (1) (1993) 69–81.
[29]
A. Endert, P. Flaux, C. North, Semantic interaction for visual text analytics, Proc. ACM CHI, 2012, pp. 324–333.
[30]
J. Yi, R. Melton, J. Stasko, Dust & magnet: multivariate information visualization using a magnet metaphor, Inform Visual 4 (4) (2005) 239–256.
[31]
H. Piringer, R. Kosara, H. Hauser, Interactive F + C visualization with linked 2D/3D scatterplots, Proc. IEEE CMV, 2004, pp. 49–60.
[32]
N. Elmqvist, P. Dragicevic, J.-D. Fekete, Rolling the dice: multidimensional visual exploration using scatterplot matrix navigation, IEEE TVCG 14 (8) (2008) 1141–1148.
[33]
F.C.M. Rodrigues, M. Espadoto, R. Hirata, A. Telea, Constructing and visualizing high-quality classifier decision boundary maps, Information 10 (9) (2019) 280–297.
[34]
N. Cliff, The eigenvalues-greater-than-one rule and the reliability of components, Psychological Bulletin 103 (2) (1988) 276–279.
[35]
I.T. Jolliffe, Principal Component Analysis, Springer, 2002.
[36]
L.J. O’Donnell, C.F. Westin, An introduction to diffusion tensor image analysis, Neurosurg Clin N Am 22 (2) (2011) 185–196.
[37]
P. Besse, A. de Falguerolles, Application of resampling methods to the choice of dimension in principal component analysis, Computer Intensive Methods in Statistics, Springer, 1993, pp. 167–176.
[38]
G.R. North, T.L. Bell, R.F. Cahalan, F.J. Moeng, Sampling errors in the estimation of empirical orthogonal functions, Mon Weather Rev 110 (1982) 699–706.
[39]
I.-C. Yeh, Modeling of strength of high performance concrete using artificial neural networks, Cement and Concrete Research 28 (12) (1998) 1797–1808.
[40]
M. Lichman, UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.
[41]
R. da Silva, Visualizing multidimensional data similarities – improvements and applications, University of Groningen, Netherlands, 2016.
[42]
S. Wu, B. Li, J. Yang, S. Shukla, Predictive modeling of high-performance concrete with regression analysis, Proc. IEEE Intl. Conf. on Industrial Engineering and Engineering Management, 2010.
[43]
P. Cortez, A. Cerdeira, F. Almeida, T. Matos, J. Reis, Modeling wine preferences by data mining from physicochemical properties, Decision Support Systems 47 (4) (2009) 547–553.
[44]
J.J. van Wijk, A. Telea, Enridged contour maps, Proc. IEEE Visualization, 2001, pp. 69–74.
[45]
E.J. Beh, C.I. Holdsworth, A visual evaluation of a classification method for investigating the physicochemical properties of Portuguese wine, Current Anal Chem 8 (2) (2012) 205–217.
[46]
L. Zeng, The wine dataset analysis, 2021. https://rpubs.com/Li2019/Wine.
[47]
P. Meirelles, C. Santos, J. Miranda, F. Kon, A. Terceiro, C. Chavez, A study of the relationships between source code metrics and attractiveness in free software projects, Proc. Brazilian Symposium on Software Engineering (SBES), 2010, pp. 11–20.
[48]
C. Richter, Designing Flexible Object-Oriented Systems with UML, New Riders Publishing, 1999.
[49]
S. Zhang, B. Guo, A. Dong, J. He, Z. Xu, S. Chen, Cautionary tales on air-quality improvement in Beijing, Proc Royal Society A 473 (2205) (2017) 20170457.
[50]
S.D. Vito, E. Massera, M. Piga, L. Martinotto, G.D. Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical 129 (2) (2008) 750–757.
[51]
Eigen numerical library. 2020. http://eigen.tuxfamily.org.
[52]
R. Etemadpour, R. Motta, J. de Souza Paiva, R. Minghim, M.D. Oliveira, L. Linsen, Perception-based evaluation of projection methods for multidimensional data visualization, IEEE TVCG 21 (1) (2014) 81–94.
[53]
L. Wilkinson, A. Arland, R. Grossman, Graph-theoretic scagnostics, Proc. InfoVis, 2005, pp. 157–164.


Information

Published In

Computers and Graphics, Volume 98, Issue C
Aug 2021
347 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Dimensionality reduction
2. Explanatory techniques
3. High-dimensional data analysis

Qualifiers

• Research-article


Cited By

• (2024) Seeing is Learning in High Dimensions: The Synergy Between Dimensionality Reduction and Machine Learning, SN Computer Science 5:3, doi:10.1007/s42979-024-02604-y. Online publication date: 21-Feb-2024
• (2023) Visualizing High-Dimensional Functions with Dense Maps, SN Computer Science 4:3, doi:10.1007/s42979-022-01664-2. Online publication date: 22-Feb-2023
• (2023) Stability Analysis of Supervised Decision Boundary Maps, SN Computer Science 4:3, doi:10.1007/s42979-022-01662-4. Online publication date: 21-Feb-2023
• (2021) Contrastive analysis for scatterplot-based representations of dimensionality reduction, Computers and Graphics 101:C (46-58), doi:10.1016/j.cag.2021.08.014. Online publication date: 1-Dec-2021
