research-article

VIPurPCA: Visualizing and Propagating Uncertainty in Principal Component Analysis

Authors:

Philipp Hennig,

Kay Nieselt

IEEE Transactions on Visualization and Computer Graphics, Volume 30, Issue 4

Pages 2011 - 2022

https://doi.org/10.1109/TVCG.2023.3345532

Published: 21 December 2023

Abstract

Variables obtained by experimental measurements or statistical inference typically carry uncertainties. When an algorithm uses such quantities as input variables, this uncertainty should propagate to the algorithm's output. Concretely, we consider the classic notion of principal component analysis (PCA): If it is applied to a finite data matrix containing imperfect (i.e., uncertain) multidimensional measurements, its output—a lower-dimensional representation—is itself subject to uncertainty. We demonstrate that this uncertainty can be approximated by appropriate linearization of the algorithm's nonlinear functionality, using automatic differentiation. By itself, however, this structured, uncertain output is difficult to interpret for users. We provide an animation method that effectively visualizes the uncertainty of the lower dimensional map. Implemented as an open-source software package, it allows researchers to assess the reliability of PCA embeddings.
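
The linearization described in the abstract can be illustrated with a minimal JAX sketch. It is not the authors' software package or its API: the function names `pca_embed` and `propagate_uncertainty`, the toy data, and the isotropic input covariance are all assumptions made for illustration. The sketch treats PCA as a differentiable map from the data matrix to its low-dimensional embedding, obtains the Jacobian J of that map by automatic differentiation, and approximates the output covariance as J Σ Jᵀ, where Σ is the covariance of the flattened input.

```python
import jax
import jax.numpy as jnp


def pca_embed(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)              # center each feature
    C = Xc.T @ Xc / (X.shape[0] - 1)     # sample covariance, shape (d, d)
    _, vecs = jnp.linalg.eigh(C)         # eigenvalues come back in ascending order
    W = vecs[:, ::-1][:, :k]             # reorder so the top-k eigenvectors come first
    return Xc @ W                        # embedding, shape (n, k)


def propagate_uncertainty(X, input_cov, k=2):
    """First-order (linearized) propagation of input covariance through PCA.

    input_cov is the (n*d, n*d) covariance of the row-major flattened X.
    Returns the embedding and the (n*k, n*k) covariance of the flattened embedding.
    Hypothetical helper for illustration only.
    """
    n, d = X.shape

    def flat_map(x_flat):
        return pca_embed(x_flat.reshape(n, d), k).ravel()

    J = jax.jacfwd(flat_map)(X.ravel())  # Jacobian, shape (n*k, n*d)
    Y = pca_embed(X, k)
    output_cov = J @ input_cov @ J.T     # linearized output covariance
    return Y, output_cov


# Toy data (assumption): 6 samples, 3 features with clearly different variances,
# so that the eigenvalues are distinct and the eigenvector derivatives are well defined.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (6, 3)) * jnp.array([3.0, 1.5, 0.5])

# Assumed noise model: independent noise with variance 0.01 on every entry of X.
input_cov = 0.01 * jnp.eye(X.size)

Y, output_cov = propagate_uncertainty(X, input_cov, k=2)
print(Y.shape, output_cov.shape)
```

The approximation is local: it relies on the eigenvalues of the sample covariance being distinct and on the input uncertainty being small enough that a first-order expansion of the PCA map is reasonable. Forming the dense Jacobian as done here is only practical for small data matrices.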

Information & Contributors

Information

Published In

IEEE Transactions on Visualization and Computer Graphics Volume 30, Issue 4

April 2024

170 pages

© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 21 December 2023

Qualifiers

Research-article
