research-article

VIPurPCA: Visualizing and Propagating Uncertainty in Principal Component Analysis

Published: 21 December 2023

Abstract

Variables obtained by experimental measurements or statistical inference typically carry uncertainties. When an algorithm uses such quantities as input variables, this uncertainty should propagate to the algorithm's output. Concretely, we consider the classic notion of principal component analysis (PCA): If it is applied to a finite data matrix containing imperfect (i.e., uncertain) multidimensional measurements, its output, a lower-dimensional representation, is itself subject to uncertainty. We demonstrate that this uncertainty can be approximated by appropriate linearization of the algorithm's nonlinear functionality, using automatic differentiation. By itself, however, this structured, uncertain output is difficult to interpret for users. We provide an animation method that effectively visualizes the uncertainty of the lower-dimensional map. Implemented as an open-source software package, it allows researchers to assess the reliability of PCA embeddings.
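
The linearization idea in the abstract can be illustrated with a minimal JAX sketch. This is not the package's actual implementation: the function names `pca_embed` and `propagate_uncertainty`, the synthetic data, and the isotropic input covariance are all assumptions made for illustration. PCA is treated as a function f from the flattened data matrix to the flattened embedding; if the input has mean x and covariance Σ, a first-order approximation of the output covariance is J Σ Jᵀ, where J is the Jacobian of f at x, obtained here by automatic differentiation.

```python
import jax
import jax.numpy as jnp

# Double precision keeps the eigendecomposition derivatives numerically stable.
jax.config.update("jax_enable_x64", True)


def pca_embed(x_flat, n, d, k):
    """Hypothetical helper: map a flattened (n*d,) data vector to a flattened (n*k,) PCA embedding."""
    X = x_flat.reshape(n, d)
    Xc = X - X.mean(axis=0)                # center each feature
    C = Xc.T @ Xc / (n - 1)                # sample covariance, shape (d, d)
    _, evecs = jnp.linalg.eigh(C)          # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :k]              # top-k principal directions
    return (Xc @ W).reshape(-1)


def propagate_uncertainty(x_mean, cov_x, k):
    """First-order propagation: returns the embedding and J @ cov_x @ J.T."""
    n, d = x_mean.shape
    f = lambda v: pca_embed(v, n, d, k)
    mu = x_mean.reshape(-1)
    J = jax.jacfwd(f)(mu)                  # Jacobian, shape (n*k, n*d)
    cov_y = J @ cov_x @ J.T                # linearized output covariance
    return f(mu).reshape(n, k), cov_y


# Synthetic example (assumed data): 20 samples, 4 features with distinct variances,
# so that the eigenvalues are well separated and the eigenvectors are differentiable.
n, d, k = 20, 4, 2
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (n, d)) * jnp.array([3.0, 2.0, 1.0, 0.5])

# Assumed input uncertainty: independent noise with standard deviation 0.1 per entry.
sigma = 0.1
cov_x = sigma**2 * jnp.eye(n * d)

Y, cov_y = propagate_uncertainty(X, cov_x, k)
print(Y.shape, cov_y.shape)
```

The diagonal blocks of `cov_y` describe the approximate uncertainty of each embedded point, and the off-diagonal blocks describe how the positions of different points co-vary. The approximation relies on the eigenvalues being distinct; with repeated eigenvalues the eigenvector derivatives are not well defined.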


Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 30, Issue 4
April 2024
170 pages

Publisher

IEEE Educational Activities Department
United States
