Scatterplot selection for dimensionality reduction in multidimensional data visualization

  • Regular Paper
Journal of Visualization

Abstract

Dimensionality reduction (DR) techniques for multidimensional data are powerful tools for visualizing and understanding the structure of the data. Over the years, various DR methods have been developed to extract specific features of the data. However, selecting the optimal DR method and tuning its parameters remain challenging, because these choices depend on the characteristics of the dataset. Consequently, data scientists often rely on their experience or undertake extensive experimentation to identify the most suitable approach. This paper proposes a semi-automatic method for selecting appropriate DR techniques through scatterplot evaluation. First, our approach applies a range of DR methods to the given multidimensional data to compute two-dimensional values. Next, we generate scatterplots from the two-dimensional data and calculate scores reflecting the distribution and spatial relationships among the points. Scatterplots that provide more insight receive higher scores, enabling efficient selection of DR methods based on their visualizations. We demonstrate the effectiveness of the presented method through two case studies: the first uses an e-commerce review dataset, and the second uses a dataset derived from music feature extraction.
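
The pipeline outlined in the abstract (project the data with several DR methods, score each resulting scatterplot, rank the methods by score) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the three projection functions, the toy dataset, and every function name below are hypothetical, and the Caliński-Harabasz index, which requires class labels, stands in for the paper's own scatterplot scores, whose definitions the abstract does not give.

```python
import numpy as np


def pca_2d(X):
    """Project onto the top two principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def random_projection_2d(X, seed=0):
    """Project onto two random Gaussian directions (a weak baseline)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], 2))
    return (X - X.mean(axis=0)) @ R


def first_two_axes(X):
    """Trivial 'reduction': keep the first two original features."""
    return X[:, :2]


def calinski_harabasz(points, labels):
    """Stand-in scatterplot score (assumption, not the paper's score):
    ratio of between-class to within-class dispersion of the 2-D points."""
    n = len(points)
    classes = np.unique(labels)
    k = len(classes)
    overall = points.mean(axis=0)
    between, within = 0.0, 0.0
    for c in classes:
        members = points[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def rank_projections(X, labels, methods):
    """Apply every DR method, score its scatterplot, sort best first."""
    scored = [(name, calinski_harabasz(method(X), labels))
              for name, method in methods.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def make_toy_data(n_per_class=50, seed=0):
    """Hypothetical 6-D data: three classes separated only in features 2 and 3;
    all other features, including the first two, are pure noise."""
    rng = np.random.default_rng(seed)
    angles = np.deg2rad([0.0, 120.0, 240.0])
    centers = 10.0 * np.column_stack([np.cos(angles), np.sin(angles)])
    labels = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(3 * n_per_class, 6))
    X[:, 2:4] += centers[labels]
    return X, labels


# Hypothetical registry of candidate DR methods to compare.
METHODS = {
    "pca": pca_2d,
    "random_projection": random_projection_2d,
    "first_two_axes": first_two_axes,
}

if __name__ == "__main__":
    X, labels = make_toy_data()
    for name, score in rank_projections(X, labels, METHODS):
        print(f"{name}: {score:.1f}")
```

On this toy data the axis-pair baseline sees only noise, so it should score far below PCA. With real data, nonlinear methods such as t-SNE or UMAP could be added to the `METHODS` dictionary in the same way, and the score function swapped for whichever scatterplot measures suit the analysis.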

Author information

Corresponding author

Correspondence to Kaya Okada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Okada, K., Itoh, T. Scatterplot selection for dimensionality reduction in multidimensional data visualization. J Vis 28, 205–221 (2025). https://doi.org/10.1007/s12650-024-01025-6

  • DOI: https://doi.org/10.1007/s12650-024-01025-6