ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis

Witold Dzwine¹⁴ &
Rafał Wcisło¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10358))

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

3939 Accesses

Abstract

Data embedding (DE) and graph visualization (GV) methods are very compatible tools used in Exploratory Data Analysis for visualization of complex data such as high-dimensional data and complex networks. However, high computational complexity and memory load of existing DE and GV algorithms, considerably hinders visualization of truly large and big data consisting of as many as M~10⁶⁺ data objects and N~10³⁺ dimensions. Recently, we have shown that by employing only a small fraction of distances between data objects one can obtain very satisfactory reconstruction of topology of a complex data in 2D in a linear-time O(M). In this paper, we demonstrate the high robustness of our approach. We show that even poor approximations of the nn-nearst neighbor graph, representing high-dimensional data, can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often be useful to improve the quality of data visualization. This robustness of our DE method, together with its high memory and time efficiency, meets perfectly the requirements of big and distributed data visualization, when finding the accurate nearest neighbor list represents a great computational challenge.

13^th International Conference on Machine Learning and Data Mining MLDM, New York, July 15-20, 2017.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?

Visualization of very large high-dimensional data sets as minimum spanning trees

Article Open access 12 February 2020

GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data

References

Tang, J., Liu, J., Zhang, M., Mei, Q.: Visualizing large-scale and high-dimensional data. In: Proceedings of the 25th International Conference on World Wide Web, pp. 287–297 (2016)
Google Scholar
Johnson, W.P., Glenn, J.M.: Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, 2nd edn. (2014)
Google Scholar
Pawliczek, P., Dzwinel, W., Yuen, D.A.: Visual exploration of data by using multidimensional scaling on multi-core CPU, GPU and MPI cluster. Concurrency Comput. Pract. Experience 26(3), 662–682 (2014)
Google Scholar
van der Maaten, L., Postma, E.O., van den Herik, H.J.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2009)
Google Scholar
Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, pp. 833–840 (2002)
Google Scholar
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2011)
Google Scholar
van der Maaten, L.: Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014)
Google Scholar
Zhirong, Y., Peltonen, J., Kaski, S.: Optimization equivalence of divergences improves neighbor embedding. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China (2014)
Google Scholar
Ingram, S., Munzner, T.: Dimensionality reduction for documents with nearest neighbor queries. Neurocomputing 150, 557–569 (2015)
Google Scholar
Pezzotti, N., Höllt, T., Lelieveldt, B., Eisemann,E., Vilanova, A.: Hierarchical stochastic neighbor embedding. Comput. Graph. Forum. 35(3), 21–30 (2016)
Google Scholar
Hu, Y., Lei, S.: Visualizing large graphs. Wiley Interdisc. Rev. Comput. Stat. 7(2), 15–136 (2015)
Google Scholar
Dzwinel, W., Wcisło, R.: Very fast interactive visualization of large sets of high-dimensional data. Procedia Comput. Sci. 51, 572–581 (2015)
Google Scholar
Dzwinel, W., Wcisło, R., Czech, W.: ivga: a fast force-directed method for interactive visualization of complex networks. J. Comput. Sci. (2016). in print, available on-line
Google Scholar
Borcea, C., Streinu, I.: The number of embeddings of minimally rigid graphs. Discrete Comput. Geom. 31(2), 287–303 (2004)
Google Scholar
Muja, M., David G.L.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
Google Scholar
Lee, K.M.: Locality-sensitive hashing techniques for nearest neighbor search. Int. J. Fuzzy Logic Intell. Syst. 12(4), 300–307 (2012)
Google Scholar
Dzwinel, W., Wcisło, R., Matwin, S.: ivhd: a fast and simple algorithm for embedding large and high-dimensional data, working version available (2017). www.researchgate.net, doi:10.13140/RG.2.2.28959.15520/1
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: International Conference on Database Theory, pp. 217–235 (1999)
Google Scholar

Download references

Acknowledgments

This research is supported by the Polish National Center of Science (NCN) DEC-2013/09/B/ST6/01549.

Author information

Authors and Affiliations

AGH University of Science and Technology, Kraków, Poland
Witold Dzwine & Rafał Wcisło

Authors

Witold Dzwine
View author publications
You can also search for this author in PubMed Google Scholar
Rafał Wcisło
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rafał Wcisło .

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Dzwine, W., Wcisło, R. (2017). ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science(), vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_25

Download citation

DOI: https://doi.org/10.1007/978-3-319-62416-7_25
Published: 02 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?

Visualization of very large high-dimensional data sets as minimum spanning trees

GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?

Visualization of very large high-dimensional data sets as minimum spanning trees

GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation