Abstract
Data embedding (DE) and graph visualization (GV) methods are very compatible tools used in Exploratory Data Analysis for visualization of complex data such as high-dimensional data and complex networks. However, high computational complexity and memory load of existing DE and GV algorithms, considerably hinders visualization of truly large and big data consisting of as many as M~106+ data objects and N~103+ dimensions. Recently, we have shown that by employing only a small fraction of distances between data objects one can obtain very satisfactory reconstruction of topology of a complex data in 2D in a linear-time O(M). In this paper, we demonstrate the high robustness of our approach. We show that even poor approximations of the nn-nearst neighbor graph, representing high-dimensional data, can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often be useful to improve the quality of data visualization. This robustness of our DE method, together with its high memory and time efficiency, meets perfectly the requirements of big and distributed data visualization, when finding the accurate nearest neighbor list represents a great computational challenge.
13th International Conference on Machine Learning and Data Mining MLDM, New York, July 15-20, 2017.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Tang, J., Liu, J., Zhang, M., Mei, Q.: Visualizing large-scale and high-dimensional data. In: Proceedings of the 25th International Conference on World Wide Web, pp. 287–297 (2016)
Johnson, W.P., Glenn, J.M.: Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, 2nd edn. (2014)
Pawliczek, P., Dzwinel, W., Yuen, D.A.: Visual exploration of data by using multidimensional scaling on multi-core CPU, GPU and MPI cluster. Concurrency Comput. Pract. Experience 26(3), 662–682 (2014)
van der Maaten, L., Postma, E.O., van den Herik, H.J.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2009)
Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, pp. 833–840 (2002)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2011)
van der Maaten, L.: Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014)
Zhirong, Y., Peltonen, J., Kaski, S.: Optimization equivalence of divergences improves neighbor embedding. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China (2014)
Ingram, S., Munzner, T.: Dimensionality reduction for documents with nearest neighbor queries. Neurocomputing 150, 557–569 (2015)
Pezzotti, N., Höllt, T., Lelieveldt, B., Eisemann,E., Vilanova, A.: Hierarchical stochastic neighbor embedding. Comput. Graph. Forum. 35(3), 21–30 (2016)
Hu, Y., Lei, S.: Visualizing large graphs. Wiley Interdisc. Rev. Comput. Stat. 7(2), 15–136 (2015)
Dzwinel, W., Wcisło, R.: Very fast interactive visualization of large sets of high-dimensional data. Procedia Comput. Sci. 51, 572–581 (2015)
Dzwinel, W., Wcisło, R., Czech, W.: ivga: a fast force-directed method for interactive visualization of complex networks. J. Comput. Sci. (2016). in print, available on-line
Borcea, C., Streinu, I.: The number of embeddings of minimally rigid graphs. Discrete Comput. Geom. 31(2), 287–303 (2004)
Muja, M., David G.L.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
Lee, K.M.: Locality-sensitive hashing techniques for nearest neighbor search. Int. J. Fuzzy Logic Intell. Syst. 12(4), 300–307 (2012)
Dzwinel, W., Wcisło, R., Matwin, S.: ivhd: a fast and simple algorithm for embedding large and high-dimensional data, working version available (2017). www.researchgate.net, doi:10.13140/RG.2.2.28959.15520/1
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: International Conference on Database Theory, pp. 217–235 (1999)
Acknowledgments
This research is supported by the Polish National Center of Science (NCN) DEC-2013/09/B/ST6/01549.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dzwine, W., Wcisło, R. (2017). ivhd: A Robust Linear-Time and Memory Efficient Method for Visual Exploratory Data Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science(), vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_25
Download citation
DOI: https://doi.org/10.1007/978-3-319-62416-7_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer ScienceComputer Science (R0)