Abstract
Assessing cluster tendency is an emerging need in big data cluster analysis. Evaluating data in terms of the number of clusters they contain is known as cluster tendency assessment. Many visualization techniques have been developed to detect cluster tendency. Existing techniques such as Visual Assessment of (cluster) Tendency (VAT), spectral-based VAT (SpecVAT), and improved VAT (iVAT) have been largely successful at assessing the cluster tendency of small datasets. bigVAT is a further method developed to estimate the cluster tendency of big data, and it is well suited to presenting the clustering tendency of big data in visual form. However, extracting the actual data clusters from it remains intractable for large volumes of data objects. The proposed work addresses this clustering problem of bigVAT by deriving sampling-based crisp partitions, which accurately predict the cluster labels of the data objects. Experiments on big synthetic and big real-life datasets demonstrate the performance efficiency of the proposed work.
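To make the pipeline concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. VAT reorders a dissimilarity matrix with a Prim-style minimum-spanning-tree traversal, so that clusters show up as dark blocks along the diagonal of the reordered image. The sketch runs that traversal on a small sample, derives k crisp clusters by cutting the k-1 longest MST edges (a single-linkage cut), and then extends the sample labels to every object with a nearest-sampled-point rule. Plain random sampling, Euclidean distance, a known k, and the function names `vat_order`, `crisp_partition`, and `sample_and_extend` are all hypothetical choices made for illustration.

```python
import math
import random


def vat_order(points):
    """VAT-style reordering: a Prim's minimum-spanning-tree traversal that
    starts from an endpoint of the largest pairwise dissimilarity.
    Returns (order, parent, edge): edge[v] is the MST edge that reached v."""
    n = len(points)
    # Full dissimilarity matrix: O(n^2), which is why big data needs sampling.
    d = [[math.dist(p, q) for q in points] for p in points]
    start = max(range(n), key=lambda r: max(d[r]))
    order, parent, edge = [start], {start: None}, {start: 0.0}
    remaining = set(range(n)) - {start}
    # best[m] = (distance to nearest already-ordered point, that point)
    best = {m: (d[start][m], start) for m in remaining}
    while remaining:
        j = min(remaining, key=lambda m: best[m][0])
        order.append(j)
        edge[j], parent[j] = best[j]
        remaining.remove(j)
        for m in remaining:
            if d[j][m] < best[m][0]:
                best[m] = (d[j][m], j)
    return order, parent, edge


def crisp_partition(points, k):
    """Cut the k-1 longest MST edges from the VAT traversal; each remaining
    connected piece becomes one crisp cluster (a single-linkage partition)."""
    order, parent, edge = vat_order(points)
    cut = set(sorted(order[1:], key=lambda v: edge[v], reverse=True)[: k - 1])
    labels, next_label = {}, 0
    for v in order:  # a point's parent always appears earlier in VAT order
        if parent[v] is None or v in cut:
            labels[v] = next_label
            next_label += 1
        else:
            labels[v] = labels[parent[v]]
    return [labels[i] for i in range(len(points))]


def sample_and_extend(points, k, sample_size, seed=0):
    """Partition a sample, then label every point by its nearest sampled point.
    Assumption: plain random sampling; the paper's own scheme may differ."""
    rng = random.Random(seed)
    sample = [points[i] for i in rng.sample(range(len(points)), sample_size)]
    sample_labels = crisp_partition(sample, k)
    return [
        sample_labels[min(range(sample_size), key=lambda s: math.dist(p, sample[s]))]
        for p in points
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
    blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]
    labels = sample_and_extend(blob_a + blob_b, k=2, sample_size=30)
    print(set(labels[:100]), set(labels[100:]))
```

Only the sample pays the quadratic cost of the dissimilarity matrix; the extension step is linear in the number of objects for a fixed sample size. The sketch assumes k is already known, for example from the number of dark diagonal blocks in a VAT image.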
Funding
This study was funded by Science and Engineering Research Board (Grant No. ECR/2016/001556).
About this article
Cite this article
Rajendra Prasad, K., Mohammed, M., Narasimha Prasad, L.V. et al. An efficient sampling-based visualization technique for big data clustering with crisp partitions. Distrib Parallel Databases 39, 813–832 (2021). https://doi.org/10.1007/s10619-021-07324-3
DOI: https://doi.org/10.1007/s10619-021-07324-3