Abstract
Big data is an emerging technology that can bring substantial benefits to business administration. Because of its sheer volume, however, effective analysis with existing techniques is difficult. The difficulties relate to the analysis, capture, sharing, storage, and visualization of the data. To tackle these challenges, a novel classification technique using a Holoentropy based Correlative Naive Bayes classifier and MapReduce Model (HCNB-MRM) is proposed. The proposed HCNB, designed by combining the holoentropy function with the correlative Naive Bayes classifier, handles both high-dimensional and large datasets, aims to improve on existing benchmarks, and classifies the data based on a dependence assumption. The proposed HCNB-MRM is therefore used to simplify the process and to choose the best features from a big dataset. Combined with the MapReduce model, the proposed HCNB maximizes the performance of big data classification by using a probability index table and the posterior probability of the testing data samples. The performance of the proposed HCNB-MRM is evaluated using three metrics: accuracy, sensitivity, and specificity. The experimental results show that the proposed HCNB-MRM attains classification accuracies of 93.5965% and 94.3369% on the localization dataset and the skin dataset, respectively, which are high compared with the existing techniques.
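As a rough illustration of the general idea, the sketch below builds a count table in a map step per data partition, merges the partial tables in a reduce step, scores a test sample by its log-posterior, and computes accuracy, sensitivity, and specificity. It is a plain Naive Bayes classifier with Laplace smoothing, not the authors' HCNB: the holoentropy and correlation terms are omitted because the abstract does not define them. All function names, the toy data, and the in-process "partitions" are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce
import math

def map_partition(rows):
    """Map step: count labels and (feature index, value, label) triples in one partition."""
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", i, value, label)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step: merge the partial count tables of two partitions."""
    return a + b

def predict(counts, features, labels, n_values):
    """Return the label with the highest log-posterior (Laplace-smoothed).

    Assumes every label in `labels` occurs at least once in the training data.
    """
    total = sum(counts[("class", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        n_c = counts[("class", c)]
        score = math.log(n_c / total)  # log prior
        for i, value in enumerate(features):
            # Smoothed conditional probability looked up from the count table
            score += math.log((counts[("feat", i, value, c)] + 1) / (n_c + n_values))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def evaluate(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for a binary problem.

    Assumes both the positive and the negative class occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Toy training data split into two partitions (hypothetical values).
partitions = [
    [(("a", "x"), 1), (("a", "y"), 1), (("b", "x"), 0)],
    [(("b", "y"), 0), (("a", "x"), 1), (("b", "y"), 0)],
]
model = reduce(reduce_counts, map(map_partition, partitions))

test_samples = [("a", "x"), ("b", "y")]
predictions = [predict(model, s, labels=[0, 1], n_values=2) for s in test_samples]
metrics = evaluate([1, 1, 0, 0], [1, 0, 0, 0], positive=1)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers over distributed storage; here they run in a single process only to show the data flow.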
About this article
Cite this article
Banchhor, C., Srinivasu, N. Holoentropy based Correlative Naive Bayes classifier and MapReduce model for classifying the big data. Evol. Intel. 15, 1037–1050 (2022). https://doi.org/10.1007/s12065-019-00276-9
DOI: https://doi.org/10.1007/s12065-019-00276-9