Abstract
Data flow is an important parameter in the optimization of Wireless Sensor Networks. This paper presents an expert approach for improved data flow prediction based on data discretization and artificial intelligence. The proposed approach has been implemented with 17 machine learning methods. The data flow prediction is based on a dataset generated from NS-2.35 simulations of multiple Wireless Sensor Networks (5 to 50 nodes). A performance comparison of the machine learning models on continuous data and on discretized data is also presented. The proposed approach considerably reduces the time needed to train the machine learning models and also improves prediction accuracy. The result analysis shows that the proposed approach outperforms the various machine learning methods it is compared against. The proposed approach can also handle both continuous and discrete data. The datasets used in this work are available as supplementary material via the NDS and DDS links.
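As an illustration of the discretization step mentioned in the abstract, the following minimal sketch maps continuous values to bin indices. It assumes equal-width binning, which is only one possible scheme and is not stated here to be the method used in the paper; the function name `equal_width_bins` and the parameter `n_bins` are hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each continuous value to a bin index in [0, n_bins - 1].

    Assumption: equal-width binning over the observed range of the data.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(values) == 0:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they all share the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical data flow readings, discretized into four bins.
sample = [0.0, 2.5, 5.0, 7.5, 10.0]
print(equal_width_bins(sample, 4))
```

A model trained on such bin indices sees a small, fixed set of input values rather than arbitrary real numbers, which is one plausible reason discretized data can shorten training time.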
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest. The article discusses machine learning-based discretization techniques for network analysis.
Supplementary Information
The correlation values for NDS and DDS are shown in Table 5.
The correlation for various dataset partitions is presented in Fig. 10.
The coefficient of determination values for NDS and DDS are shown in Table 6.
The coefficient of determination for various dataset partitions is presented in Fig. 11.
The Root Mean Square Error (RMSE) values for NDS and DDS are shown in Table 7.
The RMSE for various dataset partitions is presented in Fig. 12.
The accuracy values for NDS and DDS are shown in Table 8.
The accuracy for various dataset partitions is presented in Fig. 13.
The time taken for NDS and DDS is shown in Table 9.
The time taken for various dataset partitions is presented in Fig. 14.
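For readers who want to reproduce metrics of the kind reported in the tables and figures listed above, the following sketch computes correlation, coefficient of determination, and RMSE from actual and predicted values. It assumes the standard textbook definitions (Pearson correlation, R² = 1 − SS_res/SS_tot, and root mean square error); the exact formulas used in the paper, including its accuracy measure, are not given here, so accuracy is omitted. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical values: every prediction is off by exactly one unit.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]
print(pearson(y_true, y_pred), r_squared(y_true, y_pred), rmse(y_true, y_pred))
```

The example values show why the three metrics are reported together: a constant offset leaves the correlation perfect while the RMSE and the coefficient of determination both reveal the error.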
Cite this article
Sandhu, J.K., Verma, A.K. & Rana, P.S. An Expert Approach for Data Flow Prediction: Case Study of Wireless Sensor Networks. Wireless Pers Commun 112, 325–352 (2020). https://doi.org/10.1007/s11277-020-07028-4
DOI: https://doi.org/10.1007/s11277-020-07028-4