
Using DL Models in the Service Layer to Enhance the Fault Tolerance of IoT Networks

Sastry Kodanda Rama Jammalamadaka 1,*, Bhupati Chokara, Sasi Bhanu Jammalamadaka 2 and Balakrishna Kamesh Duvvuri

Department of IoT, Koneru Lakshmaiah Deemed to be University, Vaddeswaram, Guntur 522501, India

Department of Computer Science, CMR College of Engineering and Technology, Hyderabad 501401, India

Department of Computer Science, MLR Institute of Technology, Hyderabad 500043, India

* Author to whom correspondence should be addressed.

Electronics 2024, 13(22), 4334; https://doi.org/10.3390/electronics13224334

Submission received: 8 October 2024 / Revised: 30 October 2024 / Accepted: 1 November 2024 / Published: 5 November 2024

(This article belongs to the Special Issue Artificial Intelligence and Machine Learning with RFID Technology for IoT)

Abstract

In an IoT network, the networked servers form a service layer, providing services to the users and the devices. Requests to the service servers are routed through the gateway on one side of the services layer and the networked controllers on the other side. Data are transported from the sensors/devices through cluster heads, base stations, and controllers to the service servers, where the data are processed and sent for storage in the cloud through gateways. When a device breaks down or becomes non-operational, its inputs are not sensed, creating a gap in the data. The data transmitted from the devices then form an incomplete flow; such data are not suitable for data analytics or predictions. The missing data must first be identified in the data flow and then estimated or predicted to complete the data before they are transmitted to the cloud for storage and subsequent retrieval. This paper proposes a recurrent neural network (RNN) to predict the missing data. Two models are tested: the multi-layer perceptron (MLP) model and a long short-term memory (LSTM)-based RNN model. The LSTM-based RNN model predicts the missing data with 99.66% accuracy, the highest among the models tested.

Keywords:

prediction; missing data; multi-layer perceptron; long short-term memory (LSTM); IoT; services layer

1. Introduction

The Internet of Things (IoT) is usually a multi-layered network, with layers including the device, controller, services, gateway, and cloud layers. Each layer is built using a different network, requiring the bridging of the networks to transport data to the services layer, where they are processed. The processed results are returned to the cloud for storage and retrieval to perform analytics.

Sensors are deployed in the device layer for data collection and device control. Sometimes, some devices may fail, causing data incompleteness. Generally, redundant sensors are not considered for cost and complex networking reasons. The real challenge, which this research aims to address, is ensuring data completeness when sensors are used without a backup. This critical issue needs to be resolved for the effective functioning of IoT networks.

Figure 1 shows an example IoT network with different layers, including the device, base station, controller, service, and cloud layers. The devices in these layers are linearly connected.

Device failures create faults in the system in the form of missing data, which can cause a certain network section to be non-operative. The missing data must be estimated/predicted and corrected so that the network remains highly fault-tolerant.

Many mission-critical applications connect similar devices into clusters, with each cluster headed by a cluster head through which the data are communicated to higher levels of the network. A cluster is generally homogeneous and connects similar sensors, such as temperature sensors, with common timing to sense and transmit data. Some clusters are fully connected, and some are hierarchical. Most of the clusters have local proximity. Fully connected clusters are to be made hierarchical [1] so that linear networks are formed, making it easy to compute the fault tolerance of the IoT network. Sometimes, devices of different types can be connected to the same cluster with the same proximity. The data of one device can then be derived from other devices connected to the same cluster by developing relationships among those data elements. For example, infrared sensors (IR) could be used to detect the presence of human beings, as with passive infrared sensors (PIRs). Developing a relationship among such sensors will help estimate one parameter from the other. However, different types of sensors are rarely connected to the same cluster [2].

The service servers render services such as a request to transfer data, a request to receive data, and a request for the transmission of the status of a device. The service servers can also be built with intelligent systems to learn AI models and invoke internal services such as learning a model, estimating the missing data using a model, etc. All the services are like individual programs running in different threads so that their instances can be invoked several times [3,4].

The service servers run middleware, which considers each application as a service flow using a flow-based program. Each application is considered a component and stays in different states. Each application is defined with a different threshold, and the combined relationship between the thresholds will help control the state of some other device. A flow diagram connecting the states will show the alternate flow that can be followed in the event of a failure of the devices. In this case, if a device is broken, the output from the devices is estimated based on the output of other related devices [5]. In this approach, a serious limitation is that compatible devices must be used in the neighborhood of the device.

The existing recommendations do not propose any middleware that learns a model from the data flowing from a set of sensors, detects the missing data using the model, and obtains complete data of sufficient quality to conduct data analytics and decision-making.

Problem definition

The main problem is establishing complete data when the sensors fail to sense or transmit data.

Research objectives

To build middleware in service servers that senses the existence of missing data, uses learning models to predict the missing data, and completes the data so that the analytical models can be used more effectively to monitor and control mission-critical systems.
To show how the fault tolerance of the IoT network remains unchanged even in the case of the failure of some devices from a data-availability perspective.

Motivation

The main motivation for this research is that atmospheric studies have reported the absence of some data from sensors, which led to improper atmospheric studies and results.

2. Related Work

When devices become faulty or dead, their expected data may not be read. The missing data must be predicted or estimated to complete the data flowing from the devices. The service server is the ideal location to estimate the missing data due to its high computing power availability.

H. Liu et al. [6] stated that fault detection and fault recovery systems must be built into an IoT system to make it fault-tolerant. They opined that fault detection could be supported by either self-diagnosis or cooperative diagnosis. In self-diagnosis, a node can detect whether any of the neighborhood devices have become faulty, and cooperative diagnosis is used when certain events are to be completed based on the cooperation of a set of sensors. However, they made no recommendation for estimating the data expected from the faulted devices.

R. G. Abhishek et al. [7] recommended spatial correlation among geographically close sensors. They also recommended using temporal correlation from the same sensor but did not recommend specific mechanisms to estimate the missing data.

O. Boyinbode et al. [8] and F. Kuhn et al. [9] studied various fault tolerance mechanisms relating to wireless sensor networks (WSNs), which include monitoring and recovery mechanisms to address QoS concerns. However, most of the works assume a network has only similar types of sensors. Empirical formulations have yet to be created to deal with the recovery process.

A. Kansal et al. [10] have focused on the multi-modality of sensors. They have also focused on sharing multimedia sensed via sensors. They still need to present how the missing multimedia data can be bridged.

S. Dawson-Haggerty et al. [11] have focused on profiling a device to fit the RESTful paradigm. They still need to explain how services cater to missing data.

N. B. Priyantha et al. [12] have focused on building middleware to support service descriptions and establishing intercommunication between services to address the issue of fault tolerance. They have used models to predict missing data based on the relationships between applications.

X. Wang et al. [13] have dealt with service composition, treating IoT services as web services and replacing each web service with another device, which causes more fault-tolerance issues when devices fail for some reason.

P. Su et al. [14] have addressed the issue of fault tolerance from the perspective of devices and assumed that services are replicated, making a system unwieldy.

V. G. Guimaraes et al. [15] have proposed a framework implemented as a service layer that allows networks to share control information. This service layer is present between any pair of layers and has nothing to do with the services initiated via devices or users on the Internet.

C. Peoples et al. [16] have proposed a service layer that deals with service-level agreements to provide access to the IoT network. This service layer is indirectly related to the network between the controller and service layers, as the response time and throughput depend on the network type. However, this service can also be incorporated into the services layer of the IoT network. This layer has nothing to do with load balancing or estimating missing data.

N. Papulovskaya et al. [17] have proposed a scalable architecture for implementing an IoT network that can be expanded as the demand for its services increases. Scalable architecture is a necessity for the service layer of the IoT network.

W. Yang et al. [18] have explained that IoT networks must be developed using scalable architectures so that services are addressed as users require. Some of the non-functional requirements are to be modeled as services. Implementing a scalable service layer, which is also one of the recommendations of this paper, can address all these requirements.

M. Melo et al. [19] have proposed a multi-layer fault tolerance approach, granting interconnection among IoT system layers and allowing information exchange and collaboration to attain the property of dependability. They have defined an event-driven framework called FaTEMa (Fault Tolerance Event Manager) that creates a dedicated fault-related communication channel to propagate events across the levels of the system. The implemented framework assists with error detection and continued service. It also offers extension points to support heterogeneous communication protocols and evolve new capabilities. This model offers no fault detection or computation, and it cannot introduce mechanisms that either retain the earlier fault tolerance level of the IoT network or enhance the FT level.

Alexander et al. [20] focused on the relationships between agricultural growth and environmental data through some hidden states. They observed that data have a nonlinear, sigmoid-type relationship between the agricultural output and the environmental conditions. Here, they predicted the percentage of growth in agricultural output in real terms, considering the variation in the environmental conditions. The model they presented considers missing data with conditions imposed on input and output data through sigmoid functions.

Shafin et al. [21] assumed that the sensor is in working condition and is built with additional intelligence to assess the reliability of the data considering the prevailing noise. It does not consider the issue of reliability of the entire network.

3. Overall Methodology

The overall methodology used to compute the fault tolerance of the IoT network, considering the changes effected in different layers, is shown in Figure 2. Different implementations are carried out in different layers before the packets reach the service layer. Table 1 presents the improvements made at the data, base station, controller, and gateway layers of the IoT network. In all cases, network linearization is performed wherever required in order to convert complex structures to linear models. This paper presents an AI-based method implemented in the service layer to improve the quality of data emanating from sensors.

The metrics required to compute fault tolerance were developed. An IoT prototype network was implemented, and its fault tolerance was computed using these metrics. Changes were carried out in different layers of the IoT network, and it was shown how the prototype network’s fault tolerance improved. A comparative analysis compared the IoT network’s fault tolerance level with the changes made in each layer up to the service layer.

4. Service Layer Architectural Model

In a service layer, more than one service server can be implemented and connected with one controller to process the data channeled through the controller. The service servers are interconnected and connected to the cloud through a gateway using a star topology. Each service server implements the software components required to support different services required for implementation within the IoT network. Figure 3 shows the service component architecture. The controller is connected to the service server, to which the controller and the gateway provide all the requests. The servers support several services, each implemented through a thread. The services generally include receiving data, analyzing data, transmitting data, finding the status, learning models, and predicting missing data. A set of services is modeled into a process model. Several processes are invoked, based on the number of clusters used in the IoT system.

When more service requests are to be processed, several service servers can be included in the service layer of the IoT network. Each server is installed with the same middleware. Only the routing of traffic to a specific server differs, and that routing is outside the scope of this article.

The server is loaded with several services threaded to a single program. All the services are run as individual program threads.

Model learning and prediction are implemented in two different threads. The model-learning program is active or inactive based on a toggle-switch variable declared as an external variable, whose value can be changed through a different program. If the toggle-switch value is true, the learning program runs; if not, it remains dormant. The external user can start or stop the learning program when required by setting the toggle-switch value to true or false, respectively.

When the NN model’s learning is set to false through the toggle switch, which is set from an external program, the learning program is made inactive, and the prediction program, which estimates the missing data, is made active.
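The toggle-switch behaviour described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the external variable can be modelled with a `threading.Event`, and the names `learning_enabled`, `set_toggle`, `learning_service`, `prediction_service`, and `handle` are hypothetical.

```python
import threading

# Hypothetical stand-in for the external toggle-switch variable:
# set   -> the learning program is active
# clear -> the prediction program is active
learning_enabled = threading.Event()


def set_toggle(value: bool) -> None:
    """Called by the external program to switch between learning and prediction."""
    if value:
        learning_enabled.set()
    else:
        learning_enabled.clear()


def learning_service(packet, log):
    # Dormant unless the toggle is true.
    if learning_enabled.is_set():
        log.append(("learn", packet))


def prediction_service(packet, log):
    # Active only when learning has been switched off.
    if not learning_enabled.is_set():
        log.append(("predict", packet))


def handle(packet, log):
    """Run both services in separate threads; only the active one acts on the packet."""
    workers = [
        threading.Thread(target=service, args=(packet, log))
        for service in (learning_service, prediction_service)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

With the toggle set to true, a packet is routed to the learning branch; after `set_toggle(False)`, the same call routes packets to the prediction branch.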

5. Method for Inline Data Correction

Data are received through the cluster heads. A service server invokes only as many independent processes as are needed to handle the different clusters in an IoT network, and each process handles the data of its own cluster. The tasks executed in each cluster process are shown in Figure 4. The data sensed via the devices in a cluster are transmitted through the cluster head, base station, and microcontroller en route to the service server. The processed data are transmitted to the cloud through a gateway or to the controller in order to actuate the devices. The data from the devices are time-framed, meaning that the data sensed at a specific time unit are transmitted to the service server separately for each cluster. Each cluster process receives the data related to a specific time unit, and the data flow in successive time units.

The cluster-process model is built with a learning model. Learning is performed when the learning status of the model is set to “True”. The data received in sequence are pre-processed and scaled, and a recurrent neural network is trained to model the sequence data. Learning continues over a long period as the data are sensed via the sensors and transmitted to the service server. The recurrent model takes two sensed values as inputs, and the third sensed value is treated as the expected output. The number of examples considered depends on the duration for which learning is undertaken. Enough epochs must be used to develop a highly accurate model.

When learning is not invoked, the received data are analyzed to find whether any data are missing. The missing data are predicted using the learned model, and the completed data are transmitted to the cloud via the gateway. Data received without gaps are transmitted in the same way. The process is repeated for each time unit, and it continues throughout the life of the IoT system. The IoT system is fault-tolerant when complete and accurate data are transmitted to the cloud, even if some devices fail during the lifetime of the IoT network.

Microcontrollers group the same cluster data into single packets out of the individual packets of data they receive from the sensors placed in a cluster. The cluster transmits all data associated with a cluster ID, sensor ID, and time stamp, which the microcontroller uses to form the data into cluster data. A null is inserted into the packet if data belonging to the same cluster, sensor ID, and time stamp are unavailable.
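The grouping step can be illustrated with a short sketch. It assumes each individual reading arrives as a (cluster ID, sensor ID, time stamp, value) tuple and that the microcontroller knows which sensors belong to each cluster; the function name `group_by_cluster` and the packet layout are hypothetical, and `None` stands in for the null marker.

```python
from collections import defaultdict


def group_by_cluster(readings, sensors_per_cluster):
    """Group individual readings into one packet per (cluster ID, time stamp).

    readings: iterable of (cluster_id, sensor_id, timestamp, value) tuples.
    sensors_per_cluster: dict mapping cluster_id to an ordered list of sensor IDs.
    A sensor with no reading for a given time stamp is represented by None.
    """
    seen = defaultdict(dict)
    for cluster_id, sensor_id, timestamp, value in readings:
        seen[(cluster_id, timestamp)][sensor_id] = value

    packets = []
    for (cluster_id, timestamp), values in sorted(seen.items()):
        # dict.get returns None for a sensor that sent nothing at this time stamp.
        row = [values.get(sensor) for sensor in sensors_per_cluster[cluster_id]]
        packets.append({"cluster": cluster_id, "time": timestamp, "values": row})
    return packets
```

In this sketch a packet is only formed for a time stamp at which at least one sensor of the cluster reported; a real controller would also need a timing rule for the case where every sensor in a cluster is silent.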

The microcontroller transmits the grouped data packet to the service server, where the packet is received. The data-checking program is an ever-running service in the service layer that checks whether the received data packets include all the required data components. When it notices a null value in a data packet, the checking program communicates with the prediction program, which also works as a service in the service layer. The packet is reframed by adding the missing data to the packet. When the learning of the NN model is set to false through the toggle switch, which is set via an external program, the learning program is made inactive, and the prediction program, which estimates the missing data, is made active.
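The check-and-reframe step might look like the following sketch. It assumes the values of a cluster arrive as one ordered sequence with `None` as the null marker and that the predictor consumes the previous three values (the lookback used later in the paper). The `mean_predictor` function is only a placeholder for the trained model, not the LSTM itself, and all names are hypothetical.

```python
def fill_missing(sequence, predict, lookback=3):
    """Return a copy of `sequence` with each None replaced by a predicted value.

    predict: callable that takes the previous `lookback` values and returns a float.
    A None with fewer than `lookback` complete predecessors is left unchanged.
    """
    completed = list(sequence)
    for i, value in enumerate(completed):
        if value is None and i >= lookback:
            window = completed[i - lookback:i]
            if None not in window:
                # Earlier predictions are reused as inputs for later gaps.
                completed[i] = predict(window)
    return completed


def mean_predictor(window):
    """Placeholder for the learned model: average of the lookback window."""
    return sum(window) / len(window)
```

Passing the predictor as a callable keeps the checking service independent of the model, so the learned network can be swapped in without changing the checking logic.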

6. Prototype IoT Network

A prototype IoT network used to implement middleware within service servers in order to estimate the missing data is shown in Figure 5. Four clusters are used in the device layer to transmit the temperatures, humidity, air conditioning, and air cooling through fans. The four clusters have four cluster heads. The cluster heads are interfaced with the device clusters using a crossbar networking topology to provide many alternate paths for communicating the data with the base station. The network uses two base stations to provide redundancy at the base-station level. Communication between the cluster heads and the base stations occurs in a peer-to-peer mode using a cellular communication protocol. Two base stations are connected to three controllers, and cellular-based peer-to-peer communication is performed among those devices. Three controllers connect three service servers through a crossbar network. The service servers communicate with the cloud through a gateway.

The service server runs middleware to service the requests through the devices or the user on the internet. The load balancing of the service requests from the device to the cloud is performed through load-balancing software running on the controller side. The network shows that heavy redundancy is built wherever required, especially when the traffic is heavy. Figure 5 shows an example IoT network built with nonlinearity in the device, base station, and service layers. The nonlinearity is due to the introduction of a crossbar network in the base station layer and service layer and due to the existence of clusters at the device layer.

Procedure A removes the nonlinearity by replacing the clusters with hierarchical structures in the device layer and by replacing each crossbar network with a single equivalent device. The success rate of that device is taken as 1 − failure rate, where the failure rate of the crossbar network is computed using probability models.

Procedure A-Generating linearized IoT network

The step-by-step procedure for converting a nonlinear IoT network to a linear network is shown in Table 2.

Figure 6 shows the linearized IoT network generated using Procedure A, and Figure 7 shows the generated equivalent FTA diagram.

The fault-tolerance diagram generated using Procedure A for the linearized IoT network is shown in Figure 7.

Procedure B, shown in Table 3, is used to generate success-rate computations, considering the generated FTA diagram. The success rate of the root node is the success rate of the entire IoT network. Table 4 is generated, and it shows the success-rate computations. The table reflects the fault-tolerance calculations of different devices contained in the IoT network. In the table, the success rate of the sample IoT network is shown as 0.980.
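As a rough illustration of the arithmetic that an FTA diagram of this kind leads to, the sketch below applies the standard reliability rules: components in series multiply their success rates, and a redundant group succeeds unless every member fails. Procedure B itself is defined in Table 3 and is not reproduced here; the function names and the sample rates used afterwards are hypothetical and are not taken from Table 4.

```python
from math import prod


def series(success_rates):
    """All components must work: the success rates multiply."""
    return prod(success_rates)


def parallel(success_rates):
    """Any one redundant component suffices: 1 minus the product of failure rates."""
    return 1.0 - prod(1.0 - s for s in success_rates)
```

For example, two redundant base stations with a success rate of 0.9 each give `parallel([0.9, 0.9])`, which is 0.99; placing that group in series with a gateway of success rate 0.95 gives `series([0.99, 0.95])`, which is 0.9405.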

7. Description of Example Dataset

Data are continuously acquired from different sensors at fixed intervals and written to an Excel sheet. One thousand samples are created. The data are sequential and time-bound, and they constitute a time series. The output of a sequence is taken as the first input of the next sequence. The absence of data is indicated through a null character; in particular, no data are received when a sensor is completely dead, and this is also indicated through a null character. No relationships exist among the data elements. A lookback of three is used because there are three sensors in a cluster. The data considered are temperature data designed to be in the range of 1 to 100. However, this range can be changed to −127 to +127, with the data constrained to fall within that range. The data are not classified; a regression model is used to predict the missing data. An example set of 1000 records is created, with 23 containing missing data. To simulate the breakdown of some sensors, some sensors are randomly switched off through toggle switches. Three temperature sensors at three different locations in the same proximity are used to collect the data. The data are received in sequence, one after the other, at fixed time intervals. A sample of the data collected through the sensors as a sequence at different periods is shown in Table 5, which concerns the cluster responsible for temperature sensing. The complete dataset contains 1000 examples, of which 670 are used for training and the remaining 330 are used for testing. The data are static, as no relationship exists among the data elements. All data elements are expected to appear in every packet transmitted through the controllers, and some temperature data are missing at random (MAR).
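The way such a sequence can be turned into supervised examples is sketched below, assuming a lookback of three and the 670/330 split mentioned above. The function names are hypothetical, and the sketch assumes that nulls have already been excluded from the training portion.

```python
def make_windows(series, lookback=3):
    """Build (inputs, target) pairs: `lookback` consecutive values predict the next one."""
    inputs, targets = [], []
    for i in range(len(series) - lookback):
        inputs.append(list(series[i:i + lookback]))
        targets.append(series[i + lookback])
    return inputs, targets


def split_in_order(series, train_fraction=0.67):
    """Split a time series without shuffling, so the temporal order is preserved."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]
```

Splitting in order rather than at random keeps every test example later in time than the training examples, which matters for sequence models.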

8. The Issue of Missing Data and Channelling the Data in Sequences

The failure of any device in the device layer leads to missing data. Some devices might go out of operation due to breakdowns/failures, or through implemented logic that isolates a device when fault injection is noticed. Incomplete data are of no use when stored in the cloud. The more data records are missing, the more unreliable the IoT system is. There is a need to introduce methods that predict the missing data.

The microcontrollers receive the data from the devices through the base station; the data are channeled in sequence, with the missing data signified by null values. The order of the data sequence is always maintained on the controller side. Cluster-specific data are always channeled to a specific server, where the missing data are predicted and the data are completed, which increases reliability.

Microcontrollers group the same cluster data into single packets out of the individual packets of data they receive from the sensors placed in a cluster. The cluster transmits all data attached with a cluster ID, sensor ID, and time stamp, which the microcontroller uses to form the data into cluster data. A null is inserted into the packet if any data belonging to the same cluster, sensor ID, and time stamp are unavailable.

The microcontroller transmits the grouped data packet to the service server where the packet is received. The data-checking program employed is an ever-running service in the service layer to check whether the data packets received include all the required data components. When it notices a null value in the data packet, the checking program will communicate to the prediction program working as a service in the service layer. The packet is reframed by adding the missing data to the packet.

9. Model Learning

The prediction of missing temperatures is achieved by building the MLP (multi-layer perceptron) model and the long short-term memory (LSTM) recurrent neural network (RNN) with different variants. A lookback of three temperatures and a batch size of two examples are used for all models. The recurrent neural network (LSTM) is used in four variants to predict the missing data. The variants include lookback = 3, lookback = 3 + time stamps = 3, lookback = 3 + time stamps = 3 + maintaining network states and lookback = 3 + time stamps = 3 + maintaining network states + maintaining memory between the batches.
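The difference between the first two LSTM variants can be read as a difference in how the three lookback values are presented to the network. Keras LSTM layers expect input shaped as [samples, time steps, features]; the sketch below shows the two layouts with NumPy. Mapping "lookback = 3" to three features in one step, and "time stamps = 3" to three steps of one feature, is an assumption, since the paper does not list its code, and the function names are hypothetical.

```python
import numpy as np


def as_features(windows):
    """Lookback as features: each sample is 1 time step carrying 3 features."""
    windows = np.asarray(windows, dtype=float)
    return windows.reshape(windows.shape[0], 1, windows.shape[1])


def as_time_steps(windows):
    """Lookback as time steps: each sample is 3 time steps carrying 1 feature."""
    windows = np.asarray(windows, dtype=float)
    return windows.reshape(windows.shape[0], windows.shape[1], 1)
```

The remaining two variants, which maintain network states and memory between batches, would correspond in Keras to a stateful LSTM layer (`stateful=True`) with a fixed batch size; that correspondence is likewise an assumption.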

The model parameters used in building the MLP and RNN models are listed in Table 6. The parameters are used to train and test the models for accuracy. Different variants of LSTM models are used to find the model that best fits the temperature data. The models, once learned, are used to estimate any missing data. Model learning and the use of the models to estimate the missing data are installed and made operational as independent services within each service server. These services co-exist with the other services running on the server.

10. Results and Discussion

The above models are programmed in a Python notebook with Keras and TensorFlow, running on the Windows 11 operating system. The models are run on a computer system with 12th-generation eight-core CPUs and two Nvidia GPUs.

The models are trained and tested with the number of epochs varied from 100 to 1000 in increments of 100. Experimentation is conducted to find the best model, so that the missing temperature is predicted with high accuracy and the shortest response time. The results are shown in Table 7. The table shows the prediction accuracy of the following sequence-processing models:

MLP–regression
LSTM–regression
LSTM–regression with equal time stamps as lookbacks
LSTM–regression with equal time stamps as lookbacks and network states preserved
LSTM–regression with equal time stamps as lookbacks, network states preserved, and memory remembered between the batches

All models are trained with lookback = 3 and batch size = 2. The accuracy is measured in terms of the root mean square error (RMSE). The table shows that the lowest RMSE (0.34) is achieved with 300 epochs and a batch size of 2, with a response time of 671 microseconds, using the LSTM variant that performs regression with a lookback of 3, 3 time stamps, and network states maintained after each epoch.
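For reference, the RMSE used as the accuracy measure can be computed as in the sketch below; the function name is hypothetical. The 99.66% accuracy quoted elsewhere in the paper is numerically consistent with 100 − 0.34, but the paper does not state how the RMSE is converted to a percentage, so that reading is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which suits a setting where a badly wrong imputed temperature is worse than several slightly inaccurate ones.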

Others have estimated the missing data through other techniques, including the least square method (LSM) [24], the multivariate adaptive regression method [25], and the revolutionary method-dominated multi-objective-based genetic algorithm [26], all of which depend either on the availability of backup sensors or on the availability of sensors that support direct or indirect multi-measurements. None have considered the sequence and timing of data generation, which is important, as the data are sensed and transmitted continuously. The prediction of missing data must take the sequence of data generation into account. A comparison of the methods used is presented in Table 8.

Using backup sensors is very expensive, and operating such a system is complicated. Sometimes, it is not feasible to use multi-model relationships, as the sensors used in an IoT network greatly vary. The sensor data flow is continuous, and it occurs with a specific timing and sequence, which is the most important factor to consider.

The LSTM-RMSE method proposed in this paper is 99.66% accurate and has a response time of only 671 microseconds, the lowest among the compared methods. An analysis of the fault-tolerance behavior of the sample IoT network due to missing data is shown in Table 9. The fault tolerance of the IoT network gradually decreases due to missing data, even when a high level of redundancy is implemented in the network. The IoT network’s fault tolerance could be sustained by implementing the LSTM-RMSE method in the service server. It can be seen from the table that the fault tolerance of the IoT network could be retained to the extent of 0.432.

11. Conclusions

IoT networks are complex as they involve many layers. Each layer is designed and implemented considering the kind of devices in that layer. Failures happen in each layer, and the failures propagate from one layer to another, affecting the entire IoT network and reducing its fault-tolerance capacity.

Only complete data are accurate. Efforts to complete data before they are transmitted for storage in the cloud will, at least to a larger extent, sustain the fault tolerance of the IoT network. The fault-tolerance capacity will be reduced when the sensors fail to sense and transmit data that are ultimately stored in the cloud.

The data flow from the sensors is sequential, continuous, and on time. Some data at different time intervals will be missing due to the failure of the sensors. The data must be completed before they are transmitted to the cloud. LSTM models are most suitable for sequential, time-dependent data with different data features. The LSTM model with lookback and time stamps = 3 and with the state reset after every epoch yields a high level of accuracy (99.66%) compared to any other model. The decrease in the fault tolerance of the IoT network due to incomplete data is more than compensated for by implementing a deep learning-based LSTM neural network.

Author Contributions

Conceptualization, S.K.R.J. and S.B.J.; methodology, S.K.R.J.; software, B.K.D.; validation, B.K.D.; formal analysis, S.K.R.J.; investigation, S.K.R.J. and B.C.; resources, S.B.J.; data curation, B.K.D.; writing—original draft preparation, S.K.R.J. and S.B.J.; writing—review and editing, S.K.R.J. and B.C.; visualization, B.C.; supervision, S.K.R.J.; project administration, S.K.R.J.; funding acquisition, S.K.R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chokara, B.; Jammalamadaka, S.K.R. Hybrid models for computing fault tolerance of IoT networks. Telkomnika Telecommun. Comput. Electron. Control 2023, 21, 333–345.
2. Zhou, S.; Lin, K.J.; Na, J.; Chuang, C.C.; Shih, C.S. Supporting Service Adaptation in Fault Tolerant Internet of Things. In Proceedings of the 2015 IEEE 8th International Conference on Service-Oriented Computing and Applications, Rome, Italy, 19–21 October 2015; pp. 66–72.
3. Sastry, J.K.R.; Sowmya, K. Implementing load-balanced concurrent service layer for improving the response time of an IoT network. J. Eng. Sci. Technol. 2022, 17, 4487–4504.
4. Anjana, A.; Chand, G.G.; Kiran, K.S.; Bhpathi, J.S. On improving fault tolerance of IoT networks through Butterfly Networks implemented at the Services Layer. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 2096–2115.
5. Reijers, N.; Lin, K.-J.; Wang, Y.-C.; Shih, C.-S.; Hsu, J.Y. Design of an intelligent middleware for flexible sensor configuration in M2M systems. Sensornets 2013, 13, 41–46.
6. Liu, H.; Nayak, A.; Stojmenović, I. Fault-tolerant algorithms/protocols in wireless sensor networks. In Guide to Wireless Sensor Networks; Springer: Berlin/Heidelberg, Germany, 2009; pp. 261–291.
7. Sharma, R.G.A.B.; Golubchik, L. Sensor faults: Detection methods and prevalence in real-world datasets. ACM Trans. Sens. Netw. 2010, 6, 1864–1869.
8. Boyinbode, O.; Le, H.; Mbogho, A.; Takizawa, M.; Poliah, R. A survey on clustering algorithms for wireless sensor networks. In Proceedings of the 2013 16th International Conference on Network-Based Information Systems, Gwangju, Republic of Korea, 4–6 September 2010; pp. 358–364.
9. Kuhn, F.; Moscibroda, T.; Wattenhofer, R. Fault-tolerant clustering in ad hoc and sensor networks. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems, Lisboa, Portugal, 4–7 July 2006; p. 68.
10. Kansal, A.; Nath, S.; Liu, J.; Zhao, F. Senseweb: An infrastructure for shared sensing. IEEE Multimed. 2007, 14, 8–13.
11. Dawson-Haggerty, S.; Jiang, X.; Tolle, G.; Ortiz, J.; Culler, D. Smap: A simple measurement and actuation profile for physical information. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, Zurich, Switzerland, 3–5 November 2010.
12. Priyantha, N.B.; Kansal, A.; Goraczko, M.; Zhao, F. Tiny web services: Design and implementation of interoperable and evolvable sensor networks. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems, Raleigh, NC, USA, 6–9 November 2008; ACM: New York, NY, USA, 2008; pp. 253–266.
13. Wang, X.; Wang, J.; Zheng, Z.; Xu, Y.; Yang, M. Service composition in service-oriented wireless sensor networks with persistent queries. In Proceedings of the 2009 6th IEEE Consumer Communications and Networking Conference, Las Vegas, NV, USA, 10–13 January 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–5.
14. Su, P.H.; Shih, C.S.; Hsu, J.Y.J.; Lin, K.J.; Wang, Y.C. Decentralized fault tolerance mechanism for intelligent IoT/m2m middleware. In Proceedings of the 2014 IEEE World Forum on Internet of Things (WF-IoT), Seoul, Republic of Korea, 6–8 March 2014; pp. 45–50.
15. Guimaraes, V.G.; de Moraes, R.M.; Obraczka, K.; Bauchspiess, A. A Novel IoT Protocol Architecture: Efficiency through Data and Functionality Sharing across Layers. In Proceedings of the 2019 28th International Conference on Computer Communication and Networks (ICCCN), Valencia, Spain, 29 July–1 August 2019; pp. 1–9.
16. Peoples, C.; Abu-Tair, M.; Wang, B.; Rabbani, K.; Morrow, P.; Rafferty, J.; Moore, A.; McClean, S. Building Stakeholder Trust in Internet of Things (IoT) Data Services using Information Service Level Agreements (SLAs). In Proceedings of the 2019 IEEE 5th World Forum on Internet of Things (WF-IoT), Limerick, Ireland, 15–18 April 2019; pp. 454–459.
17. Papulovskaya, N.; Izotov, I.; Orekhov, P. Implementing IoT Systems in Service-Oriented Architecture. In Proceedings of the 2019 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russia, 25–26 April 2019; pp. 264–267.
18. Yang, W.; Deng, F. A Service Selection Method Based on QoS in IoT. In Proceedings of the 2020 5th International Conference on Computer and Communication Systems (ICCCS), Shanghai, China, 15–18 May 2020; pp. 791–795.
19. Melo, M.; Aquino, G. FaTEMa: A Framework for Multi-Layer Fault Tolerance in IoT Systems. Sensors 2021, 21, 7181.
20. Kocian, A.; Carmassi, G.; Cela, F.; Incrocci, L.; Milazzo, P.; Chessa, S. Bayesian Sigmoid-Type Time Series Forecasting with Missing Data for Greenhouse Crops. Sensors 2020, 20, 3246.
21. Shafin, S.S.; Karmakar, G.; Mareels, I.; Balasubramanian, V.; Kolluri, R.R. Sensor Self-Declaration of Numeric Data Reliability in the Internet of Things. IEEE Trans. Reliab. 2024, C1, 1–15.
22. Sastry, J.K.; Ch, B.; Budaraju, R.R. Implementing Dual Base Stations within an IoT Network for Sustaining the Fault Tolerance of an IoT Network through an Efficient Path Finding Algorithm. Sensors 2023, 23, 4032.
23. Jammalamadaka, S.K.R.; Chokara, B.; Jammalamadaka, S.B.; Duvvuri, B.K.; Budaraju, R. Enhancing the Fault Tolerance of a Multi-Layered IoT Network through Rectangular and Interstitial Mesh in the Gateway Layer. J. Sens. Actuator Netw. 2023, 12, 76.
24. Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006.
25. Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19, 1–67. Available online: http://www.jstor.org/stable/2241837 (accessed on 15 October 2024).
26. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.

Figure 1. An example IoT network illustrating different layers.

Figure 2. Overall methodology for effecting improvements in fault tolerance of the IoT network up to the service layer.

Figure 3. Service architecture of an IoT network.

Figure 4. The method handles missing data in the service layer.

Figure 5. Prototype non-linear network.

Figure 6. Linearized IoT network.

Figure 7. FTA diagram for sample IoT network.

Table 1. Inventions implemented in the device, base station, and controller layers.

Layer	Inventions Implemented
Device layer	A crossbar network was implemented to make available several alternate paths for communication.
Device layer	A method to predict the occurrence of power faults and isolate the devices that might inject a power fault into systems is being submitted to MDPI’s Sensors journal
Base station layer	Implementing dual base stations within an IoT network to sustain the fault tolerance of an IoT network through an efficient path-finding algorithm [22]
Controller layer	Networking microcontrollers to address the failure of controllers
	To implement a load-balancing system so that the load on the servers is equally managed
	Performing the sequencing of data emanating from a cluster and preparing a group data packet, which has been submitted to the MDPI’s journal Mathematics
Service layer	Improving the quality of data (this paper) in the presence of device failures
Gateway layer	Enhancing the fault tolerance of a multi-layered IoT network through rectangular and interstitial mesh in the gateway layer [23]

Table 2. Procedure A—generating a linear IoT network from an example IoT network.

Step Number	Process Undertaken
1	Capture an IoT network’s hierarchy of hardware elements and update a database.
2	Capture the clusters existing in the IoT diagram, convert it to a hierarchical model, and update the items in the database.
3	Update the database with the fault rate of the devices obtained from the manufacturers.
4	For each network topology, compute the success rate, and include a device in its place associated with the calculated success rate.
5	Capture the relationship (or, and) between each device and its predecessors, and update the database.
6	Generate the linear tree into a graph model.

Table 3. Procedure B—computing the fault rate of a linear IoT network.

Step Number	Process Undertaken
1	Query the elements from the database in the hierarchical order of preceding relationships connected from the child nodes.
2	Using and–or rules, compute the outgoing device’s fault rate.
3	If the relationship between the devices is an and relationship, an outgoing device’s fault rate is multiplied by the incoming device’s fault rate.
4	If the relationship between the devices is an or relationship, the outgoing device’s fault rate is the lowest of the incoming devices’ fault rates.
5	Calculate the fault rate of the root device. A root device has no parents.
6	Generate a fault-computation table.

Table 4. Fault-tolerance computation of the sample IoT network.

S No.	Device	Success Rate	Gates Used for Connection	Preceding Device 1 (Success Rate 1)	Preceding Device 2 (Success Rate 2)	Preceding Device 3 (Success Rate 3)	Preceding Device 4 (Success Rate 4)	Combined Success Rate
1	Cluster Head 1	0.950						0.950
2	Cluster Head 2	0.950						0.950
3	Cluster Head 3	0.950						0.950
4	Cluster Head 4	0.950						0.950
5	D1	0.950	Or	Cluster Head 1 0.950				0.950
6	D2	0.950	Or	Cluster Head 2 0.950				0.950
7	D3	0.950	Or	Cluster Head 3 0.950				0.950
8	D4	0.950	Or	Cluster Head 4 0.950				0.950
9	Device level CrossBar NW	0.987	Or	D1 0.950				0.987
10	Device level CrossBar NW	0.987	Or	D2 0.950				0.987
11	Device level CrossBar NW	0.987	Or	D3 0.950				0.987
12	Device level CrossBar NW	0.987	Or	D4 0.950				0.987
13	D5	0.950	Or	DLCB 0.987				0.987
14	D6	0.950	Or	DLCB 0.987				0.987
15	D7	0.950	Or	DLCB 0.987				0.987
16	D8	0.950	Or	DLCB 0.987				0.987
17	Base Station 1	0.950	Or	D5 0.987	D6 0.987	D7 0.987	D8 0.987	0.987
18	RL1	0.950	Or	CH1 0.950	CH2 0.950			0.950
19	RL2	0.950	Or	CH2 0.950	CH3 0.950			0.950
20	RL3	0.950	Or	CH3 0.950	CH4 0.950			0.950
21	RL4	0.950	Or	RL1 0.950	RL2 0.950			0.950
22	RL5	0.950	Or	RL1 0.950	RL2 0.950			0.950
23	Base Station 2	0.950	Or	RL4 0.950	RL5 0.950			0.950
24	Controller 1	0.979	Or	BS1 0.987	BS2 0.950			0.987
25	Controller 2	0.979	Or	BS1 0.987	BS2 0.950			0.987
26	Controller 3	0.979	Or	BS1 0.987	BS2 0.950			0.987
27	Controller Level CrossBar NW	0.970	CROSSBAR NW	Controller 1 0.987	Controller 2 0.987	Controller 3 0.987		0.987
28	Server 1	0.980	And	CLCB 0.987				0.967
29	Server 2	0.980	And	CLCB 0.987				0.967
30	Server 3	0.980	And	CLCB 0.987				0.967
31	Gateway	0.980	Or	Server 1 0.967	Server 2 0.967	Server 3 0.967		0.980
32	INTERNET	0.980	And	Gateway 0.980				0.960
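
The combined success rates in Table 4 can be recomputed with a short sketch. The combination rules below are an assumption inferred from the table rows rather than a statement of the authors' implementation: an "or" gate takes the highest of the device's own success rate and its predecessors' rates, and an "and" gate multiplies them. The function name `combined_success_rate` is hypothetical.

```python
from typing import Optional, Sequence


def combined_success_rate(own: float, gate: Optional[str],
                          incoming: Sequence[float]) -> float:
    """Combine a device's success rate with those of its preceding devices.

    Assumed rules (inferred from Table 4, not confirmed by the authors):
      - no predecessors: the device keeps its own success rate;
      - "or":  the best available rate wins, max(own, incoming...);
      - "and": every element must work, own * product(incoming...).
    """
    if gate is None or not incoming:
        return own
    if gate.lower() == "or":
        return max(own, *incoming)
    if gate.lower() == "and":
        result = own
        for rate in incoming:
            result *= rate
        return result
    raise ValueError(f"unknown gate: {gate!r}")


# A few rows of Table 4, recomputed from the listed success rates.
cluster_head = combined_success_rate(0.950, None, [])        # row 1
d5 = combined_success_rate(0.950, "or", [0.987])             # row 13
server = combined_success_rate(0.980, "and", [0.987])        # row 28
gateway = combined_success_rate(0.980, "or", [server] * 3)   # row 31
internet = combined_success_rate(0.980, "and", [gateway])    # row 32
print(round(server, 3), round(gateway, 3), round(internet, 3))
```

Under these assumed rules the server, gateway, and internet rows round to the values listed in the table; whether an "and" gate with several predecessors multiplies all of them is not shown by any row and is part of the assumption.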

Table 5. Collection of temperature data in sequence at different periods.

Period	Parameter Sensed	Temperature Measured
t1	Temp-1	78
t2	Temp-2	79
t3	Temp-3	80
t4	Temp-1	78
t5	Temp-2	78
t6	Temp-3	78
t7	Temp-1	80
t8	Temp-2	79
t9	Temp-3	78
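
A sequence such as the one in Table 5 is typically turned into supervised training samples by sliding a fixed-size window over it. The sketch below assumes that "lookup = 3" means three consecutive readings are used to predict the next one, and it treats the nine readings as a single series; both are assumptions made for illustration, and `make_windows` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float],
                 lookup: int = 3) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (previous `lookup` readings, next reading) pairs."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for i in range(len(series) - lookup):
        inputs.append(list(series[i:i + lookup]))
        targets.append(series[i + lookup])
    return inputs, targets


temperatures = [78, 79, 80, 78, 78, 78, 80, 79, 78]  # t1..t9 from Table 5
X, y = make_windows(temperatures, lookup=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A reading that is missing at period t could then be estimated by feeding the three readings before t to a model trained on such pairs.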

Table 6. Model parameters used for different deep learning models.

Type Model	Type of Method	Type of Layer	Number of Inputs	Number of Outputs	Type of Activation
MLP	Regression with Lookup = 3	Dense	3	8	ReLU
		Dense	8	1	-
		Model Parameters
		Loss function	Mean squared error	Optimizer	Adam
LSTM	Regression with Lookup = 3	LSTM	3	4	-
		Dense	4	1	-
		Model Parameters
		Loss Function	Mean squared error	Optimizer	Adam
LSTM	Regression with Lookup = 3	LSTM	3	4	-
		Dense	4	1	-
		Model Parameters
		Loss function	Mean squared error	Optimizer	Adam
LSTM	Regression with Lookup = 3, with time steps = 3, and maintaining network shape	LSTM	3	4	-
		Dense	4	1	-
		Model Parameters
		Loss function	Mean squared error	Optimizer	Adam
LSTM	Regression with Lookup = 3, with time steps = 3, and maintaining network shape with memory between the states	LSTM	3	4	-
		LSTM	3	4	-
		LSTM	4	4	-
		Dense	4	1	-
		Model Parameters
		Loss function	Mean squared error	Optimizer	Adam

Table 7. Estimating the prediction accuracy of different sequence-processing models.

Model		MLP regression, 3 lookups		LSTM regression, 3 lookups		LSTM regression, 3 lookups and 3 time steps		LSTM regression, 3 lookups and 3 time steps, maintaining network states after every epoch		LSTM regression, 3 lookups and 3 time steps, maintaining network states after every epoch, with memory between the batches
Epochs	Batch Size	RMSE	Average Response Time per Step (µs)	RMSE	Average Response Time per Step (µs)	RMSE	Average Response Time per Step (µs)	RMSE	Average Response Time per Step (µs)	RMSE	Average Response Time per Step (µs)
100	2	1.15	442	0.95	2000	0.92	1000	0.52	710	19.02	1100
200	2	1.09	675	0.89	2000	0.88	736	0.69	717	17.85	1000
300	2	1.36	1000	0.85	829	0.85	1000	0.34	671	19.08	1000
400	2	1.15	895	0.82	2000	0.82	1000	0.46	709	17.36	990
500	2	1.51	803	0.72	1000	0.74	1000	0.57	727	19.34	1000
600	2	1.08	642	0.76	1000	0.75	1000	0.41	741	17.37	1000
700	2	1.13	822	0.69	999	0.78	1000	0.35	884	19.26	1000
800	2	1.08	762	0.63	824	0.61	1000	0.41	752	16.97	809
900	2	1.07	711	0.64	867	0.61	1000	0.35	781	18.96	1000
1000	2	1.09	486	0.65	869	0.6	889	0.85	725	16.85	1000
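
The RMSE values in Table 7 measure how far predicted readings lie from observed ones. The sketch below shows the standard root-mean-square-error formula and then picks the epoch count with the lowest RMSE for the fourth model (LSTM maintaining network states after every epoch), using the values copied from the table. The sample readings passed to `rmse` are hypothetical, not data from the paper.

```python
from math import sqrt
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical example: one of three readings is predicted one degree off.
print(rmse([78, 79, 80], [78, 80, 80]))

# RMSE per epoch count for the fourth model in Table 7.
rmse_by_epochs = {100: 0.52, 200: 0.69, 300: 0.34, 400: 0.46, 500: 0.57,
                  600: 0.41, 700: 0.35, 800: 0.41, 900: 0.35, 1000: 0.85}
best_epochs = min(rmse_by_epochs, key=rmse_by_epochs.get)
print(best_epochs, rmse_by_epochs[best_epochs])
```

The lowest RMSE for that model occurs at 300 epochs, the row whose response time of 671 microseconds also appears in Table 8. Reading the reported 99.66% accuracy as 100 minus this RMSE is an interpretation, not something the text states.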

Table 8. Comparative analysis of methods related to estimating missing data.

Serial Number	Parameter Used	LSM [22]	MVARM [23]	RM [24]	LSTM-RMSE
1	Use of data sequences	N	N	N	Y
2	Use of timing of data	N	N	N	Y
3	Use of backup sensors	Y	Y	N	N
4	Spread of solutions	N	N	Y	Y
5	Use of multi-model relationships	Y	Y	Y	N
6	Accuracy in percentage	90.68	88.42	90.02	99.66
7	Response time in microseconds	1120	1700	1800	671

Table 9. Fault-tolerance calculations concerning missing data.

S No.	Number of Packets Transmitted	Number of Packets with Completed Data	% of Complete Packets Received	Success Rate of the IoT Network	Success Rate Due to Decrease in Complete Data Packets	Decrease in Fault Tolerance	Completed Packets Received Due to Implementation of LSTM-RMSE (%)	Success Rate After the Implementation of Middleware	Improvement in Fault-Tolerance Rate Due to LSTM-RMSE
1	100	90	90	0.98	0.882	0.098	99.66	0.977	0.095
2	150	92	61.33	0.98	0.601	0.379	99.66	0.977	0.376
3	160	89	55.63	0.98	0.545	0.435	99.66	0.977	0.432
4	200	160	80	0.98	0.784	0.196	99.66	0.977	0.193
5	300	243	81	0.98	0.794	0.186	99.66	0.977	0.183
6	400	321	80.25	0.98	0.786	0.194	99.66	0.977	0.19
7	500	432	86.4	0.98	0.847	0.133	99.66	0.977	0.13
8	600	444	74	0.98	0.725	0.255	99.66	0.977	0.251
9	700	560	80	0.98	0.784	0.196	99.66	0.977	0.193
10	800	590	73.75	0.98	0.723	0.257	99.66	0.977	0.254
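
The arithmetic behind Table 9 can be sketched as follows. The formulas are inferred from the table values and are an assumption about how the columns were derived: the degraded success rate is the network success rate (0.98) multiplied by the fraction of complete packets, and the restored success rate is 0.98 multiplied by the 99.66% completion achieved with the LSTM model. The function and key names are hypothetical.

```python
from typing import Dict


def fault_tolerance_row(transmitted: int, complete: int,
                        base_success: float = 0.98,
                        completion_after_lstm: float = 0.9966) -> Dict[str, float]:
    """Recompute one row of Table 9 under the assumed formulas."""
    complete_fraction = complete / transmitted
    degraded = base_success * complete_fraction       # success rate with gaps
    restored = base_success * completion_after_lstm   # success rate after imputation
    return {
        "percent_complete": 100 * complete_fraction,
        "degraded_success": degraded,
        "decrease": base_success - degraded,
        "restored_success": restored,
        "improvement": restored - degraded,
    }


row = fault_tolerance_row(transmitted=100, complete=90)  # row 1 of Table 9
print({name: round(value, 3) for name, value in row.items()})
```

For row 1 this gives a degraded success rate of about 0.882, a decrease of about 0.098, and an improvement of about 0.095, so the restored rate (about 0.977) stays just below the original 0.98.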

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
