DOI: 10.1145/3694875.3694883
research-article
Open access

Real-Time Anomaly Detection with LSTM-Autoencoder Network on Microcontrollers for Industrial Applications

Published: 12 December 2024

Abstract

In the fast-paced landscape of modern industry, traditional safety systems struggle to identify and mitigate complex, persistent threats that can lead to operational disruptions and safety hazards. This study presents an anomaly detection system for the BDB 825 diamond dry drilling machine, leveraging a 6-axis accelerometer sensor for comprehensive operational monitoring. The system utilizes Long Short-Term Memory (LSTM) networks and autoencoders, optimized for Arduino microcontrollers using quantization techniques and integrated with TensorFlow Lite for real-time implementation. The effectiveness of this system is demonstrated in its ability to accurately detect anomalies, enhancing operational safety and reducing the risk of disruptions in industrial settings.

1 Introduction

The industrial sector is currently undergoing a transformative phase marked by the integration of advanced machine learning techniques, significantly bolstering operational safety and efficiency. This paper presents a comprehensive study into the formulation and execution of an anomaly detection framework, based on the synergy of Long Short-Term Memory (LSTM) networks and autoencoders, optimized via model quantization for deployment in the BDB 825 diamond dry drilling apparatus. This framework utilizes Arduino microcontrollers, interfaced with a 6-axis accelerometer sensor, for robust operational monitoring.
At the core of our approach is the integration of a 6-axis accelerometer sensor within the drilling machine, essential for monitoring operational metrics such as acceleration and gyroscopic dynamics. The system's objective is the swift identification of operational anomalies, indicative of potential mechanical failures or safety risks, manifesting as deviations from established operational norms, including equipment tilting or jamming. The experiment begins with the collection and preprocessing of sensor data. Employing LSTM networks and autoencoders, we identify patterns and anomalies within the data. A significant portion of our effort is focused on deploying the trained model onto Arduino microcontrollers, a complex task due to computational limitations. We navigate these constraints through advanced model quantization techniques, optimizing the machine learning model for real-time operational efficacy. The deployment process integrates the TensorFlow Lite micro interpreter within the Arduino ecosystem, ensuring optimized model performance. This configuration achieves a response latency of less than 200 milliseconds from anomaly detection to alert generation, meeting stringent real-time performance benchmarks for industrial safety.
This paper contributes to the growing body of knowledge in the field of industrial safety and automation. By demonstrating the feasibility and effectiveness of deploying machine learning models on microcontrollers for real-time anomaly detection, this research serves as an example for similar applications in other sectors.

2 Literature Survey

In the context of industrial safety, the implementation of anomaly detection using machine learning and deep learning techniques plays an important role. The safety of industrial applications is paramount, as they are integral to the operation of critical infrastructures. Anomalies in these systems can lead to significant safety hazards and operational disruptions.
The integration of machine learning in industrial anomaly detection is crucial for enhancing efficiency and safety. The use of supervised learning, in particular, has shown promising results in this domain. Chuadhry Mujeeb Ahmed, Gauthama Raman M R, and Aditya P. Mathur, in their study "Challenges in Machine Learning based approaches for Real-Time Anomaly Detection in Industrial Control Systems," emphasize the necessity of high detection rates and low false alarms in industrial environments. This approach is vital for real-time monitoring in large-scale industrial settings like power grids [1]. Yu Jiang, Wei Wang, and Chunhui Zhao explore anomaly detection in industrial products by proposing a model based on YOLOv3 for balanced datasets and a semi-supervised model for unbalanced datasets, highlighting the importance of adaptive models for various data types in industrial settings [2]. Van Quan Nguyen and colleagues investigate LSTM-based anomaly detection in time series data, demonstrating LSTM's effectiveness in learning sequence data [3].
Maged Abdelaty et al. present "DAICS," a deep learning framework for anomaly detection in ICSs. Their 2-branch neural network model showcases adaptability to ICS behavioral changes with minimal human intervention, suited for dynamic industrial environments [4]. Sohrab Mokhtari and his team propose a measurement intrusion detection system (MIDS) for anomaly detection in ICS based on measurement data [5]. This method exemplifies the potential of machine learning models in detecting anomalies beyond typical network-level intrusions. Kyung Sung Lee, Seong Beom Kim, and Hee-Woong Kim develop a hybrid LSTM-autoencoder model. Their model adeptly handles multivariate time series data, a key aspect of manufacturing processes [6]. Narjes Davari, Bruno Veloso, and Rita P. Ribeiro's work on predictive maintenance in the railway industry's air production unit utilizes a sparse autoencoder (SAE) network for anomaly detection. This approach is particularly effective in identifying different types of failures in specialized industrial machinery, using both analog and digital sensor data. Their methodology underscores the nuanced understanding required for anomaly detection, enhancing safety and maintenance efficiency in the railway industry [7]. Zhe Li and colleagues combine stacked autoencoders (SAE) with LSTM networks for anomaly detection in mechanical equipment. Their approach, focusing on unlabeled data in mechanical systems, highlights the effectiveness of deep learning architectures in identifying anomalies in complex industrial settings [8]. Bayram et al. introduce advanced encoding techniques for preprocessing multivariate time series data, enhancing anomaly detection in industrial applications by leveraging Convolutional Autoencoders (CAE) for improved data representation and sensitivity [11]. Md Nur Amin et al. use a Wasserstein GAN with Gradient Penalty for data augmentation to address limited data and class imbalance [12].

3 Data Collection

The data collection process for the anomaly detection system in the BDB 825 diamond dry drilling machine was designed to ensure comprehensive and accurate data acquisition, which is important for training the algorithm to detect operational anomalies effectively. To capture a wide range of operational scenarios, data was collected under various conditions. Given the critical nature of the sensor's placement for optimal data acquisition, the microchip with the 6-axis sensor was positioned as far from the drill shaft's center as possible, as shown in Fig. 1. This placement was chosen to maximize leverage and enhance the sensor's ability to detect subtle changes in acceleration and vibration. The Sensor Shield for Arduino board [10] from Würth Elektronik, equipped with a 6-axis IMU sensor WSEN-EVAL ISDS [9], was used to measure gyroscope and accelerometer data on the X, Y, and Z axes. The training dataset consisted of only 350 data points, while the test dataset comprised 290 data points, because of the arduous process of operating the drilling machine.
Figure 1: Arduino board setup and drilling machine operation on site

3.1 Execution of Data Collection

The data collection was executed in a controlled environment to simulate various drilling conditions. The process involved:
1. Normal Operation Data Collection: The machine was operated under standard conditions without any obstruction, ensuring the collection of baseline data for normal operation across all axes.
2. Soft Impact and Radial Drilling: Data was also gathered during soft impact drilling and radial drilling operations, both with and without soft impact. This was crucial to understand the typical vibration patterns under these specific conditions.
3. Tilting and Jamming Scenarios: To simulate blockages or malfunctions, the drilling machine was deliberately tilted or jammed during operation. These scenarios were critical for capturing data that represented abnormal or anomalous conditions.
4. Real-Time Data Transmission and Recording: The accelerometer's data was transmitted in real time to the Arduino board via the SPI interface. The Arduino board, in turn, relayed this data to a connected laptop, where it was recorded and stored for further analysis. The continuous sensor signals are sampled at a fixed rate (4 data points per second) to create a discrete time series.

3.2 Real-Time Anomaly Logging with Delay Compensation

After detection of an anomaly event, the machine operator initiates a logging protocol by activating a designated switch interfaced with the Arduino microcontroller, thereby marking the occurrence with a synchronized timestamp. However, a latency between the actual occurrence of the anomaly and its corresponding log entry registration exists due to human and system response times. To accurately pinpoint the anomaly within the data stream, a post-processing step is employed to identify the most significant data deviation preceding the logged timestamp, assumed to represent the anomaly's characteristics. This process involves implementing a temporal analysis algorithm that scrutinizes a predefined time window preceding the registered anomaly event. The objective is to locate the maximum data peak or spike, indicative of the anomaly's manifestation within this window. The algorithm mathematically identifies this peak by searching for the maximum value across the sensor data points within the specified window, effectively compensating for the delay in manual event registration. Let's denote:
1. \(X = \{ {{x}_1,{x}_2, \ldots ,{x}_n} \}\) as the time series data, where \({x}_i\) represents the value at time \(i\).
2. \({t}_{{\rm{anomaly}}}\) as the timestamp of the anomaly, which corresponds to the time index of the highest peak in the data.
To find the anomaly, we can use a sliding window approach to search for the highest peak within the most recent data. Here's a step-by-step explanation:
1. Define a Sliding Window: Choose a window size (\(w\)) that determines the recent period of data to consider.
2. Find the Maximum Value: Within the sliding window of size (\(w\)), identify the maximum value (\(\max ({X}_{{\rm{recent}}})\)), where (\({X}_{{\rm{recent}}}\)) represents the subset of data within the sliding window.
3. Identify the Anomaly Timestamp: The timestamp of the anomaly, (\({t}_{{\rm{anomaly}}}\)), corresponds to the index of the maximum value within the sliding window.
Mathematically, we can express this process as follows:
\begin{equation*}{t}_{{\rm{anomaly}}} = \arg \mathop {\max }\limits_{i \in \left[ {n - w + 1,n} \right]} \{ {x}_i\} \end{equation*}
Here, \([ {n - w + 1,n} ]\) represents the range of indices corresponding to the most recent \(w\) data points.
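The window search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `locate_anomaly`, the 0-based indexing, the single-channel input, and the sample values are all hypothetical.

```python
from typing import Sequence


def locate_anomaly(x: Sequence[float], w: int) -> int:
    """Return the 0-based index of the largest value among the last w samples.

    Mirrors t_anomaly = argmax over i in [n - w + 1, n] of x_i, shifted to
    0-based indexing. Ties resolve to the earliest index in the window.
    """
    if not x:
        raise ValueError("x must not be empty")
    if w <= 0:
        raise ValueError("w must be positive")
    n = len(x)
    start = max(0, n - w)  # clamp when fewer than w samples are available
    return max(range(start, n), key=lambda i: x[i])


# Hypothetical readings; the operator's switch press arrives at the last sample.
signal = [0.1, 0.2, 0.1, 2.7, 0.4, 0.3, 0.2]
peak_index = locate_anomaly(signal, w=5)  # index 3, the 2.7 spike
```

The text specifies the maximum value, so the sketch compares raw values. For a signed vibration signal one might instead compare magnitudes, but the source does not say which was used.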

4 Methodology

The methodology section describes all the steps from preprocessing of the data to the deployment of the model to Arduino.

4.1 Preprocessing

To ensure that each feature contributes proportionally to the distance calculations in the model, we normalize the data. Let \({x}_i\) represent the \({i}^{{\rm{th}}}\) feature in the dataset. The normalization is given by:
\begin{equation*}x_i^{'} = \frac{{{x}_i - {{\rm{\mu }}}_i}}{{{{\rm{\sigma }}}_i}}\end{equation*}
where \({\mu }_i\) and \({\sigma }_i\) are the mean and standard deviation of \({x}_i\), respectively, computed over the training dataset.
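A minimal sketch of this normalization is given below, assuming the data is held as rows of per-axis readings. The helper names `fit_scaler` and `transform`, the use of the population standard deviation, and the toy values are assumptions for illustration. The point it shows is that the statistics come from the training data only and are then reused unchanged on other data.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute per-feature mean and standard deviation from training rows."""
    columns = list(zip(*train))  # transpose rows into feature columns
    means = [fmean(col) for col in columns]
    # Population standard deviation is an assumption; a constant feature
    # (zero spread) falls back to 1.0 to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows: Sequence[Sequence[float]],
              means: Sequence[float],
              stds: Sequence[float]) -> List[List[float]]:
    """Apply x' = (x - mu) / sigma feature by feature."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy two-feature data, purely illustrative.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform([[4.0, 20.0]], means, stds)  # reuses training statistics
```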
A sliding window of size 10 is created to capture local patterns and temporal dependencies. By moving the window along the time series, we can analyze subsets of sequential data points, allowing us to identify anomalies within smaller, localized contexts. Given a time series data \(X = \{ {{x}_1,{x}_2, \ldots ,{x}_n} \}\), where \({x}_i\) represents the value at time \(i\), and a sliding window size \(w\) = 10:
1. Define the sliding window size as \(w\) = 10.
2. For each starting index \(i\) from 1 to \(n - w + 1\), create a window \({W}_i\) containing 10 consecutive data points starting from index \(i\).
3. Analyze each window \({W}_i\) to detect anomalies or patterns within the local context of those 10 data points.
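The windowing steps can be sketched as follows. The function name `sliding_windows`, the stride of 1, and the 12-sample series are assumptions for illustration; the source fixes only the window size of 10.

```python
from typing import List, Sequence


def sliding_windows(x: Sequence[float], w: int = 10) -> List[List[float]]:
    """Return all n - w + 1 windows of w consecutive samples (stride 1)."""
    if w <= 0:
        raise ValueError("w must be positive")
    # A series shorter than w yields an empty range, hence no windows.
    return [list(x[i:i + w]) for i in range(len(x) - w + 1)]


series = [float(i) for i in range(12)]  # hypothetical 12-sample series
windows = sliding_windows(series, w=10)  # 12 - 10 + 1 = 3 windows
```

At the stated sampling rate of 4 data points per second, each 10-sample window spans 2.5 seconds of machine operation.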

4.2 Synthetic Data Generation

Because of the small data size, 30000 synthetic samples are generated from 290 training samples. In the synthetic data generation phase, a Generative Adversarial Network (GAN) was deployed to augment the limited dataset from a 6-axis accelerometer sensor. The GAN framework consists of two neural networks in opposition: a generator and a discriminator. The generator is tasked with producing synthetic sequential data resembling the real dataset's statistical properties, using a latent space for input variability. It uses Long Short-Term Memory (LSTM) layers to capture temporal dependencies, essential for maintaining the time-series nature of the data. The discriminator aims to differentiate between authentic and generated data, improving its identification accuracy through adversarial training. This process iteratively adjusts the generator and discriminator through a loss function balancing both networks' learning progression. The result is a substantial increase in the dataset's size, enhancing the LSTM-Autoencoder model's training for anomaly detection by exposing it to a broader range of data patterns and operational scenarios.
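For reference, the original GAN formulation trains the generator \(G\) and the discriminator \(D\) on the minimax objective below. The source does not state which loss variant was used, so this is the textbook form rather than the authors' exact loss. Here \(z\) denotes the latent input, and \({p}_{{\rm{data}}}\) and \({p}_z\) denote the data and latent distributions; these symbols are introduced only for this illustration.
\begin{equation*}\mathop {\min }\limits_G \mathop {\max }\limits_D \; {\mathbb{E}}_{x \sim {p}_{{\rm{data}}}}\left[ {\log D(x)} \right] + {\mathbb{E}}_{z \sim {p}_z}\left[ {\log \left( {1 - D(G(z))} \right)} \right]\end{equation*}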

4.3 Model Architecture

The model first aims to reconstruct the input sequence accurately, then classifies each time step for potential anomalies across all sensors, making it robust in anomaly detection tasks due to its ability to capture both spatial and temporal features effectively.

4.3.1 Convolutional Autoencoder (CAE).

The CAE effectively extracts spatial features from each time step of the sequence. The primary function of the CAE in our model is to learn a compressed representation of the input data and then reconstruct the input data from this compressed form. The CAE is designed with an encoder-decoder structure. The encoder is composed of Conv1D layers, each followed by LeakyReLU activation for non-linearity and BatchNormalization for stabilizing the learning process. MaxPooling1D is used for downsampling, and Dropout is incorporated for regularization. This part of the network learns to compress the input data into a lower-dimensional representation, capturing essential spatial features. The decoder uses Conv1D layers for upscaling the encoded representations back to the original data dimensionality. The final output of the decoder aims to reconstruct the original input sequence.

4.3.2 Integration of BiLSTM for Anomaly Classification.

The BiLSTM captures temporal dependencies and dynamics in the data, a key aspect in anomaly detection where the context of data points is vital. After the CAE, the encoded representations are flattened and fed into a Bidirectional LSTM layer. This structure allows the model to capture temporal dependencies in both forward and backward directions in the sequence. The output from the BiLSTM is then passed through a TimeDistributed Dense layer with a sigmoid activation function, tailored for binary classification of each anomaly type. The model is then compiled with binary cross-entropy loss, suitable for the binary classification task in multi-label settings.
After training, the model is converted into a format suitable for deployment on a microcontroller, such as an Arduino Uno. This often involves model quantization, which reduces the model's size and computational requirements.

4.4 Deployment on Arduino

4.4.1 Quantization Process.

The floating-point weights and activations of the model are mapped to a lower precision format, typically 8-bit integers. This is done using a linear transformation:
\begin{equation*}q = {\rm{round}}\left( {\frac{r}{S} + Z} \right)\end{equation*}
where \(r\) is the real-valued number, \(q\) is the quantized value, \(S\) is the scale factor, and \(Z\) is the zero-point. The scale factor and zero-point are calculated during training or during post-training quantization to minimize the information loss due to quantization. The scale factor \(S\) adjusts the range of the quantized values, and the zero-point \(Z\) aligns the quantized range with the real number range.
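The mapping can be illustrated with a short sketch. It assumes signed 8-bit integers and derives \(S\) and \(Z\) from the minimum and maximum of a value range, which is one common calibration choice and not necessarily what the TensorFlow Lite converter selected for this model. The function names and the range \([-1, 1]\) are hypothetical.

```python
from typing import Tuple


def quant_params(r_min: float, r_max: float,
                 q_min: int = -128, q_max: int = 127) -> Tuple[float, int]:
    """Derive scale S and zero-point Z mapping [r_min, r_max] onto [q_min, q_max]."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = round(q_min - r_min / scale)
    return scale, zero_point


def quantize(r: float, scale: float, zero_point: int,
             q_min: int = -128, q_max: int = 127) -> int:
    """q = round(r / S + Z), clamped to the representable integer range."""
    q = round(r / scale + zero_point)
    return max(q_min, min(q_max, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate inverse of the mapping: r is roughly S * (q - Z)."""
    return scale * (q - zero_point)


scale, zero_point = quant_params(-1.0, 1.0)  # hypothetical weight range
q = quantize(0.5, scale, zero_point)
r_hat = dequantize(q, scale, zero_point)  # close to 0.5, within S / 2
```

For values inside the calibrated range, the round-trip error is bounded by half the scale factor, which is the information loss the choice of \(S\) and \(Z\) tries to keep small. Values outside the range are clamped and lose more.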

4.4.2 Model Conversion Tools.

TensorFlow provides tools to convert a trained TensorFlow model into the TensorFlow Lite format, which includes quantization steps. The converter also optimizes the model for efficient execution on devices with limited resources. Deploying the quantized model on an Arduino Uno involves integrating the TensorFlow Lite interpreter into the Arduino environment. TensorFlow Lite for Microcontrollers provides a special interpreter that can run TensorFlow Lite models on microcontrollers with very limited resources.

4.4.3 Memory Allocation.

Due to the limited memory on Arduino Uno, it's crucial to allocate a memory buffer, known as the tensor arena, for the TensorFlow Lite model's operations:
\(TensorArena = {\rm{uint8\_t\ array\ of\ size\ TensorArenaSize}}\)
The size of this buffer, TensorArenaSize, is crucial and needs to be optimized for the specific model and available memory.

4.4.4 Real-Time Inference and Anomaly Detection.

The Arduino continuously reads sensor data and preprocesses it to match the format and scale used during the model's training. This may involve scaling the data and constructing the appropriate input sequences. Here is how the inference is computed:
1. The preprocessed data is fed into the TensorFlow Lite model for inference.
2. For each layer \(l\) in the model, the computation involves processing the input data \({{\bf a}}^{[ l ]}\) using the quantized weights \({{\bf W}}^{[ l ]}\) and biases \({{\bf b}}^{[ l ]}\):
\begin{equation*}{{\bf a}}^{\left[ {l + 1} \right]} = {g}^{\left[ l \right]}\left( {{{\bf W}}^{\left[ l \right]}{{\bf a}}^{\left[ l \right]} + {{\bf b}}^{\left[ l \right]}} \right)\end{equation*}
where \({g}^{[ l ]}\) is the activation function of layer \(l\). This operation is performed in a quantized manner suitable for the microcontroller's computational capabilities.
The steps for anomaly score calculation and thresholding:
1. The output from the model represents the probability or likelihood of an anomaly for each monitored feature.
2. To determine whether an anomaly is present, a threshold \(\theta\) is set. For each feature \(i\), if the output score \({o}_i > \theta\), it is classified as an anomaly:
\({\rm{Anomal}}{{\rm{y}}}_i = 1\quad {\rm{if}}\quad {o}_i > \theta \quad {\rm{else}}\quad 0\)
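The thresholding rule can be sketched as below. The function name `flag_anomalies`, the threshold value of 0.5, and the example scores are assumptions; the source does not report the value of \(\theta\) that was used.

```python
from typing import List, Sequence


def flag_anomalies(scores: Sequence[float], theta: float = 0.5) -> List[int]:
    """Return 1 where a score strictly exceeds theta, otherwise 0."""
    return [1 if o > theta else 0 for o in scores]


scores = [0.02, 0.91, 0.50, 0.67]  # hypothetical sigmoid outputs
flags = flag_anomalies(scores, theta=0.5)  # [0, 1, 0, 1]
```

Because the comparison is strict, a score exactly equal to the threshold is treated as normal, matching the condition \({o}_i > \theta\).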

5 Discussion

Our experiment utilizes a hybrid neural network architecture, trained on synthetic data, to detect anomalies in time-series sensor data. The training dataset constituted 80% of the synthetic data, while the remaining 20% formed the validation set. To ensure the robustness of our model and mitigate potential training bias, we evaluated the model on a separate, held-out test dataset. This test dataset was collected in a distinct session with the machine operator, thereby providing a realistic assessment of the model's generalization capabilities. The model's performance was quantified using standard classification metrics: accuracy, precision, recall, and F1 score. These metrics provide a comprehensive overview of the model's ability to correctly identify anomalies in the test dataset.
1. Accuracy (0.878): The model achieved an accuracy of 87.8%, indicating a high level of overall correctness in its predictions. This suggests that the model is effective in distinguishing between normal and anomalous states in the majority of cases.
2. Precision (1.0): With a precision of 100%, the model exhibited exceptional exactness, wherein every instance predicted as an anomaly was indeed an actual anomaly. This is particularly significant in applications where false positives (normal conditions erroneously identified as anomalies) are critical to avoid.
3. Recall (0.831): The model attained a recall of 83.1%, demonstrating its proficiency in identifying true anomalies. This indicates that the model successfully detected a majority of the anomalous instances, though there is room for improvement in capturing all potential anomalies.
4. F1 Score (0.908): The F1 score, the harmonic mean of precision and recall, was calculated to be 90.8%. This high score reflects a balanced model that performs well both in terms of precision and recall, indicating its reliability in anomaly detection tasks.
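As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall. The function name `f1_score` is chosen here for illustration; only the two input values come from the text.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_f1 = f1_score(precision=1.0, recall=0.831)  # about 0.908
```

The result, 2 × 1.0 × 0.831 / 1.831, rounds to 0.908 and agrees with the reported value.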

6 Conclusion

This experiment culminated in the successful deployment of a tailored convolutional autoencoder (CAE) combined with a bidirectional long short-term memory (BiLSTM) network for anomaly detection in the BDB 825 diamond dry drilling machine. The integration of these specific neural network architectures into the machine's operational framework, facilitated by Arduino microcontrollers, proved to be a robust solution for real-time monitoring and anomaly identification. The CAE was instrumental in extracting spatial features from the data, while the BiLSTM layer effectively captured the temporal dependencies within the time series data. This dual approach allowed for a comprehensive analysis of the operational patterns and the identification of anomalies that could signify potential equipment malfunctions or safety hazards. The successful execution of this experiment underscores the practicality and effectiveness of using advanced neural network models in industrial settings.

References

[1] Chuadhry Mujeeb Ahmed, Gauthama Raman M R, and Aditya P. Mathur. "Challenges in Machine Learning based approaches for Real-Time Anomaly Detection in Industrial Control Systems". In Proceedings of the International Conference on Industrial Control Systems Security (ICICSS), 2020, 105-114.
[2] Yu Jiang, Wei Wang, and Chunhui Zhao. "A Machine Vision-based Realtime Anomaly Detection Method for Industrial Products Using Deep Learning". Journal of Industrial Machine Vision and Applications, 15(3) (2020), 335-345.
[3] Van Quan Nguyen, Linh Van Ma, Jin-young Kim, Kwangki Kim, and Jinsul Kim. "Applications of Anomaly Detection Using Deep Learning on Time Series Data". IEEE Transactions on Industrial Informatics, 17(4) (2021), 2812-2821.
[4] Maged Abdelaty, Roberto Doriguzzi-Corin, and Domenico Siracusa. "DAICS: A Deep Learning Solution for Anomaly Detection in Industrial Control Systems". Journal of Network and Systems Management, 28(2) (2020), 345-367.
[5] Sohrab Mokhtari, Alireza Abbaspour, Kang K. Yen, and Arman Sargolzaei. "A Machine Learning Approach for Anomaly Detection in Industrial Control Systems Based on Measurement Data". IEEE Access, 8 (2020), 99425-99434.
[6] Kyung Sung Lee, Seong Beom Kim, and Hee-Woong Kim. "Enhanced Anomaly Detection in Manufacturing Processes Through Hybrid Deep Learning Techniques". International Journal of Production Research, 59(7) (2021), 2101-2116.
[7] Narjes Davari, Bruno Veloso, and Rita P. Ribeiro. "Predictive maintenance based on anomaly detection using deep learning for air production unit in the railway industry". In Proceedings of the IEEE International Conference on Industrial Technology (ICIT), 2021, 564-569.
[8] Zhe Li, Jingyue Li, Yi Wang, and Kesheng Wang. "A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment". International Journal of Advanced Manufacturing Technology, 102(9-12) (2019), 3987-3999.
[9] Würth Elektronik. "WSEN-EVAL ISDS: 6 axis gyroscope and acceleration sensor". https://www.we-online.com/de/components/products/WSEN-EVAL_ISDS.
[11] F. S. Bayram, R. Schneider, Md Nur Amin, R. Radtke, A. Melke, and A. Jesser. "Encoding Techniques on Multivariate Time Series Signals for Failure Prevention of Industrial Assets with Unsupervised Deep Anomaly Detection". In Proceedings of the 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023.
[12] Md Nur Amin, A. Al Imran, F. S. Bayram, and A. Jesser. "Utilizing Wasserstein GAN with Gradient Penalty for Data Augmentation in LSTM-Autoencoder-Based Anomaly Detection Systems". In Proceedings of the 26th International Conference on Computer and Information Technology (ICCIT), 2023.

Published In

ICGSP '24: Proceedings of the 2024 8th International Conference on Graphics and Signal Processing
June 2024
60 pages
ISBN: 9798400717024
DOI: 10.1145/3694875

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Anomaly Detection
2. Industrial Safety
3. Predictive Maintenance
4. Real-Time Monitoring
