Patch-Wise-Based Self-Supervised Learning for Anomaly Detection on Multivariate Time Series Data
Figure 1. Patch-wise learning framework: (Left) representation learning based on self-supervised learning using patching; (Right) supervised learning based on anomaly augmentation.
Figure 2. Self-supervised learning-based representation learning architecture.
Figure 3. Supervised learning for anomaly detection.
Figure 4. Anomaly augmentation.
Figure 5. Comparison of channel-dependent (CD) and channel-independent (CI) strategies in time series reconstruction.
Figure 6. Visualization of anomaly datasets.
Figure 7. (a) F1 score performance based on value embedding in the MSL dataset; (b) F1 score performance based on value embedding in the SMAP dataset; (c) F1 score performance based on value embedding in the SMD dataset.
Abstract
1. Introduction
- Maintaining continuous features through patching: Effectively learns local patterns while maintaining the continuity of time series data, providing better data representation compared to conventional simple sampling methods.
- Incorporating various temporal information by learning channel dependencies and adding relative positional bias: Addresses the issue of missing inter-variable relationships in traditional channel-independent approaches while integrating diverse temporal information.
- Achieving feature representation learning through self-supervised learning: Reduces dependence on labeled data and enables better learning of data feature representations.
- Supervised learning based on anomaly augmentation for downstream tasks: Alleviates the scarcity of anomaly data while allowing the model to learn various types of anomalies effectively.
2. Related Work
Deep Learning in Time Series Anomaly Detection
3. Methods
3.1. Problem Definition
3.2. Self-Supervised Learning-Based Representation Learning
3.3. Anomaly Augmentation-Based Supervised Learning
- Soft replacement: Replaces data with segments from other intervals.
- Uniform replacement: Replaces data within the patch with a constant value.
- Peak noise: Adds noise to specific data segments to create extreme values.
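The three augmentation types can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the blending weight `alpha`, the use of the patch mean as the default constant, and the `scale` of the spike are all assumptions, since the descriptions above do not fix these details. For brevity each patch is a list of values from a single variable.

```python
import random

def soft_replacement(patch, donor, alpha=1.0):
    """Blend a patch with a donor segment taken from another interval.

    alpha=1.0 replaces the patch outright; smaller values mix the two.
    The blending weight is an assumption, not given in the source.
    """
    if len(patch) != len(donor):
        raise ValueError("patch and donor must have the same length")
    return [(1.0 - alpha) * p + alpha * d for p, d in zip(patch, donor)]

def uniform_replacement(patch, value=None):
    """Replace every value in the patch with one constant.

    Defaults to the patch mean (an assumed choice of constant).
    """
    if value is None:
        value = sum(patch) / len(patch)
    return [value] * len(patch)

def peak_noise(patch, scale=5.0, rng=random):
    """Add one extreme spike at a random position in the patch."""
    out = list(patch)
    idx = rng.randrange(len(out))
    spread = (max(patch) - min(patch)) or 1.0  # avoid a zero-sized spike
    out[idx] += rng.choice([-1.0, 1.0]) * scale * spread
    return out

series = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
patch, donor = series[0:4], series[4:8]
print(soft_replacement(patch, donor))  # [0.4, 0.5, 0.6, 0.7]
print(uniform_replacement(patch))
print(peak_noise(patch, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the spike position reproducible, which helps when comparing runs.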
Algorithm 1. Anomaly augmentation-based supervised learning.
Input: Multivariate time series data
Output: Anomaly scores
BEGIN
Step 1. Patching: Compute the number of patches (sequence length / patch size) and divide X into non-overlapping patches.
Step 2. Random Selection: Randomly select a subset of patches.
Step 3. Anomaly Augmentation: Apply anomaly transformations to the selected patches and assign labels.
Step 4. 1D Convolution and Embedding: Pass all patches through a 1D convolution layer.
Step 5. Transformer Encoder: Encode the embedding to capture global dependencies.
Step 6. Anomaly Score Prediction: Pass the encoded embedding through a linear layer.
Step 7. Loss Calculation: Compute the binary cross-entropy loss (Loss = binary cross-entropy loss).
Step 8. Model Training: Update the model parameters by minimizing Loss.
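Steps 1 to 3 of Algorithm 1 (patching, random selection, and augmentation with label assignment) can be sketched as follows. The sketch makes several assumptions that the algorithm listing does not state: a trailing remainder shorter than the patch size is dropped, the fraction of selected patches is a free parameter called `ratio`, and the augmentation is passed in as a hypothetical callback `augment`.

```python
import random

def make_patches(series, patch_size):
    """Split a series (list of time steps) into non-overlapping patches.

    Assumption: a trailing remainder shorter than patch_size is dropped.
    """
    n_patches = len(series) // patch_size
    return [series[i * patch_size:(i + 1) * patch_size]
            for i in range(n_patches)]

def augment_random_patches(patches, augment, ratio=0.5, rng=random):
    """Apply `augment` to a random subset of patches and label them.

    Returns (new_patches, labels), where labels[i] is 1 for an augmented
    patch and 0 otherwise. `ratio` is an assumed hyperparameter.
    """
    n_selected = int(len(patches) * ratio)
    selected = set(rng.sample(range(len(patches)), n_selected))
    new_patches, labels = [], []
    for i, patch in enumerate(patches):
        if i in selected:
            new_patches.append(augment(patch))
            labels.append(1)
        else:
            new_patches.append(list(patch))
            labels.append(0)
    return new_patches, labels

def constant_rows(patch):
    """Hypothetical augmentation: overwrite every time step with a constant."""
    return [[9.0, 9.0] for _ in patch]

# Toy multivariate series: 8 time steps, 2 variables per step.
series = [[float(t), float(-t)] for t in range(8)]
patches = make_patches(series, patch_size=2)
augmented, labels = augment_random_patches(
    patches, constant_rows, ratio=0.5, rng=random.Random(0))
print(len(patches), labels)
```

The returned labels are the per-patch targets that the binary cross-entropy loss in Step 7 would compare against the predicted anomaly scores.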
3.4. Component
- Patching: Patching segments multivariate time series data, composed of multiple variables, into patches according to a specified patch size. The original time series is divided into patches that each cover a fixed number of time steps (the patch size), so the number of patches is the series length divided by the patch size. While this approach is similar to the sliding window method, we do not use overlapping between patches in our framework. The patching mechanism is particularly advantageous for high-dimensional datasets with a large number of variables and long temporal lengths. Dividing the time series into smaller, more manageable patches reduces computational overhead, and non-overlapping segmentation reduces it further because no time step is processed twice. For high-dimensional datasets, the computational cost of direct modeling tends to grow steeply with the size of the input (self-attention, for example, is quadratic in sequence length). With patching, the model operates on patches rather than individual time steps, so the cost depends on the number of patches and the patch size, which makes the framework more scalable. Although this work uses non-overlapping patches, both the patch size and the overlap can in principle be adjusted, which allows a balance between computational efficiency and representational capacity. This makes the framework applicable to both small-scale and large-scale time series data, including real-world applications with high-dimensional and long-sequence datasets.
- Channel dependency: Channel independency treats each variable individually, analyzing them as independent entities. This approach is robust to the typical distribution shift challenges inherent in time series data [18]. However, for tasks like anomaly detection, where the relationships among variables are critical, it is essential to account for channel dependency during learning. This is because interactions among variables in time series data can serve as important clues for detecting anomalous patterns. For example, when the value of one variable changes drastically, its relationship with other variables can help determine whether this change is normal or indicative of an anomaly. Ignoring such interdependencies increases the risk of overlooking significant anomalous patterns, which can lead to reduced detection accuracy. By incorporating channel dependency into the learning process, the complex relationships among variables can be effectively reflected, enabling more accurate anomaly detection. Figure 5 illustrates a comparison between channel-independent and channel-dependent strategies. This framework adopts a channel dependency strategy to effectively capture and learn inter-variable relationships.
- One-dimensional convolution-based value embedding: The value embedding in a transformer model is directly related to the three key matrices used in the self-attention mechanism: queries, keys, and values [50]. Because it plays a crucial role in the attention computation, the core mechanism of transformer models, its generation process should be optimized. Traditionally, MLPs (multi-layer perceptrons) are used to generate the value embedding. However, MLPs do not explicitly capture positional or sequential correlations among different parts of the input data. To address this limitation, we use a 1D convolution-based value embedding. The 1D convolution allows the model to learn the relationships among adjacent values at each time step, effectively capturing local patterns. Additionally, because the convolution weights are learned from the data, the embedding can adapt to the data and incorporate positional information. This embedding is represented in the embedding space of the model. Furthermore, to incorporate dynamic positional information, relative positional encoding is applied within the layer, enhancing the representation of position-dependent features.
- Transformer encoder: The transformer encoder used in the proposed framework is based on the Vanilla Transformer Encoder and serves as the backbone network for pre-training [50]. The projected embedding from the previous step is input into the transformer encoder to learn the complex dependencies across the time series. This process employs multi-head self-attention (MHSA) to capture correlations among patches, with batch normalization applied. The attention for each head is calculated as follows:
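For reference, the standard single-head scaled dot-product attention from [50] is softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension. The sketch below implements that standard form in plain Python for one head. The optional `bias` argument is an assumption: it shows one common way a relative positional bias can enter the scores, and is not confirmed to be the exact formulation used in this framework.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, bias=None):
    """Scaled dot-product attention for a single head.

    Q, K: lists of n vectors of size d_k; V: list of n value vectors.
    bias: optional n x n matrix added to the scores before the softmax
    (an assumed way to inject a relative positional bias).
    """
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        if bias is not None:
            scores = [s + b for s, b in zip(scores, bias[i])]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three patch embeddings of size 2, used as queries, keys, and values.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical relative bias: penalize attention to distant patches.
rel_bias = [[-abs(i - j) for j in range(3)] for i in range(3)]
for row in attention(X, X, X, bias=rel_bias):
    print([round(v, 3) for v in row])
```

Multi-head self-attention runs several such heads in parallel on different learned projections of the input and concatenates their outputs.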
3.5. Loss Function
- Time loss: This measures the difference between the restored patch and the actual values using the mean squared error (MSE). This serves as a measure of temporal consistency in the time series data.
- Frequency loss: After transforming the time series data into the frequency domain using the Fourier transform, this loss measures the spectral difference between the restored patch and the actual values. It considers both high-frequency and low-frequency components, helping the model restore overall patterns accurately. Although frequency-domain approaches have been studied extensively in time series forecasting, they remain underexplored in anomaly detection [16,17,29,42].
- Anomaly loss: The model trained based on Equation (1) calculates the anomaly loss using binary cross-entropy. The input data are divided into patches, and patches are randomly selected for anomaly augmentation. The model then learns the anomaly label of each augmented patch. An anomaly score is derived by averaging the predicted labels, and the model is trained to minimize the binary cross-entropy loss in Equation (7) computed from this anomaly score.
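The three losses can be written out on toy data as follows. This is a sketch under stated assumptions: the source does not specify which distance is used in the frequency domain, so the mean modulus of the complex spectral difference is used here as one plausible choice, and the naive DFT stands in for a fast Fourier transform to keep the example free of dependencies. How the time and frequency terms are weighted in the pre-training objective is not shown, so no combined loss is given.

```python
import cmath
import math

def time_loss(restored, target):
    """Mean squared error between restored and actual values."""
    return sum((r - t) ** 2 for r, t in zip(restored, target)) / len(target)

def dft(x):
    """Naive discrete Fourier transform (O(n^2); fine for short patches)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def frequency_loss(restored, target):
    """Mean absolute difference between the two spectra.

    Assumption: the distance in the frequency domain is the mean modulus
    of the complex difference; the source does not state the exact form.
    """
    fr, ft = dft(restored), dft(target)
    return sum(abs(a - b) for a, b in zip(fr, ft)) / len(ft)

def anomaly_loss(scores, labels, eps=1e-7):
    """Binary cross-entropy averaged over patches.

    scores: predicted anomaly probabilities in [0, 1]; labels: 0 or 1.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(s) + (1 - y) * math.log(1.0 - s))
    return total / len(labels)

target = [0.0, 1.0, 0.0, -1.0]
restored = [0.0, 0.5, 0.0, -0.5]
print(time_loss(restored, target))  # 0.125
print(frequency_loss(restored, target))
print(anomaly_loss([0.9, 0.2, 0.1], [1, 0, 0]))
```

In this toy case the restored patch has the right shape but half the amplitude, so the error shows up in both the time-domain and the frequency-domain terms.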
4. Experiments
- Mars Science Laboratory (MSL) dataset: The MSL dataset consists of sensor data collected from a NASA spacecraft and includes 55 telemetry channels. Its anomalies were labeled by experts [51].
- Soil Moisture Active Passive (SMAP) dataset: The SMAP dataset, like the MSL dataset, is a NASA dataset. It contains telemetry from the Soil Moisture Active Passive satellite, an Earth-observing mission that measures soil moisture. Anomalous conditions were labeled by experts [51].
- Server Machine Dataset (SMD): The SMD dataset is a server dataset collected over five weeks by a major internet company. The server data come from 28 different machines [52].
4.1. Settings
4.2. Results of Experiment
4.3. Effectiveness of Value Embedding
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Nguyen, D.K.; Sermpinis, G.; Stasinakis, C. Big Data, Artificial Intelligence and Machine Learning: A Transformative Symbiosis in Favour of Financial Technology. Euro. Fin. Manag. 2023, 29, 517–548.
- Ao, S.-I.; Fayek, H. Continual Deep Learning for Time Series Modeling. Sensors 2023, 23, 7167.
- Fan, J.; Liu, Z.; Wu, H.; Wu, J.; Si, Z.; Hao, P.; Luan, T.H. LUAD: A Lightweight Unsupervised Anomaly Detection Scheme for Multivariate Time Series Data. Neurocomputing 2023, 557, 126644.
- Kim, B.; Alawami, M.A.; Kim, E.; Oh, S.; Park, J.; Kim, H. A Comparative Study of Time Series Anomaly Detection Models for Industrial Control Systems. Sensors 2023, 23, 1310.
- Mejri, N.; Lopez-Fuentes, L.; Roy, K.; Chernakov, P.; Ghorbel, E.; Aouada, D. Unsupervised Anomaly Detection in Time-Series: An Extensive Evaluation and Analysis of State-of-the-Art Methods. Expert Syst. Appl. 2024, 256, 124922.
- Braei, M.; Wagner, S. Anomaly Detection in Univariate Time-Series: A Survey on the State-of-the-Art. arXiv 2020, arXiv:2004.00433.
- Pincombe, B. Anomaly Detection in Time Series of Graphs Using ARMA Processes. Bull. Am. Soc. Overseas Res. 2005, 24, 2.
- Kozitsin, V.; Katser, I.; Lakontsev, D. Online Forecasting and Anomaly Detection Based on the ARIMA Model. Appl. Sci. 2021, 11, 3194.
- Barrientos-Torres, D.; Martinez-Ríos, E.A.; Navarro-Tuch, S.A.; Pablos-Hach, J.L.; Bustamante-Bello, R. Water Flow Modeling and Forecast in a Water Branch of Mexico City through ARIMA and Transfer Function Models for Anomaly Detection. Water 2023, 15, 2792.
- Xu, H.; Sun, Z.; Cao, Y.; Bilal, H. A Data-Driven Approach for Intrusion and Anomaly Detection Using Automated Machine Learning for the Internet of Things. Soft. Comput. 2023, 27, 14469–14481.
- Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. Int. J. Forecast. 2020, 36, 1181–1191.
- Lai, G.; Chang, W.-C.; Yang, Y.; Liu, H. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
- Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal Convolutional Networks for Action Segmentation and Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 21–26 July 2017; pp. 1003–1012.
- Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Conference, 2–9 February 2021; Volume 35, pp. 11106–11115.
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual Conference, 6–14 December 2021.
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-Term Forecasting with Transformers. arXiv 2023, arXiv:2211.14730.
- Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-Beats: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting. arXiv 2020, arXiv:1905.1043.
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? arXiv 2022, arXiv:2205.13504.
- Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10466–10485.
- Iqbal, A.; Amin, R. Time Series Forecasting and Anomaly Detection Using Deep Learning. Comput. Chem. Eng. 2024, 182, 108560.
- Cui, Q.D.; Xu, C.; Xu, Y.; Ou, W.; Pang, Y.; Liu, Z.; Shen, J.; Baber, M.Z.; Maharajan, C.; Ghosh, U. Bifurcation and Controller Design of 5D BAM Neural Networks With Time Delay. Int. J. Numer. Model. 2024, 37, e3316.
- Maharajan, C.; Sowmiya, C.; Xu, C. Delay Dependent Complex-Valued Bidirectional Associative Memory Neural Networks with Stochastic and Impulsive Effects: An Exponential Stability Approach. Kybernetika 2024, 60, 317–356.
- He, Y.; Zhao, J. Temporal Convolutional Networks for Anomaly Detection in Time Series. J. Phys.: Conf. Ser. 2019, 1213, 042050.
- Li, X.; Chen, Y.; Zhang, X.; Peng, Y.; Zhang, D.; Chen, Y. ConvTrans-CL: Ocean Time Series Temperature Data Anomaly Detection Based Context Contrast Learning. Appl. Ocean. Res. 2024, 150, 104122.
- Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. arXiv 2022, arXiv:2110.02642.
- Wang, D.; Shang, Y. A New Active Labeling Method for Deep Learning. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014; pp. 112–119.
- Oh, S.; Ashiquzzaman, A.; Lee, D.; Kim, Y.; Kim, J. Study on Human Activity Recognition Using Semi-Supervised Active Transfer Learning. Sensors 2021, 21, 2760.
- An, J.; Cho, S. Variational Autoencoder Based Anomaly Detection Using Reconstruction Probability, Technical Report; SNU Data Mining Center: Seoul, Republic of Korea, 2015.
- Zhang, C.; Zhou, T.; Wen, Q.; Sun, L. TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2497–2507.
- Yi, K.; Zhang, Q.; Cao, L.; Wang, S.; Long, G.; Hu, L.; He, H.; Niu, Z.; Fan, W.; Xiong, H. A Survey on Deep Learning Based Time Series Analysis with Frequency Transformation. arXiv 2023, arXiv:2302.02173.
- Gao, B.; Ma, H.-Y.; Yang, Y.-H. HMMs (Hidden Markov Models) Based on Anomaly Intrusion Detection Method. In Proceedings of the International Conference on Machine Learning and Cybernetics, Beijing, China, 4–5 November 2002; Volume 1, pp. 381–385.
- Gao, J.; Song, X.; Wen, Q.; Wang, P.; Sun, L.; Xu, H. RobustTAD: Robust Time Series Anomaly Detection via Decomposition and Convolutional Neural Networks. arXiv 2021, arXiv:2002.09545.
- Paparrizos, J.; Kang, Y.; Boniol, P.; Tsay, R.S.; Palpanas, T.; Franklin, M.J. TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection. Proc. VLDB Endow. 2022, 15, 1697–1711.
- Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate Time-Series Anomaly Detection via Graph Attention Network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 841–850.
- Wong, L.; Liu, D.; Berti-Equille, L.; Alnegheimish, S.; Veeramachaneni, K. AER: Auto-Encoder with Regression for Time Series Anomaly Detection. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1152–1161.
- Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. In Proceedings of the Sixth International Conference on Learning Representations (ICLR), Vancouver, Canada, 30 April–3 May 2018.
- Park, D.; Hoshi, Y.; Kemp, C.C. A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-Based Variational Autoencoder. IEEE Robot. Autom. Lett. 2018, 3, 1544–1551.
- Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Langs, G.; Schmidt-Erfurth, U. F-AnoGAN: Fast Unsupervised Anomaly Detection with Generative Adversarial Networks. Med. Image Anal. 2019, 54, 30–44.
- Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 33–43.
- Jia, W.; Shukla, R.M.; Sengupta, S. Anomaly Detection Using Supervised Learning and Multiple Statistical Methods. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1291–1297.
- Jeong, Y.; Yang, E.; Ryu, J.H.; Park, I.; Kang, M. AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection Using Data Degradation Scheme. arXiv 2023, arXiv:2305.04468.
- Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; Lian, D.; An, N.; Cao, L.; Niu, Z. Frequency-Domain MLPs Are More Effective Learners in Time Series Forecasting. In Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023.
- Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2024, arXiv:2310.06625.
- Zhong, Z.; Yu, Z.; Yang, Y.; Wang, W.; Yang, K. PatchAD: A Lightweight Patch-Based MLP-Mixer for Time Series Anomaly Detection. arXiv 2024, arXiv:2401.09793.
- Zhang, H.; Li, F.; Xu, H.; Huang, S.; Liu, S.; Ni, L.M.; Zhang, L. MP-Former: Mask-Piloted Transformer for Image Segmentation. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 18074–18083.
- Das, A.; Kong, W.; Sen, R.; Zhou, Y. A Decoder-Only Foundation Model for Time-Series Forecasting. arXiv 2024, arXiv:2310.10688.
- Yan, P.; Abdulkadir, A.; Luley, P.-P.; Rosenthal, M.; Schatte, G.A.; Grewe, B.F.; Stadelmann, T. A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions. IEEE Access 2024, 12, 3768–3789.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395.
- Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
- Zhang, C.; Song, D.; Chen, Y.; Feng, X.; Lumezanu, C.; Cheng, W.; Ni, J.; Zong, B.; Chen, H.; Chawla, N.V. A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1409–1416.
- Shen, L.; Li, Z.; Kwok, J.T. Timeseries Anomaly Detection Using Temporal Hierarchical One-Class Network. In Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual Conference, 6–12 December 2020.
- Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. USAD: UnSupervised Anomaly Detection on Multivariate Time Series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 3395–3404.
- Deng, A.; Hooi, B. Graph Neural Network-Based Anomaly Detection in Multivariate Time Series. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Conference, 2–9 February 2021; Volume 35, pp. 4027–4035.
- Kim, S.; Choi, K.; Choi, H.-S.; Lee, B.; Yoon, S. Towards a Rigorous Evaluation of Time-Series Anomaly Detection. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, Virtual Conference, 22 February–1 March 2022; Volume 36, pp. 7194–7201.
Dataset | Number of Features | Number of Entities | Training Size | Test Size |
---|---|---|---|---|
MSL | 55 | 27 | 58,317 | 73,729 |
SMAP | 25 | 55 | 135,183 | 427,617 |
SMD | 38 | 28 | 708,405 | 708,420 |
Setting | Pre-Train | Downstream |
---|---|---|
Task | Mask Modeling | Anomaly Detection |
Patch Size | 2(MSL), 4(SMAP, SMD) | 2(MSL), 4(SMAP, SMD) |
Masking Ratio | 0.4 | No |
Batch Size | 16 | 16 |
Learning Type | Self-Supervised Learning | Supervised Learning |
Anomaly Augmentation | No | Yes |
Percent of Anomaly Augmentation | - | Soft Replacement (50%), Uniform Replacement (15%), Peak Noise (15%) |
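To make the pre-training setting concrete, the sketch below hides a random 40% of patches, matching the masking ratio in the table above. It is an illustration under assumptions: the function name `mask_patches` is hypothetical, hidden patches are filled with a constant (a learned mask token is another common choice), and each patch is shown as a flat list of values.

```python
import random

def mask_patches(patches, mask_ratio=0.4, mask_value=0.0, rng=random):
    """Hide a random fraction of patches for masked reconstruction.

    Returns (masked_patches, mask), where mask[i] is True for hidden
    patches. Filling hidden patches with a constant is an assumption.
    """
    n_masked = round(len(patches) * mask_ratio)
    hidden = set(rng.sample(range(len(patches)), n_masked))
    masked = [[mask_value] * len(p) if i in hidden else list(p)
              for i, p in enumerate(patches)]
    mask = [i in hidden for i in range(len(patches))]
    return masked, mask

# Ten patches of size 4 (the patch size listed for SMAP and SMD).
patches = [[float(4 * i + j) for j in range(4)] for i in range(10)]
masked, mask = mask_patches(patches, mask_ratio=0.4, rng=random.Random(0))
print(sum(mask))  # 4 of the 10 patches are hidden
```

During pre-training, the model would be asked to restore the hidden patches, and the reconstruction error would be measured only where `mask` is True or over all patches, depending on the design; the source does not say which.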
Model | MSL F1 | MSL F1_PA | SMAP F1 | SMAP F1_PA | SMD F1 | SMD F1_PA |
---|---|---|---|---|---|---|
DAGMM | 0.199 | 0.701 | 0.333 | 0.712 | 0.238 | 0.723 |
LSTM-VAE | 0.212 | 0.678 | 0.235 | 0.756 | 0.435 | 0.808 |
OmniAnomaly | 0.207 | 0.899 | 0.227 | 0.805 | 0.474 | 0.944 |
MSCRED | 0.199 | 0.775 | 0.232 | 0.945 | 0.097 | 0.389 |
THOC | 0.190 | 0.891 | 0.240 | 0.781 | 0.168 | 0.541 |
USAD | 0.211 | 0.927 | 0.228 | 0.818 | 0.426 | 0.938 |
GDN | 0.217 | 0.903 | 0.252 | 0.708 | 0.529 | 0.716 |
AnomalyBERT | 0.302 | 0.585 | 0.457 | 0.914 | 0.535 | 0.830 |
AnomalyBERT* | 0.318 | 0.730 | 0.340 | 0.929 | 0.255 | 0.705 |
Ours | 0.390 | 0.818 | 0.413 | 0.842 | 0.337 | 0.725 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Oh, S.; Anh, L.H.; Vu, D.T.; Yu, G.H.; Hahn, M.; Kim, J. Patch-Wise-Based Self-Supervised Learning for Anomaly Detection on Multivariate Time Series Data. Mathematics 2024, 12, 3969. https://doi.org/10.3390/math12243969