research-article

AllInfoLog: Robust Diverse Anomalies Detection Based on All Log Features

Published: 01 September 2023

Abstract

Large-scale services generate massive volumes of logs that trace runtime states and critical events. Log-based anomaly detection is critical for service maintenance and reliability assurance. Existing log-based anomaly detection methods use only a limited part of the information in log data, so they cannot detect the diverse anomalies that relate to the log features they ignore. In this paper, we propose AllInfoLog, a robust log-based anomaly detection method that takes advantage of all log information to detect diverse types of anomalies. To capture all log features, AllInfoLog uses four encoders to extract semantic, parameter, time, and other feature embeddings, respectively. The embeddings of all log features are then combined to train an attention-based Bi-LSTM model that detects diverse anomalies. Experimental evaluations on real-world log datasets, synthetic datasets, and unstable log datasets demonstrate that AllInfoLog outperforms state-of-the-art log-based anomaly detection methods in both performance and robustness, and that it is effective at detecting diverse types of anomalies.
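
The combination step described in the abstract can be illustrated with a small sketch. It rests on several assumptions that the abstract does not confirm: the four embeddings are joined by plain concatenation (the abstract says only "combined"), the attention is a dot-product score against a single query vector, and the Bi-LSTM layer is left out entirely, so attention is applied directly to the combined per-log vectors. The names `combine_features`, `softmax`, and `attention_pool` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def combine_features(semantic: Vector, parameter: Vector,
                     time: Vector, other: Vector) -> Vector:
    """Join the four per-log embeddings into one vector.

    Assumption: concatenation; the abstract only says "combined".
    """
    return [*semantic, *parameter, *time, *other]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Collapse a sequence of vectors into one weighted sum.

    Each vector is scored by its dot product with `query` (assumed
    scoring function), the scores are normalised with softmax, and
    the vectors are summed using those weights.
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states))
            for d in range(dim)]


# Two log entries, each with toy one- or two-dimensional embeddings.
sequence = [
    combine_features([0.1, 0.2], [1.0], [0.5], [0.0]),
    combine_features([0.3, 0.4], [2.0], [0.7], [1.0]),
]
query = [0.0] * len(sequence[0])  # zero query gives uniform attention
summary = attention_pool(sequence, query)
```

In a fuller model the pooled vector would feed a classifier that labels the log sequence as normal or anomalous, and the vectors being pooled would be Bi-LSTM hidden states instead of the raw combined embeddings.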

Cited By

  • (2024) ContexLog: Non-Parsing Log Anomaly Detection With All Information Preservation and Enhanced Contextual Representation. IEEE Transactions on Network and Service Management 21:4 (4750-4762). DOI: 10.1109/TNSM.2024.3400283. Online publication date: 13-May-2024


Information

Published In

IEEE Transactions on Network and Service Management, Volume 20, Issue 3
Sept. 2023
1837 pages

Publisher

IEEE Press
