Online log parsing using evolving research tree

Arthur Vervaet¹,² (ORCID: orcid.org/0000-0003-3526-3364),
Mar Callau-Zori¹,
Yousra Chabchoub² &
…
Raja Chiky¹

Abstract

Logs are a reliable source of information for development and maintenance purposes. They record information at runtime regarding the state of a system and are commonly used to analyze its behavior. Parsing structures the information embedded in log messages and is a crucial step for many log mining applications. In such use cases, parsing effectiveness can impact the performance of these applications. For systems that require real-time performance, parsing efficiency is also an important factor. In this paper, we present USTEP, an online log parser that uses an evolving tree structure to encode and discover new parsing rules on the fly. Our evaluation on 14 datasets from different logging environments highlights the superiority of our method in terms of robustness and effectiveness compared to the state of the art. Our analysis of space and time complexity shows that USTEP is the only considered method capable of processing logs in constant time regardless of their length. We also propose USTEP-UP, a way of running multiple USTEP instances in parallel.

References

Vervaet A, Chiky R, Callau-Zori M (2021) Ustep: unfixed search tree for efficient log parsing. In: 2021 IEEE international conference on data mining (ICDM)
Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I et al (2010) A view of cloud computing. Commun ACM 53(4):50–58
Gartner (2021) 3 Cloud disciplines to fuel digital innovation. https://www.gartner.com/smarterwithgartner/3-cloud-disciplines-to-fuel-digital-innovation
Varghese B, Buyya R (2018) Next generation cloud computing: new trends and research directions. Futur Gener Comput Syst 79:849–861
He S, He P, Chen Z, Yang T, Su Y, Lyu MR (2021) A survey on automated log analysis for reliability engineering. ACM Comput Surv (CSUR) 54(6):1–37
Zeng L, Xiao Y, Chen H, Sun B, Han W (2016) Computer operating system logging and security issues: a survey. Secur Commun Netw 9(17):4804–4821
Mi H, Wang H, Zhou Y, Lyu MR-T, Cai H (2013) Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems. IEEE Trans Parallel Distrib Syst 24(6):1245–1255
Liang H, Song L, Wang J, Guo L, Li X, Liang J (2021) Robust unsupervised anomaly detection via multi-time scale dcgans with forgetting mechanism for industrial multivariate time series. Neurocomputing 423:444–462
Du M, Li F, Zheng G, Srikumar V (2017) Deeplog: anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 1285–1298
Zhu J, He S, Liu J, He P, Xie Q, Zheng Z, Lyu MR (2019) Tools and benchmarks for automated log parsing. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 121–130
Mizutani M (2013) Incremental mining of system log format. In: 2013 IEEE international conference on services computing. IEEE, pp 595–602
Shima K (2016) Length matters: clustering system log messages using length of words. arXiv preprint arXiv:1611.03213
Du M, Li F (2016) Spell: streaming parsing of system event logs. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 859–864
He P, Zhu J, Zheng Z, Lyu MR (2017) Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 33–40
Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: 2009 Ninth IEEE international conference on data mining. IEEE, pp 149–158
Tang L, Li T, Perng C-S (2011) Logsig: Generating system events from raw textual logs. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 785–794
Hamooni H, Debnath B, Xu J, Zhang H, Jiang G, Mueen A (2016) Logmine: fast pattern recognition for log analytics. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp 1573–1582
Makanju AA, Zincir-Heywood AN, Milios EE (2009) Clustering event logs using iterative partitioning. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1255–1264
Vaarandi R (2003) A data clustering algorithm for mining patterns from event logs. In: Proceedings of the 3rd IEEE workshop on IP operations & management (IPOM 2003) (IEEE Cat. No. 03EX764). IEEE, pp 119–126
Nagappan M, Vouk MA (2010) Abstracting log lines to log event types for mining software system logs. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010). IEEE, pp 114–117
Vaarandi R, Pihelgas M (2015) Logcluster-a data clustering and pattern mining algorithm for event logs. In: 2015 11th international conference on network and service management (CNSM). IEEE, pp 1–7
Jiang ZM, Hassan AE, Flora P, Hamann G (2008) Abstracting execution logs to execution events for enterprise applications (short paper). In: 2008 The eighth international conference on quality software. IEEE, pp 181–186
Dai H, Li H, Shang W, Chen T-H, Chen C-S (2020) Logram: efficient log parsing using n-gram dictionaries. arXiv preprint arXiv:2001.03038
Nedelkoski S, Bogatinovski J, Acker A, Cardoso J, Kao O (2020) Self-supervised log parsing. arXiv preprint arXiv:2003.07905
He P, Zhu J, Xu P, Zheng Z, Lyu MR (2018) A directed acyclic graph approach to online log parsing
He P, Zhu J, He S, Li J, Lyu MR (2017) Towards automated log parsing for large-scale log data analysis. IEEE Trans Dependable Secur Comput 15(6):931–944
Agrawal A, Karlupia R, Gupta R (2019) Logan: a distributed online log parser. In: 2019 IEEE 35th international conference on data engineering (ICDE). IEEE, pp 1946–1951
Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput Surv (CSUR) 54(2):1–38
Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, pp 117–132
Lou J-G, Fu Q, Yang S, Xu Y, Li J (2010) Mining invariants from console logs for system problem detection. In: USENIX annual technical conference, pp 1–14
Zhang X, Xu Y, Lin Q, Qiao B, Zhang H, Dang Y, Xie C, Yang X, Cheng Q, Li Z et al (2019) Robust log-based anomaly detection on unstable log data. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 807–817
Nedelkoski S, Bogatinovski J, Acker A, Cardoso J, Kao O (2020) Self-attentive classification-based anomaly detection in unstructured logs. In: 2020 IEEE international conference on data mining (ICDM), pp 1196–1201. https://doi.org/10.1109/ICDM50108.2020.00148
Kimura T, Watanabe A, Toyono T, Ishibashi K (2018) Proactive failure detection learning generation patterns of large-scale network logs. IEICE Trans Commun
Lu S, Rao B, Wei X, Tak B, Wang L, Wang L (2017) Log-based abnormal task detection and root cause analysis for spark. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 389–396
Anitha V, Isakki P (2016) A survey on predicting user behavior based on web server log files in a web usage mining. In: 2016 International conference on computing technologies and intelligent data engineering (ICCTIDE’16), pp 1–4. https://doi.org/10.1109/ICCTIDE.2016.7725340
Awad M, Menascé DA (2015) Automatic workload characterization using system log analysis. In: Computer measurement group conference on performance and capacity, San Antonio, TX
He P, Zhu J, He S, Li J, Lyu MR (2016) An evaluation study on log parsing and its use in log mining. In: 2016 46th annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, pp 654–661
He S, Zhu J, He P, Lyu MR (2020) Loghub: a large collection of system log datasets towards automated log analytics. arXiv preprint arXiv:2008.06448
Ghomi EJ, Rahmani AM, Qader NN (2017) Load-balancing algorithms in cloud computing: a survey. J Netw Comput Appl 88:50–71
Mishra SK, Sahoo B, Parida PP (2020) Load balancing in cloud computing: a big picture. J King Saud Univ Comput Inf Sci 32(2):149–158

Acknowledgements

The work described in this paper was supported by the cloud provider 3DS OUTSCALE and by the French National Research and Technology Association (CIFRE program N° 2020/0289). We warmly thank both of them for their support.

Author information

Authors and Affiliations

3DS OUTSCALE, 92210, Saint-Cloud, France
Arthur Vervaet, Mar Callau-Zori & Raja Chiky
ISEP - Institut Supérieur d’Electronique de Paris, 92130, Issy les Moulineaux, France
Arthur Vervaet & Yousra Chabchoub

Authors

Arthur Vervaet
Mar Callau-Zori
Yousra Chabchoub
Raja Chiky

Corresponding author

Correspondence to Arthur Vervaet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Some preliminary results have been published at the IEEE International Conference on Data Mining in 2021 [1].

Appendix A: Details about parsing experimental settings

Preprocessing with regular expressions helps log parsers achieve more accurate results. During the evaluation, we applied the same regex to all the parsers on a given dataset. For every algorithm, the parameter values were fine-tuned through over 100 runs to avoid bias from randomization, and we kept the values for which the algorithm achieves the highest accuracy on a given dataset. The preprocessing regex for each dataset and the parameters for each parser are summarized in Table 5.
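As an illustration of what such a preprocessing step can look like, the sketch below masks variable fields with a wildcard before tokenizing a message. The two patterns, the `<*>` wildcard and the `preprocess` helper are assumptions chosen for the example; they are not the per-dataset regexes of Table 5.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; the regexes actually used
# for each dataset are the ones listed in Table 5.
PREPROCESSING_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4-like addresses
    re.compile(r"\bblk_-?\d+\b"),                # HDFS-style block identifiers
]

WILDCARD = "<*>"  # assumed placeholder for masked variable fields


def preprocess(message: str) -> List[str]:
    """Mask known variable fields, then split the message on whitespace."""
    for pattern in PREPROCESSING_PATTERNS:
        message = pattern.sub(WILDCARD, message)
    return message.split()


tokens = preprocess("Received block blk_-1608999687919862906 from 10.250.19.102")
print(tokens)
```

Masking obvious variables up front means that every parser under comparison receives the same token sequence for a given message, which is what applying one regex per dataset achieves.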

Regarding the number of parameters, SHISO requires four: (1) maxChild, the maximum number of children for each internal node; (2) mergeThreshold, a threshold for searching the most similar template in the children; (3) formatLookupThreshold, a lower bound to find the most similar node to adjust; and (4) superFormatThreshold, the threshold on the average longest common subsequence (LCS) length that determines whether a super format must be created. LenMa uses only one parameter, \(T_c\), the threshold for similarity comparisons between the log message and the clusters. Spell also requires only one parameter, \(\tau\), as a similarity threshold. Finally, Drain needs three parameters [14]: (1) depth, the depth of the parsing tree; (2) st, a threshold for similarity comparisons between the log messages and the discovered templates; and (3) maxChild, the maximum number of children that a node can have. Once maxChild is reached, every new value is sent to a default node. In the latest version [25], the number of parameters was reduced to only one, st, and a dynamic update is proposed (Table 6).
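To make the role of a similarity threshold such as st concrete, here is a minimal sketch of a threshold-based template lookup. It assumes a simple position-wise token similarity; the functions `token_similarity`, `best_template` and `merge` are hypothetical names introduced for this example and do not reproduce the exact definition used by any of the parsers above.

```python
from typing import List, Optional

WILDCARD = "<*>"  # assumed placeholder for variable tokens


def token_similarity(message: List[str], template: List[str]) -> float:
    """Fraction of positions where message and template hold the same token."""
    if len(message) != len(template) or not message:
        return 0.0
    matches = sum(1 for m, t in zip(message, template) if m == t)
    return matches / len(message)


def best_template(message: List[str], templates: List[List[str]],
                  st: float) -> Optional[List[str]]:
    """Return the most similar template if it reaches threshold st, else None."""
    best, best_score = None, -1.0
    for template in templates:
        score = token_similarity(message, template)
        if score > best_score:
            best, best_score = template, score
    return best if best_score >= st else None


def merge(message: List[str], template: List[str]) -> List[str]:
    """Replace every position where the tokens differ by the wildcard."""
    return [t if m == t else WILDCARD for m, t in zip(message, template)]


templates = [["Connection", "from", "<*>", "closed"],
             ["User", "<*>", "logged", "in"]]
message = ["Connection", "from", "10.0.0.1", "closed"]
print(best_template(message, templates, st=0.5))
```

With a low threshold the message joins an existing template; with a high one the lookup returns `None`, which a parser would typically treat as the signal to create a new template. This is why the threshold value has to be tuned per dataset.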

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vervaet, A., Callau-Zori, M., Chabchoub, Y. et al. Online log parsing using evolving research tree. Knowl Inf Syst 66, 1231–1255 (2024). https://doi.org/10.1007/s10115-023-01953-z

Received: 05 January 2022
Revised: 13 July 2023
Accepted: 22 July 2023
Published: 03 October 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10115-023-01953-z
