Performance issue monitoring, identification and diagnosis of SaaS software: a survey

Rui Wang¹, Xiangbo Tian² & Shi Ying²

Abstract

SaaS (Software-as-a-Service) is a service model provided by cloud computing. Because the software is delivered as a service, it places high demands on QoS (Quality of Service). However, manually identifying and diagnosing performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we review the methods for automatically identifying and diagnosing performance issues of SaaS software. We divide them into three steps according to their function: performance log generation, performance issue identification, and performance issue diagnosis. We then review the methods for each step in the order of their historical development, and we present our own proposed solution for each step. Finally, we demonstrate the effectiveness of our proposed methods through experiments.

Data availability statement

The data that support the findings of this study are available from the National Disaster Reduction Center of China, but restrictions apply to their availability: they were used under license for the current study and are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of the National Disaster Reduction Center of China.

Google Scholar
Bare K, Kavulya S P, Tan J, Pan X, Marinelli E, Kasick M, Gandhi R, Narasimhan P. ASDF: an automated, online framework for diagnosing performance problems. In: Casimiro A, Lemos R, Gacek C, eds. Architecting Dependable Systems VII. Berlin: Springer, 2010, 201–226
Chapter Google Scholar
Attariyan M, Chow M, Flinn J. X-ray: automating root-cause diagnosis of performance anomalies in production software. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 307–320
Google Scholar
Malik H, Hemmati H, Hassan A E. Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of the 35th International Conference on Software Engineering (ICSE). 2013, 1012–1021
Google Scholar
Tuncer O, Ates E, Zhang Y, Turk A, Brandt J, Leung V J, Egele M, Coskun A K. Online diagnosis of performance variation in hpc systems using machine learning. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(4): 883–896
Article Google Scholar
Li M, Tang D, Wen Z, Cheng Y. Universal anomaly detection method based on massive monitoring indicators of cloud platform. In: Proceedings of 2021 IEEE International Conference on Software Engineering and Artifcial Intelligence (SEAI). 2021, 23–29
Google Scholar
Borghesi A, Molan M, Milano M, Bartolini A. Anomaly detection and anticipation in high performance computing systems. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(4): 739–750
Article Google Scholar
Stehman S V. Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 1997, 62(1): 77–89
Article Google Scholar
Powers D M W. Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. 2020, arXiv preprint arXiv: 2010.16061
Google Scholar
Hsieh W W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels. Cambridge: Cambridge University Press, 2009
Book Google Scholar
Qin J, He Z S. A SVM face recognition method based on Gabor-featured key points. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics. 2005, 5144–5149
Chapter Google Scholar
Hearst M A, Dumais S T, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and Their Applications, 1998, 13(4): 18–28
Article Google Scholar
Rish I. An empirical study of the naive Bayes classifier. In: Proceedings of IJCAI 2001Workshop on Empirical Methods in Artificial Intelligence. 2001, 41–46
Google Scholar
Larose D T, Larose C D. k-nearest neighbor algorithm. In: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2014, 149–164
Chapter Google Scholar
Manning C, Raghavan P, Schütze H. Vector space classification. In: An Introduction to Information Retrieval. 2009, 289–317
Google Scholar
Freedman D A. Statistical Models: Theory and Practice. Cambridge, England: Cambridge University Press, 2009
Book Google Scholar
Lam P, Wang L, Ngan H Y, Yung N H, Yeh A G. Outlier detection in large-scale traffic data by naive Bayes method and Gaussian mixture model method. 2015, arXiv preprint arXiv: 1512.08413
Google Scholar
Loh W Y. Classification and regression trees. WIREs: Data Mining and Knowledge Discovery, 2011, 1(1): 14–23
Google Scholar
Freund Y, Schapire R E. A desicion-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the 2nd European Conference on Computational Learning Theory. 1995, 23–37
Chapter Google Scholar
Phillips S J, Anderson R P, Schapire R E. Maximum entropy modeling of species geographic distributions. Ecological Modelling, 2006, 190(3–4): 231–259
Article Google Scholar

Download references

Acknowledgments

The work was supported by the National Key R&D Program of China (2022YFB3304300), the Humanities and Social Sciences Youth Foundation, Ministry of Education (23YJCZH221) and the Natural Science Foundation of Shandong Province (ZR2023QE030).

Author information

Authors and Affiliations

College of Mining and Safety Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Rui Wang
School of Computer Science, Wuhan University, Wuhan, 430072, China
Xiangbo Tian & Shi Ying

Authors

Rui Wang
Xiangbo Tian
Shi Ying

Corresponding author

Correspondence to Shi Ying.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Rui Wang is currently a lecturer with the College of Energy and Mining Engineering, Shandong University of Science and Technology, China. She received the PhD degree from Wuhan University, China in 2019. Her research interests include software engineering and data mining.

Xiangbo Tian received the BE degree from the Shandong University of Science and Technology, China in 2018. He is currently working toward the PhD degree with the School of Computer Science, Wuhan University, China. His current research interests include service computing and microservices.

Shi Ying is currently a professor in the School of Computer Science, Wuhan University, China. His main research interests include software engineering and artificial intelligence.