Abstract
SaaS (Software-as-a-Service) is a service model provided by cloud computing. Because the software is delivered as a service, SaaS places high demands on QoS (Quality of Service). However, manually identifying and diagnosing performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we comprehensively review methods for automatically identifying and diagnosing performance issues of SaaS software. We divide them into three steps according to their function: performance log generation, performance issue identification, and performance issue diagnosis. We then review the methods for each step in the order of their historical development and give our proposed solution for each step. Finally, we demonstrate the effectiveness of our proposed methods through experiments.
Data availability statement
The data that support the findings of this study are available from the National Disaster Reduction Center of China but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the National Disaster Reduction Center of China.
Acknowledgments
The work was supported by the National Key R&D Program of China (2022YFB3304300), the Humanities and Social Sciences Youth Foundation, Ministry of Education (23YJCZH221) and the Natural Science Foundation of Shandong Province (ZR2023QE030).
Ethics declarations
Competing interests
The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Rui Wang is currently a lecturer with the College of Energy and Mining Engineering, Shandong University of Science and Technology, China. She received the PhD degree from Wuhan University, China in 2019. Her research interests include software engineering and data mining.
Xiangbo Tian received the BE degree from the Shandong University of Science and Technology, China in 2018. He is currently working toward the PhD degree with the School of Computer Science, Wuhan University, China. His current research interests include service computing and microservices.
Shi Ying is currently a professor in the School of Computer Science, Wuhan University, China. His main research interests include software engineering and artificial intelligence.
Cite this article
Wang, R., Tian, X. & Ying, S. Performance issue monitoring, identification and diagnosis of SaaS software: a survey. Front. Comput. Sci. 19, 191201 (2025). https://doi.org/10.1007/s11704-023-2701-0
DOI: https://doi.org/10.1007/s11704-023-2701-0