Performance issue monitoring, identification and diagnosis of SaaS software: a survey

  • Review Article
Frontiers of Computer Science

Abstract

SaaS (Software-as-a-Service) is a service model provided by cloud computing. Because the software is delivered as a service, it places high demands on QoS (Quality of Service). However, manually identifying and diagnosing performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we comprehensively review the methods for automatically identifying and diagnosing such issues. We divide them into three steps according to their function: performance log generation, performance issue identification, and performance issue diagnosis. We then review the methods in each step in the order of their historical development. We also present our proposed solution for each step. Finally, experiments demonstrate the effectiveness of our proposed methods.

Data availability statement

The data that support the findings of this study are available from the National Disaster Reduction Center of China, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of the National Disaster Reduction Center of China.


    Google Scholar 

  102. Meng W, Liu Y, Zaiter F, Zhang S, Chen Y, Zhang Y, Zhu Y, Wang E, Zhang R, Tao S, Yang D, Zhou R, Pei D. LogParse: making log parsing adaptive through word classification. In: Proceedings of the 29th International Conference on Computer Communications and Networks (ICCCN). 2020, 1–9

    Google Scholar 

  103. Vervaet A, Chiky R, Callau-Zori M. USTEP: unfixed search tree for efficient log parsing. In: Proceedings of 2021 IEEE International Conference on Data Mining (ICDM). 2021, 659–668

    Chapter  Google Scholar 

  104. Chakrabarti A, Striegel A, Manimaran G. A case for tree evolution in QoS multicasting. In: Proceedings of the 10th IEEE International Workshop on Quality of Service (Cat. No.02EX564). 2002, 116–125

    Google Scholar 

  105. Li K. A random-walk-based dynamic tree evolution algorithm with exponential speed of convergence to optimality on regular networks. In: Proceedings of the 4th International Conference on Frontier of Computer Science and Technology. 2009, 80–85

    Google Scholar 

  106. Tomer A, Schach S R. The evolution tree: a maintenance-oriented software development model. In: Proceedings of the 4th European Conference on Software Maintenance and Reengineering. 2000, 209–214

    Chapter  Google Scholar 

  107. Dai H, Li H, Chen C S, Shang W, Chen T H. Logram: efficient log parsing using n-gram dictionaries. IEEE Transactions on Software Engineering, 2022, 48(3): 879–892

    Google Scholar 

  108. Fu Q, Lou J G, Wang Y, Li J. Execution anomaly detection in distributed systems through unstructured log analysis. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 149–158

    Google Scholar 

  109. Xu W, Huang L, Fox A, Patterson D, Jordan M I. Detecting large-scale system problems by mining console logs. In: Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles. 2009, 117–132

    Chapter  Google Scholar 

  110. Nedelkoski S, Cardoso J, Kao O. Anomaly detection from system tracing data using multimodal deep learning. In: Proceedings of the 12th IEEE International Conference on Cloud Computing (CLOUD). 2019, 179–186

    Google Scholar 

  111. Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K. TadGAN: Time series anomaly detection using generative adversarial networks. In: Proceedings of 2020 IEEE International Conference on Big Data (Big Data). 2020, 33–43

    Google Scholar 

  112. Luo W, Wang P, Wang J, An W. The research process of generative adversarial networks. Journal of Physics: Conference Series, 2019, 1176(3): 032008

    Google Scholar 

  113. Tran N T, Tran V H, Nguyen N B, Nguyen T K, Cheung N M. On data augmentation for GAN training. IEEE Transactions on Image Processing, 2021, 30: 1882–1897

    Article  MathSciNet  Google Scholar 

  114. Liu Z, Sabar N, Song A. Improving evolutionary generative adversarial networks. In: Proceedings of the 34th Australasian Joint Conference on Artificial Intelligence. 2022, 691–702

    Google Scholar 

  115. Sinha R, Sankaran A, Vatsa M, Singh R. AuthorGAN: Improving GAN reproducibility using a modular GAN framework. 2019, arXiv preprint arXiv: 1911.13250

    Google Scholar 

  116. Xia W, Zhang Y, Yang Y, Xue J H, Zhou B, Yang M H. GAN inversion: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3121–3138

    Google Scholar 

  117. Wang X, Cao Q, Wang Q, Cao Z, Zhang X, Wang P. Robust log anomaly detection based on contrastive learning and multi-scale mass. The Journal of Supercomputing, 2022, 78(16): 17491–17512

    Article  Google Scholar 

  118. Zhang Z, Wu S, Jiang D, Chen G. BERT-JAM: boosting BERT-enhanced neural machine translation with joint attention. 2020, arXiv preprint arXiv: 2011.04266

    Google Scholar 

  119. Trang N T M, Shcherbakov M. Vietnamese question answering system from multilingual BERT models to monolingual BERT model. In: Proceedings of the 9th International Conference System Modeling and Advancement in Research Trends (SMART). 2020, 201–206

    Google Scholar 

  120. Shi L, Liu D, Liu G, Meng K. AUG-BERT: an efficient data augmentation algorithm for text classification. In: Proceedings of the 8th International Conference in Communications, Signal Processing, and Systems. 2020, 2191–2198

    Chapter  Google Scholar 

  121. Praechanya N, Sornil O. Improving Thai named entity recognition performance using BERT transformer on deep networks. In: Proceedings of the 6th International Conference on Machine Learning Technologies. 2021, 177–183

    Google Scholar 

  122. Yang L, Chen J, Wang Z, Wang W, Jiang J, Dong X, Zhang W. Semi-supervised log-based anomaly detection via probabilistic label estimation. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE). 2021, 1448–1460

    Google Scholar 

  123. Farzad A, Gulliver T A. Log message anomaly detection with fuzzy c-means and MLP. Applied Intelligence, 2022, 52(15): 17708–17717

    Article  Google Scholar 

  124. Zhang C, Peng X, Sha C, Zhang K, Fu Z, Wu X, Lin Q, Zhang D. DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning. In: Proceedings of the 44th International Conference on Software Engineering. 2022, 623–634

    Chapter  Google Scholar 

  125. Zhang C, Peng X, Zhou T, Sha C, Yan Z, Chen Y, Yang H. TraceCRL: contrastive representation learning for microservice trace analysis. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022, 1221–1232

    Chapter  Google Scholar 

  126. Aguilera M K, Mogul J C, Wiener J L, Reynolds P, Muthitacharoen A. Performance debugging for distributed systems of black boxes. ACM SIGOPS Operating Systems Review, 2003, 37(5): 74–89

    Article  Google Scholar 

  127. Chen Y Y M. Path-Based Failure and Evolution Management. Berkeley: University of California at Berkeley, 2004

    Google Scholar 

  128. Barham P, Donnelly A, Isaacs R, Mortier R. Using magpie for request extraction and workload modelling. In: Proceedings of the 6th Symposium on Operating System Design & Implementation. 2004, 18

    Google Scholar 

  129. Chen H, Jiang G, Ungureanu C, Yoshihira K. Failure detection and localization in component based systems by online tracking. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 750–755

    Google Scholar 

  130. Lim M H, Lou J G, Zhang H, Fu Q, Teoh A B J, Lin Q, Ding R, Zhang D. Identifying recurrent and unknown performance issues. In: Proceedings of 2014 IEEE International Conference on Data Mining. 2014, 320–329

    Chapter  Google Scholar 

  131. Fischer A, Igel C. Training restricted Boltzmann machines: an introduction. Pattern Recognition, 2014, 47(1): 25–39

    Article  Google Scholar 

  132. Carreira-Perpinan M Á, Hinton G E. On contrastive divergence learning. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics. 2005, 33–40

    Google Scholar 

  133. Liu P, Xu H, Ouyang Q, Jiao R, Chen Z, Zhang S, Yang J, Mo L, Zeng J, Xue W, Pei D. Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks. In: Proceedings of the 31st IEEE International Symposium on Software Reliability Engineering (ISSRE). 2020, 48–58

  134. Kohyarnejadfard I, Aloise D, Dagenais M R, Shakeri M. A framework for detecting system performance anomalies using tracing data analysis. Entropy, 2021, 23(8): 1011

  135. Cai Y, Han B, Su J, Wang X. TraceModel: an automatic anomaly detection and root cause localization framework for microservice systems. In: Proceedings of the 17th International Conference on Mobility, Sensing and Networking (MSN). 2021, 512–519

  136. Li M, Tang D, Wen Z, Cheng Y. Microservice anomaly detection based on tracing data using semi-supervised learning. In: Proceedings of the 4th International Conference on Artificial Intelligence and Big Data (ICAIBD). 2021, 38–44

  137. Liu J, Hu Y, Wu B, Wang Y, Xie F. A hybrid generalized hidden Markov model-based condition monitoring approach for rolling bearings. Sensors, 2017, 17(5): 1143

  138. Wang R, Ying S, Sun C, Wan H, Zhang H, Jia X. Model construction and data management of running log in supporting SaaS software performance analysis. In: Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (SEKE 2017). 2017, 149–154

  139. Fu X, Ren R, Zhan J, Zhou W, Jia Z, Lu G. LogMaster: mining event correlations in logs of large-scale cluster systems. In: Proceedings of the 31st IEEE Symposium on Reliable Distributed Systems. 2012, 71–80

  140. Zou D, Qin H, Jin H, Qiang W, Han Z, Chen X. Improving log-based fault diagnosis by log classification. In: Proceedings of the 11th IFIP International Conference on Network and Parallel Computing. 2014, 446–458

  141. Guo X, Peng X, Wang H, Li W, Jiang H, Ding D, Xie T, Su L. Graph-based trace analysis for microservice architecture understanding and problem diagnosis. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020, 1387–1397

  142. Huo Y, Dong J, Ge Z, Xie P, An N, Yang Y. IWApriori: an association rule mining and self-updating method based on weighted increment. In: Proceedings of the 21st Asia-Pacific Network Operations and Management Symposium (APNOMS). 2020, 167–172

  143. Wang L, Zhao N, Chen J, Li P, Zhang W, Sui K. Root-cause metric location for microservice systems via log anomaly detection. In: Proceedings of 2020 IEEE International Conference on Web Services (ICWS). 2020, 142–150

  144. Liu D, He C, Peng X, Lin F, Zhang C, Gong S, Li Z, Ou J, Wu Z. MicroHECL: high-efficient root cause localization in large-scale microservice systems. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 2021, 338–347

  145. Gan Y, Liang M, Dev S, Lo D, Delimitrou C. Sage: practical and scalable ML-driven performance debugging in microservices. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2021, 135–151

  146. Ma M, Lin W, Pan D, Wang P. ServiceRank: root cause identification of anomaly in large-scale microservice architectures. IEEE Transactions on Dependable and Secure Computing, 2022, 19(5): 3087–3100

  147. Li M, Li Z, Yin K, Nie X, Zhang W, Sui K, Pei D. Causal inference-based root cause analysis for online service systems with intervention recognition. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 3230–3240

  148. Zhao X, Zhang Y, Lion D, Ullah M F, Luo Y, Yuan D, Stumm M. lprof: a non-intrusive request flow profiler for distributed systems. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. 2014, 629–644

  149. Bare K, Kavulya S P, Tan J, Pan X, Marinelli E, Kasick M, Gandhi R, Narasimhan P. ASDF: an automated, online framework for diagnosing performance problems. In: Casimiro A, Lemos R, Gacek C, eds. Architecting Dependable Systems VII. Berlin: Springer, 2010, 201–226

  150. Attariyan M, Chow M, Flinn J. X-ray: automating root-cause diagnosis of performance anomalies in production software. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 307–320

  151. Malik H, Hemmati H, Hassan A E. Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of the 35th International Conference on Software Engineering (ICSE). 2013, 1012–1021

  152. Tuncer O, Ates E, Zhang Y, Turk A, Brandt J, Leung V J, Egele M, Coskun A K. Online diagnosis of performance variation in HPC systems using machine learning. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(4): 883–896

  153. Li M, Tang D, Wen Z, Cheng Y. Universal anomaly detection method based on massive monitoring indicators of cloud platform. In: Proceedings of 2021 IEEE International Conference on Software Engineering and Artificial Intelligence (SEAI). 2021, 23–29

  154. Borghesi A, Molan M, Milano M, Bartolini A. Anomaly detection and anticipation in high performance computing systems. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(4): 739–750

  155. Stehman S V. Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 1997, 62(1): 77–89

  156. Powers D M W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. 2020, arXiv preprint arXiv:2010.16061

  157. Hsieh W W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels. Cambridge: Cambridge University Press, 2009

  158. Qin J, He Z S. A SVM face recognition method based on Gabor-featured key points. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics. 2005, 5144–5149

  159. Hearst M A, Dumais S T, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and Their Applications, 1998, 13(4): 18–28

  160. Rish I. An empirical study of the naive Bayes classifier. In: Proceedings of IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. 2001, 41–46

  161. Larose D T, Larose C D. k-nearest neighbor algorithm. In: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2014, 149–164

  162. Manning C, Raghavan P, Schütze H. Vector space classification. In: An Introduction to Information Retrieval. 2009, 289–317

  163. Freedman D A. Statistical Models: Theory and Practice. Cambridge, England: Cambridge University Press, 2009

  164. Lam P, Wang L, Ngan H Y, Yung N H, Yeh A G. Outlier detection in large-scale traffic data by naive Bayes method and Gaussian mixture model method. 2015, arXiv preprint arXiv:1512.08413

  165. Loh W Y. Classification and regression trees. WIREs: Data Mining and Knowledge Discovery, 2011, 1(1): 14–23

  166. Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the 2nd European Conference on Computational Learning Theory. 1995, 23–37

  167. Phillips S J, Anderson R P, Schapire R E. Maximum entropy modeling of species geographic distributions. Ecological Modelling, 2006, 190(3–4): 231–259

Acknowledgments

The work was supported by the National Key R&D Program of China (2022YFB3304300), the Humanities and Social Sciences Youth Foundation, Ministry of Education (23YJCZH221) and the Natural Science Foundation of Shandong Province (ZR2023QE030).

Author information

Corresponding author

Correspondence to Shi Ying.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Rui Wang is currently a lecturer with the College of Energy and Mining Engineering, Shandong University of Science and Technology, China. She received the PhD degree from Wuhan University, China in 2019. Her research interests include software engineering and data mining.

Xiangbo Tian received the BE degree from the Shandong University of Science and Technology, China in 2018. He is currently working toward the PhD degree with the School of Computer Science, Wuhan University, China. His current research interests include service computing and microservices.

Shi Ying is currently a professor in the School of Computer Science, Wuhan University, China. His main research interests include software engineering and artificial intelligence.

About this article

Cite this article

Wang, R., Tian, X. & Ying, S. Performance issue monitoring, identification and diagnosis of SaaS software: a survey. Front. Comput. Sci. 19, 191201 (2025). https://doi.org/10.1007/s11704-023-2701-0

  • DOI: https://doi.org/10.1007/s11704-023-2701-0
