
An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions

Published: 23 July 2021

Abstract

AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operation of large-scale systems. However, due to the nature of operation data, AIOps modeling faces several challenges related to data splitting, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on these challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures from trace data in a large-scale cluster environment and (2) predicting disk failures from disk monitoring data in a large-scale cloud storage environment. First, we observe that data leakage exists in AIOps solutions. A time-based splitting of training and validation datasets can significantly reduce such leakage, making it more appropriate than a random splitting in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such drift, although the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the models used. Our findings encourage future studies and practices in developing AIOps solutions to pay attention to their data-splitting decisions in order to handle the data leakage and concept drift challenges.
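The two splitting decisions the abstract discusses can be sketched in a few lines of code. The sketch below is illustrative only, not taken from the paper: all names (`Sample`, `time_split`, `random_split`, `periodic_update`) are hypothetical. It contrasts a random split, which interleaves training and validation samples in time and can leak temporally correlated information, with a time-based split, where training data strictly precedes validation data; it also shows a simple periodic-update loop that retrains on all data seen so far before each new evaluation period, one possible mitigation for concept drift.

```python
from dataclasses import dataclass
from typing import List, Tuple
import random


@dataclass
class Sample:
    timestamp: int               # e.g., collection time of a job/disk record
    features: Tuple[float, ...]
    label: int                   # 1 = failure, 0 = healthy


def time_split(samples: List[Sample], train_frac: float = 0.8):
    """Time-based split: every training sample precedes every validation
    sample, so no future information leaks into training."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def random_split(samples: List[Sample], train_frac: float = 0.8, seed: int = 0):
    """Random split: training and validation interleave in time, which can
    leak temporally correlated information across the split."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def periodic_update(samples: List[Sample], n_periods: int = 4):
    """Periodic model updating: partition the data into time-ordered periods;
    for each period after the first, retrain on all earlier periods and
    evaluate on the current one. Returns (train size, test size) per round,
    standing in for where model fitting and scoring would go."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    size = len(ordered) // n_periods
    folds = [ordered[i * size:(i + 1) * size] for i in range(n_periods)]
    rounds = []
    for i in range(1, n_periods):
        train = [s for fold in folds[:i] for s in fold]
        test = folds[i]
        rounds.append((len(train), len(test)))
    return rounds
```

A higher update frequency (more, shorter periods) keeps the model closer to the current data distribution at the cost of more retraining; as the abstract notes, whether that trade-off pays off depends on the application data and the model.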



Published In

ACM Transactions on Software Engineering and Methodology, Volume 30, Issue 4 (Continuous Special Section: AI and SE), October 2021, 613 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3461694
Editor: Mauro Pezzè
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2021
Accepted: 01 January 2021
Revised: 01 December 2020
Received: 01 June 2020
Published in TOSEM Volume 30, Issue 4

Author Tags

  1. AIOps
  2. concept drift
  3. data leakage
  4. failure prediction
  5. machine learning engineering
  6. model maintenance

Qualifiers

  • Research-article
  • Research
  • Refereed
