Research Article

A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports With Heuristic Linguistic Patterns

Published: 17 April 2024, in IEEE Transactions on Software Engineering, Volume 50, Issue 7 (July 2024). Publisher: IEEE Press.

Abstract

Software performance is critical for system efficiency, and performance issues can result in budget overruns, project delays, and market losses. Such problems are reported to developers through issue tracking systems, which are often under-tagged because manual tagging is voluntary and time-consuming. Existing automated performance-issue tagging techniques, such as keyword matching and machine/deep learning models, struggle with imbalanced datasets and high variance. This paper presents a novel hybrid classification approach that combines Heuristic Linguistic Patterns (HLPs) with machine/deep learning models so that practitioners can automatically identify performance-related issues. The approach works across three progressive levels, HLP tagging, sentence tagging, and issue tagging, with a focus on linguistic analysis of issue descriptions. The authors evaluate the approach on three datasets collected from different projects and issue-tracking platforms to show that the framework is accurate, project- and platform-agnostic, and robust to imbalanced datasets. The study also examines how the framework's two distinctive techniques, fuzzy HLP matching and the Issue HLP Matrix, contribute to accuracy, and explores the effectiveness and impact of two off-the-shelf feature selection techniques, Boruta and RFE, within the framework. The results show that the framework has great potential to help practitioners accurately identify performance issues (with up to 100% precision, 66% recall, and a 79% F1-score), with robustness to imbalanced data and good transferability to new projects and issue tracking platforms.
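To make the abstract's pipeline concrete, the following minimal sketch illustrates the general idea rather than the authors' implementation: fuzzy string similarity stands in for the paper's fuzzy HLP matching, per-issue counts of matched patterns stand in for one row of an Issue HLP Matrix, and scikit-learn's RFE wraps a random forest as an off-the-shelf feature selector. The pattern strings, similarity threshold, and model parameters below are illustrative assumptions, not artifacts from the paper.

# A hypothetical sketch of the three-level idea: match HLPs to sentences with
# fuzzy similarity, aggregate matches into a per-issue feature vector, then
# classify with RFE-based feature selection over a random forest.
from difflib import SequenceMatcher

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Illustrative HLPs: short phrase templates hinting at performance problems.
HLPS = ["takes too long", "high cpu usage", "memory leak", "slow response"]
FUZZY_THRESHOLD = 0.8  # illustrative similarity cut-off, not the paper's value


def hlp_matches(sentence: str, pattern: str) -> bool:
    """Fuzzy HLP match: does any word window of the sentence resemble the pattern?"""
    words = sentence.lower().split()
    span = len(pattern.split())
    for i in range(max(1, len(words) - span + 1)):
        window = " ".join(words[i:i + span])
        if SequenceMatcher(None, window, pattern).ratio() >= FUZZY_THRESHOLD:
            return True
    return False


def issue_vector(issue_text: str) -> list:
    """One row of a toy Issue HLP Matrix: per-HLP match counts over the issue's sentences."""
    sentences = [s for s in issue_text.replace("!", ".").split(".") if s.strip()]
    return [sum(hlp_matches(s, p) for s in sentences) for p in HLPS]


def train(issues: list, labels: list) -> RFE:
    """Fit an RFE-wrapped random forest on the toy Issue HLP Matrix rows."""
    X = [issue_vector(text) for text in issues]
    selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
                   n_features_to_select=2)
    return selector.fit(X, labels)

After fitting, selector.predict([issue_vector(new_report)]) would label a new issue report, since RFE exposes the wrapped estimator's predict; swapping in a different selector would mirror the paper's comparison of feature-selection techniques.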
