
Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks

Published: 27 June 2024

Abstract

Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and fewer still seek to understand the difficulty of these tasks from the perspective of data complexity. In this article, we conduct an empirical study that estimates the hardness of over 33,000 instances, employing a set of measures to characterize both the inherent difficulty of individual instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes displays a right-skewed distribution, with the defective class exhibiting a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature, structural, and instance-level overlap; (3) no universal preprocessing technique is applicable to all datasets, and preprocessing may not consistently reduce data complexity; fortunately, dataset complexity measures can help identify suitable techniques for specific datasets; (4) integrating data complexity information into the learning process can enhance an algorithm's learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks and provides a novel perspective for advancing research in defect prediction techniques.
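
The findings above rest on two kinds of quantities: per-instance hardness scores and dataset-level complexity measures. As a rough illustration only (not the authors' exact measurement protocol), the Python sketch below estimates instance hardness as one minus the average out-of-fold probability that a small classifier pool assigns to each instance's true label, in the spirit of the instance-hardness literature (Smith et al.), and computes one classical overlap measure, the maximum Fisher's discriminant ratio (F1), from Ho and Basu's complexity-measure family. The synthetic dataset, the classifier pool, and all parameter choices are illustrative assumptions.

```python
# A minimal sketch, assuming scikit-learn and synthetic data; not the
# authors' exact protocol. Instance hardness here is 1 - p(true label),
# averaged over a pool of cross-validated classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in imbalanced data; a real study would load defect datasets instead.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

pool = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
]

# For each classifier, get out-of-fold class probabilities, then keep the
# probability it assigns to each instance's true class.
per_clf = []
for clf in pool:
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    per_clf.append(proba[np.arange(len(y)), y])

hardness = 1.0 - np.mean(per_clf, axis=0)  # in [0, 1]; higher = harder

# Inspect the per-class distribution (the article reports right-skewed
# distributions, more scattered for the defective class).
for label in (0, 1):
    h = hardness[y == label]
    print(f"class {label}: mean={h.mean():.3f}, median={np.median(h):.3f}, "
          f"90th pct={np.percentile(h, 90):.3f}")

# A simple dataset-level overlap measure in the spirit of Ho & Basu's
# maximum Fisher's discriminant ratio (F1): per-feature separation of the
# class means relative to within-class variance (larger = less overlap).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
f1 = ((mu0 - mu1) ** 2 / (v0 + v1 + 1e-12)).max()
print(f"max Fisher's discriminant ratio (F1): {f1:.3f}")
```

On real defect datasets one would substitute the study's datasets and tuned learners; open-source libraries such as problexity implement a fuller suite of dataset complexity measures.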


Cited By

  • Adjusted Trust Score: A Novel Approach for Estimating the Trustworthiness of Software Defect Prediction Models. IEEE Transactions on Reliability 73, 4 (2024), 1877–1891. https://doi.org/10.1109/TR.2024.3393734
  • The effect of data complexity on classifier performance. Empirical Software Engineering 30, 1 (2024). https://doi.org/10.1007/s10664-024-10554-5


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 6 (July 2024), 951 pages.
EISSN: 1557-7392
DOI: 10.1145/3613693
Editor: Mauro Pezzé

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 27 June 2024
Online AM: 26 February 2024
Accepted: 15 February 2024
Revised: 01 January 2024
Received: 05 May 2023
Published in TOSEM Volume 33, Issue 6

Author Tags

  1. Defect prediction
  2. machine learning
  3. data complexity
  4. instance hardness

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
