research-article

An Empirical Study of Fault Triggers in Deep Learning Frameworks

Published: 01 July 2023

Abstract

Deep learning frameworks play a key role in bridging the gap between deep learning theory and practice. With the growing number of safety- and security-critical applications built upon deep learning frameworks, their reliability is becoming increasingly important. To ensure the reliability of these frameworks, several efforts have been made to study the causes and symptoms of bugs in deep learning frameworks; however, relatively little progress has been made in investigating the fault triggering conditions of those bugs. This paper presents the first comprehensive empirical study on fault triggering conditions in three widely used deep learning frameworks (i.e., TensorFlow, MXNet, and PaddlePaddle). We collected 3,555 bug reports from the GitHub repositories of these frameworks. The bugs are classified based on their fault triggering conditions, followed by an analysis of the frequency distribution of the different bug types and their evolution features. The correlations between bug types and fixing time are investigated. Moreover, we study the root causes of Bohrbugs and Mandelbugs and investigate the important consequences of each bug type. Finally, we analyze regression bugs in deep learning frameworks. We report 12 important findings based on our empirical results and provide 10 implications for developers and users.

Cited By

  • (2024) How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp. 346-347. DOI: 10.1145/3639478.3643113. Online publication date: 14-Apr-2024
  • (2023) Do Pretrained Language Models Indeed Understand Software Engineering Tasks? IEEE Transactions on Software Engineering, 49:10, pp. 4639-4655. DOI: 10.1109/TSE.2023.3308952. Online publication date: 1-Oct-2023

Information

Published In

IEEE Transactions on Dependable and Secure Computing  Volume 20, Issue 4
July-Aug. 2023
884 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

  • Research-article
