research-article

An Empirical Study of Fault Triggers in Deep Learning Frameworks

Authors:

Jun Ai

IEEE Transactions on Dependable and Secure Computing, Volume 20, Issue 4

Pages 2696 - 2712

https://doi.org/10.1109/TDSC.2022.3152239

Published: 01 July 2023

Abstract

Deep learning frameworks play a key role in bridging the gap between deep learning theory and practice. With the growing number of safety- and security-critical applications built upon deep learning frameworks, their reliability is becoming increasingly important. To ensure the reliability of these frameworks, several efforts have been made to study the causes and symptoms of bugs in deep learning frameworks; however, relatively little progress has been made in investigating the fault triggering conditions of those bugs. This paper presents the first comprehensive empirical study on fault triggering conditions in three widely used deep learning frameworks (i.e., TensorFlow, MXNet, and PaddlePaddle). We have collected 3,555 bug reports from the GitHub repositories of these frameworks. A bug classification is performed based on fault triggering conditions, followed by an analysis of the frequency distribution of the different bug types and of how they evolve. The correlations between bug types and fixing time are investigated. Moreover, we have studied the root causes of Bohrbugs and Mandelbugs and investigated the important consequences of each bug type. Finally, an analysis of regression bugs in deep learning frameworks is conducted. We have revealed 12 important findings based on our empirical results and have provided 10 implications for developers and users.
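The kind of analysis the abstract describes (a frequency distribution of bug types per framework, plus a comparison of fixing time across bug types) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the record layout, the function names `type_distribution` and `median_fix_time`, and the toy values are hypothetical and are not data or tooling from the paper.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy records: (framework, bug_type, days_to_fix).
# These values are invented for illustration; they are NOT from the study.
REPORTS = [
    ("TensorFlow", "Bohrbug", 3),
    ("TensorFlow", "Mandelbug", 21),
    ("MXNet", "Bohrbug", 5),
    ("MXNet", "Bohrbug", 2),
    ("PaddlePaddle", "Mandelbug", 14),
    ("PaddlePaddle", "Bohrbug", 4),
]


def type_distribution(reports):
    """Return {framework: {bug_type: share of that framework's reports}}."""
    counts = defaultdict(Counter)
    for framework, bug_type, _days in reports:
        counts[framework][bug_type] += 1
    return {
        framework: {t: n / sum(c.values()) for t, n in c.items()}
        for framework, c in counts.items()
    }


def median_fix_time(reports):
    """Return {bug_type: median days to fix} across all frameworks."""
    days = defaultdict(list)
    for _framework, bug_type, d in reports:
        days[bug_type].append(d)
    return {t: median(v) for t, v in days.items()}


if __name__ == "__main__":
    print(type_distribution(REPORTS))
    print(median_fix_time(REPORTS))
```

A real study would replace the toy list with classified bug reports and would likely use a statistical test rather than a plain median comparison; the sketch only shows the shape of the tabulation.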

Cited By

Du X, Li C, Ma X, Zheng Z, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp. 346-347. DOI: 10.1145/3639478.3643113. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639478.3643113
Li Y, Zhang T, Luo X, Cai H, Fang S, Yuan D (2023). Do Pretrained Language Models Indeed Understand Software Engineering Tasks? IEEE Transactions on Software Engineering, 49:10, pp. 4639-4655. DOI: 10.1109/TSE.2023.3308952. Online publication date: 1-Oct-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3308952

Information & Contributors

Information

Published In

IEEE Transactions on Dependable and Secure Computing Volume 20, Issue 4

July-Aug. 2023

884 pages

ISSN:1545-5971

1545-5971 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Computer Society Press

Washington, DC, United States
