DOI: 10.1145/3468264.3468584
research-article
Open access

Flaky test detection in Android via event order exploration

Published: 18 August 2021

Abstract

Validation of Android apps via testing is difficult owing to the presence of flaky tests. Due to non-deterministic execution environments, a sequence of events (a test) may lead to success or failure in unpredictable ways. In this work, we present an approach and tool FlakeScanner for detecting flaky tests through exploration of event orders. Our key observation is that for a test in a mobile app, there is a testing framework thread which creates the test events, a main User-Interface (UI) thread processing these events, and there may be several other background threads running asynchronously. For any event e whose execution involves potential non-determinism, we localize the earliest (latest) event after (before) which e must happen. We then efficiently explore the schedules between the upper/lower bound events while grouping events within a single statement, to find whether the test outcome is flaky. We also create a suite of subject programs called FlakyAppRepo (containing 33 widely-used Android projects) to study flaky tests in Android apps. Our experiments on the subject-suite FlakyAppRepo show FlakeScanner detected 45 out of 52 known flaky tests as well as 245 previously unknown flaky tests among 1444 tests.
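
The bounded exploration described in the abstract can be illustrated with a small sketch. This is not FlakeScanner's implementation: it assumes a simplified model in which the main UI thread processes a fixed list of events, a single asynchronous event may land anywhere strictly between a lower-bound and an upper-bound event, and a test oracle maps a complete schedule to pass or fail. All names (`candidate_schedules`, `is_flaky`, `run_test`, the event labels) are hypothetical, and the grouping of events within a single statement is not modeled.

```python
from typing import Callable, List, Sequence


def candidate_schedules(main_events: Sequence[str], async_event: str,
                        lower: int, upper: int) -> List[List[str]]:
    """Place async_event in every slot strictly after main_events[lower]
    and strictly before main_events[upper] (hypothetical helper)."""
    if not 0 <= lower < upper < len(main_events):
        raise ValueError("need 0 <= lower < upper < len(main_events)")
    schedules = []
    for slot in range(lower + 1, upper + 1):
        schedule = list(main_events)
        # Inserting at index `slot` puts the event just before main_events[slot].
        schedule.insert(slot, async_event)
        schedules.append(schedule)
    return schedules


def is_flaky(main_events: Sequence[str], async_event: str,
             lower: int, upper: int,
             run_test: Callable[[List[str]], bool]) -> bool:
    """Report the test as flaky if two explored schedules disagree."""
    outcomes = {run_test(s)
                for s in candidate_schedules(main_events, async_event, lower, upper)}
    return len(outcomes) > 1


# Toy scenario (assumed, not from the paper): the assertion passes only if
# the background load finishes before the UI thread checks the text.
MAIN = ["launch", "click_load", "assert_text", "finish"]


def run_test(schedule: List[str]) -> bool:
    return schedule.index("data_loaded") < schedule.index("assert_text")


if __name__ == "__main__":
    # "data_loaded" must happen after "click_load" (index 1)
    # and before "finish" (index 3).
    print(is_flaky(MAIN, "data_loaded", 1, 3, run_test))  # True
```

Tightening the bounds shrinks the search space: with the upper bound moved to `assert_text` (index 2), only one schedule remains and the same test is no longer reported as flaky in this model.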

Published In

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021
1690 pages
ISBN: 9781450385626
DOI: 10.1145/3468264
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concurrency
  2. event order
  3. flaky tests
  4. non-determinism

Qualifiers

  • Research-article

Conference

ESEC/FSE '21
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 202
  • Downloads (last 6 weeks): 34

Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) Detecting Concurrent Flaky Test for Android Applications Based on Happens-Before Relationships. Proceedings of the 5th International Conference on Computer Information and Big Data Applications, pp. 1187-1191. DOI: 10.1145/3671151.3671358. Online publication date: 26-Apr-2024
  • (2024) Reproducing Timing-Dependent GUI Flaky Tests in Android Apps via a Single Event Delay. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1504-1515. DOI: 10.1145/3650212.3680377. Online publication date: 11-Sep-2024
  • (2024) Flaky Test Detection Based on Adaptive Latest Position Execution for Concurrent Android Applications. International Journal of Software Engineering and Knowledge Engineering, pp. 1-26. DOI: 10.1142/S0218194024500232. Online publication date: 14-Jun-2024
  • (2024) Automatically Reproducing Timing-Dependent Flaky-Test Failures. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), pp. 269-280. DOI: 10.1109/ICST60714.2024.00032. Online publication date: 27-May-2024
  • (2024) Test Code Flakiness in Mobile Apps. Information and Software Technology 168:C. DOI: 10.1016/j.infsof.2023.107394. Online publication date: 17-Apr-2024
  • (2023) Test Flakiness Across Programming Languages. IEEE Transactions on Software Engineering 49:4, pp. 2039-2052. DOI: 10.1109/TSE.2022.3208864. Online publication date: 1-Apr-2023
  • (2023) Test flakiness’ causes, detection, impact and responses. Journal of Systems and Software 206:C. DOI: 10.1016/j.jss.2023.111837. Online publication date: 1-Dec-2023
  • (2022) Flaky Test Sanitisation via On-the-Fly Assumption Inference for Tests with Network Dependencies. 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 264-275. DOI: 10.1109/SCAM55253.2022.00037. Online publication date: Oct-2022
  • (2022) To Seed or Not to Seed? An Empirical Analysis of Usage of Seeds for Testing in Machine Learning Projects. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST), pp. 151-161. DOI: 10.1109/ICST53961.2022.00026. Online publication date: Apr-2022
  • (2022) What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 352-363. DOI: 10.1109/ICSME55016.2022.00039. Online publication date: Oct-2022
