Flaky test detection in Android via event order exploration

Authors:

Abhishek Tiwari,

Abhik Roychoudhury

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 367 - 378

https://doi.org/10.1145/3468264.3468584

Published: 18 August 2021

Abstract

Validating Android apps through testing is difficult because of flaky tests. In a non-deterministic execution environment, the same sequence of events (a test) may succeed or fail unpredictably. In this work, we present an approach and a tool, FlakeScanner, for detecting flaky tests by exploring event orders. Our key observation is that, for a test in a mobile app, a testing framework thread creates the test events, a main User-Interface (UI) thread processes these events, and several other background threads may run asynchronously. For any event e whose execution involves potential non-determinism, we localize the earliest event after which e must happen and the latest event before which e must happen. We then efficiently explore the schedules between these lower- and upper-bound events, while grouping events within a single statement, to determine whether the test outcome is flaky. We also create a suite of subject programs called FlakyAppRepo (containing 33 widely-used Android projects) to study flaky tests in Android apps. Our experiments on FlakyAppRepo show that FlakeScanner detected 45 out of 52 known flaky tests as well as 245 previously unknown flaky tests among 1444 tests.
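
The idea of exploring schedules between a lower-bound and an upper-bound event can be illustrated with a small toy model. The sketch below is not FlakeScanner's implementation: the function names (`candidate_schedules`, `is_flaky`), the event names, and the pass/fail oracle are all hypothetical, and the grouping of events within a single statement is not modelled. It assumes a fixed sequence of UI events and one asynchronous event that may be placed in any slot between the two bounds, and it reports a test as flaky when different placements give different outcomes.

```python
from typing import Callable, List, Sequence


def candidate_schedules(ui_events: Sequence[str], async_event: str,
                        lower: int, upper: int) -> List[List[str]]:
    """Return one schedule per slot in which async_event may run.

    lower is the index of the UI event after which async_event must happen;
    upper is the index of the UI event before which it must happen.
    (Hypothetical helper for illustration only.)
    """
    if not (0 <= lower < upper < len(ui_events)):
        raise ValueError("expected 0 <= lower < upper < len(ui_events)")
    schedules = []
    # Inserting at slot places async_event directly before ui_events[slot],
    # so slots lower+1 .. upper cover every position between the bounds.
    for slot in range(lower + 1, upper + 1):
        schedule = list(ui_events)
        schedule.insert(slot, async_event)
        schedules.append(schedule)
    return schedules


def is_flaky(ui_events: Sequence[str], async_event: str,
             lower: int, upper: int,
             run_test: Callable[[List[str]], bool]) -> bool:
    """A test is flagged as flaky if the explored schedules disagree."""
    outcomes = {run_test(schedule)
                for schedule in candidate_schedules(ui_events, async_event,
                                                    lower, upper)}
    return len(outcomes) > 1


# Toy test: the assertion only passes if the data arrived before it ran.
UI_EVENTS = ["launch", "click_load", "assert_text", "finish"]


def toy_test(schedule: List[str]) -> bool:
    return schedule.index("data_loaded") < schedule.index("assert_text")


if __name__ == "__main__":
    # "data_loaded" must happen after "click_load" (index 1)
    # and before "finish" (index 3).
    print(is_flaky(UI_EVENTS, "data_loaded", 1, 3, toy_test))
```

In this toy setting the number of explored schedules equals `upper - lower`, which is why tight bounds on the asynchronous event keep the exploration small.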

Index Terms

Flaky test detection in Android via event order exploration
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Information

Published In

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 2021

1690 pages

ISBN:9781450385626

DOI:10.1145/3468264

General Chairs: Diomidis Spinellis (Athens University of Economics and Business, Greece), Georgios Gousios (Facebook, Netherlands / Delft University of Technology, Netherlands)

Program Chairs: Marsha Chechik (University of Toronto, Canada), Massimiliano Di Penta (University of Sannio, Italy)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '21

Sponsor:

SIGSOFT

ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 23 - 28, 2021

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
