DOI: 10.1145/3460319.3464797
research-article
Open access

Fixing dependency errors for Python build reproducibility

Published: 11 July 2021

Abstract

Software reproducibility is important for re-usability and the cumulative progress of research. An important manifestation of unreproducible software is the changed outcome of software builds over time. While enhancing code reuse, the use of open-source dependency packages hosted on centralized repositories such as PyPI can have adverse effects on build reproducibility. Frequent updates to these packages often cause their latest versions to have breaking changes for applications using them. Large Python applications risk their historical builds becoming unreproducible due to the widespread usage of Python dependencies and the lack of uniform practices for dependency version specification. Manually fixing dependency errors requires expensive developer time and effort, while automated approaches face the challenges of parsing unstructured build logs, finding transitive dependencies, and exploring an exponential search space of dependency versions. In this paper, we investigate how open-source Python projects specify dependency versions, and how their reproducibility is impacted by dependency packages. We propose a tool, PyDFix, to detect and fix unreproducibility in Python builds caused by dependency errors. PyDFix is evaluated on two bug datasets, BugSwarm and BugsInPy, both of which are built from real-world open-source projects. PyDFix analyzes a total of 2,702 builds, identifying 1,921 (71.1%) of them to be unreproducible due to dependency errors. Of these, PyDFix provides a complete fix for 859 (44.7%) builds, and partial fixes for an additional 632 (32.9%) builds.
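
The abstract points to the lack of uniform practices for dependency version specification. As a rough illustration only, the sketch below sorts requirements.txt-style lines into pinned, ranged, and unconstrained. It is not PyDFix's method: `classify_requirement` is a hypothetical helper, the version numbers are illustrative assumptions, and the parser handles only simple name-plus-specifier lines rather than the full PEP 508 grammar.

```python
import re

# Hypothetical helper for illustration; not part of PyDFix.
# Matches a package name, optional extras in brackets, then the version constraint.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)$")


def classify_requirement(line):
    """Return (package_name, kind); kind is 'pinned', 'ranged', or 'unconstrained'."""
    spec = line.split("#", 1)[0]          # drop inline comments
    spec = spec.split(";", 1)[0].strip()  # drop environment markers
    match = _REQ.match(spec)
    if match is None:
        raise ValueError(f"not a simple requirement line: {line!r}")
    name, constraint = match.group(1), match.group(2).strip()
    if not constraint:
        # No specifier: the installer may pick the latest release at build time.
        return name, "unconstrained"
    clauses = [c.strip() for c in constraint.split(",")]
    # A single exact '==' clause without a wildcard fixes one version.
    if len(clauses) == 1 and clauses[0].startswith("==") and "*" not in clauses[0]:
        return name, "pinned"
    return name, "ranged"


# Illustrative input; the version numbers are assumptions, not taken from the paper.
for line in ["flake8==3.9.2", "stevedore>=1.20,<3", "configparser", "pytest==6.*"]:
    print(classify_requirement(line))
```

Only the pinned form resolves to the same package version regardless of when the build runs; the ranged and unconstrained forms can pick up newer releases over time, which is the kind of drift the abstract describes.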

Published In

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2021
685 pages
ISBN: 9781450384599
DOI: 10.1145/3460319
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Python
  2. build repair
  3. dependency errors
  4. software reproducibility

Conference

ISSTA '21

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Cited By

  • (2025) Detecting and Explaining Python Name Errors. Information and Software Technology 178 (107592). DOI: 10.1016/j.infsof.2024.107592. Online publication date: Feb-2025.
  • (2024) SeqMetrics: a unified library for performance metrics calculation in Python. Journal of Open Source Software 9:99 (6450). DOI: 10.21105/joss.06450. Online publication date: Jul-2024.
  • (2024) How to Pet a Two-Headed Snake? Solving Cross-Repository Compatibility Issues with Hera. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (694-705). DOI: 10.1145/3691620.3695064. Online publication date: 27-Oct-2024.
  • (2024) My Fuzzers Won’t Build: An Empirical Study of Fuzzing Build Failures. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3688842. Online publication date: 21-Aug-2024.
  • (2024) Decide: Knowledge-Based Version Incompatibility Detection in Deep Learning Stacks. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (547-551). DOI: 10.1145/3663529.3663796. Online publication date: 10-Jul-2024.
  • (2024) Reproducibility Debt: Challenges and Future Pathways. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (462-466). DOI: 10.1145/3663529.3663778. Online publication date: 10-Jul-2024.
  • (2024) Confronting the Reproducibility Crisis: A Case Study of Challenges in Cybersecurity AI. 2024 Cyber Awareness and Research Symposium (CARS) (1-6). DOI: 10.1109/CARS61786.2024.10778911. Online publication date: 28-Oct-2024.
  • (2024) Envyr: Instant Execution with Smart Inference. Procedia Computer Science 238 (1068-1073). DOI: 10.1016/j.procs.2024.06.136. Online publication date: 2024.
  • (2024) An empirical study of fault localization in Python programs. Empirical Software Engineering 29:4. DOI: 10.1007/s10664-024-10475-3. Online publication date: 13-Jun-2024.
  • (2023) Revisiting Knowledge-Based Inference of Python Runtime Environments: A Realistic and Adaptive Approach. IEEE Transactions on Software Engineering 50:2 (258-279). DOI: 10.1109/TSE.2023.3346474. Online publication date: 25-Dec-2023.
