ModuleGuard: Understanding and Detecting Module Conflicts in Python Ecosystem
Article No.: 211, Pages 1 - 12
Abstract
Python has become one of the most popular programming languages for software development thanks to its simplicity, readability, and versatility. As the Python ecosystem grows, developers find it increasingly hard to avoid module conflicts, which occur when different packages provide modules with the same name in the same namespace. Existing work has neither investigated module conflicts comprehensively nor provided tools to detect them. This paper therefore systematically investigates the module conflict problem and its impact on the Python ecosystem. We propose a novel technique called InstSimulator, which leverages semantics and installation simulation to extract modules accurately and efficiently. Based on this technique, we implement a tool called ModuleGuard to detect module conflicts in the Python ecosystem.
For the study, we first collected 97 module conflict (MC) issues, classified their characteristics and causes, summarized three conflict patterns, and analyzed their potential threats. We then conducted a large-scale analysis of the whole PyPI ecosystem (4.2 million packages) and popular GitHub projects (3,711 projects) to detect each MC pattern and analyze its potential impact. We found that module conflicts still affect numerous third-party libraries (TPLs) and GitHub projects, primarily because developers lack understanding of the modules within their direct dependencies, let alone those of their transitive dependencies. Our work reveals Python's shortcomings in handling naming conflicts and provides a tool and guidelines for developers to detect conflicts.
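To make the notion of a module conflict concrete, the following minimal sketch flags module paths that more than one package would install. It is not the InstSimulator or ModuleGuard implementation; the helper names `find_module_conflicts` and `installed_package_files`, and the example package names, are hypothetical and chosen only for illustration. The sketch assumes that comparing relative `.py` file paths is a reasonable first approximation of "same module".

```python
from collections import defaultdict
from importlib import metadata


def find_module_conflicts(package_files):
    """Return {module_path: sorted package names} for paths claimed by 2+ packages.

    package_files maps a package name to the relative paths it installs.
    """
    owners = defaultdict(set)
    for package, paths in package_files.items():
        for path in paths:
            # Only Python source modules are compared in this sketch.
            if path.endswith(".py"):
                owners[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


def installed_package_files():
    """Collect the files recorded by each distribution in the current environment."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "unknown")
        files = dist.files or []  # dist.files is None when no file list is recorded
        result.setdefault(name, []).extend(str(f) for f in files)
    return result


if __name__ == "__main__":
    # Hypothetical example data; the package and file names are illustrative only.
    example = {
        "pkg-a": ["utils/__init__.py", "utils/io.py", "pkg_a/core.py"],
        "pkg-b": ["utils/__init__.py", "pkg_b/core.py"],
    }
    print(find_module_conflicts(example))
```

Passing the result of `installed_package_files()` to `find_module_conflicts` gives a rough view of overlapping files in the current environment. Such a check is only a heuristic: legitimately shared files, such as the `__init__.py` of some namespace packages, may be reported as well, and it says nothing about packages that are not yet installed.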
Information
Published In
May 2024
2942 pages
ISBN:9798400702174
DOI:10.1145/3597503
- Co-chairs:
- Ana Paiva,
- Rui Abreu,
- Program Co-chairs:
- Abhik Roychoudhury,
- Margaret Storey
Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Sponsors
In-Cooperation
- Faculty of Engineering of University of Porto
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 12 April 2024
Qualifiers
- Research-article
Conference
ICSE '24
Sponsor:
ICSE '24: IEEE/ACM 46th International Conference on Software Engineering
April 14 - 20, 2024
Lisbon, Portugal
Acceptance Rates
Overall Acceptance Rate 276 of 1,856 submissions, 15%