DOI: 10.1145/3611643.3616314
research-article

Towards Automated Detection of Unethical Behavior in Open-Source Software Projects

Published: 30 November 2023

Abstract

Given the rapid growth of Open-Source Software (OSS) projects, ethical considerations are becoming more important. Past studies focused on specific ethical issues (e.g., gender bias and fairness in OSS), but few if any have examined the different types of unethical behavior in OSS projects. We present the first study of unethical behavior in OSS projects from the stakeholders’ perspective. Our study of 316 GitHub issues provides a taxonomy of 15 types of unethical behavior guided by six ethical principles (e.g., autonomy). Examples of new unethical behavior include soft forking (copying a repository without forking) and self-promotion (promoting a repository without self-identifying as a contributor to the repository). We also identify 18 types of software artifacts affected by the unethical behavior. The diverse types of unethical behavior identified in our study (1) call for the attention of developers and researchers when making contributions on GitHub, and (2) point to future research on automated detection of unethical behavior in OSS projects. Based on our study, we propose Etor, an approach that can automatically detect six types of unethical behavior by using ontological engineering and Semantic Web Rule Language (SWRL) rules to model GitHub attributes and software artifacts. Our evaluation on 195,621 GitHub issues (1,765 GitHub repositories) shows that Etor can automatically detect 548 instances of unethical behavior with a 74.8% average true positive rate (up to a 100% true positive rate). This shows the feasibility of automated detection of unethical behavior in OSS projects.
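To make the idea of rule-based detection over GitHub attributes concrete, the following is a minimal sketch in plain Python. It is not Etor's implementation: Etor is described as using an ontology and SWRL rules, whereas this sketch only mimics the antecedent-implies-consequent shape of such a rule. The `Repo` class, the `copied_from` field (assumed to come from some separate similarity check), and the `is_soft_fork` function are all hypothetical names introduced for illustration, and the rule encodes only the abstract's definition of soft forking (copying a repository without forking).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    """Hypothetical, simplified view of the attributes a rule might inspect."""
    name: str
    is_fork: bool                # whether the hosting platform records it as a fork
    copied_from: Optional[str]   # assumed output of a separate similarity check


def is_soft_fork(repo: Repo) -> bool:
    """Illustrative rule: copied from another repository AND not a recorded fork."""
    return repo.copied_from is not None and not repo.is_fork


# Hypothetical example data, not taken from the paper's evaluation.
repos = [
    Repo("a/original", is_fork=False, copied_from=None),
    Repo("b/copy", is_fork=False, copied_from="a/original"),
    Repo("c/fork", is_fork=True, copied_from="a/original"),
]

flagged = [r.name for r in repos if is_soft_fork(r)]
print(flagged)
```

Keeping the rule as a pure predicate over a small record mirrors the declarative style of rule languages: each type of unethical behavior could, under the same assumptions, be expressed as its own predicate and evaluated independently.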


Cited By

  • (2024) A First Look at Self-Admitted Miscommunications in GitHub Issues. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, 118–127. DOI: 10.1145/3691621.3694942. Online publication date: 27-Oct-2024


Information & Contributors

Information

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN:9798400703270
DOI:10.1145/3611643
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ethics in Software Engineering
  2. Open-source software projects

Qualifiers

  • Research-article

Conference

ESEC/FSE '23
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Bibliometrics

  • Downloads (last 12 months): 145
  • Downloads (last 6 weeks): 13

Reflects downloads up to 25 Dec 2024
