DOI: 10.1145/3524610.3527923
research-article

Towards exploring the code reuse from stack overflow during software development

Published: 20 October 2022

Abstract

As one of the most well-known programmer Q&A websites, Stack Overflow (SO) serves tens of thousands of developers every day. Previous work has shown that many developers reuse code snippets from SO when they find an answer that functionally matches a programming problem encountered during development. To study how programmers reuse code from SO during project development, we conduct a comprehensive empirical study. First, to capture the development activities of programmers, we collect 342,148 modified code snippets from commits in 793 open-source Java projects; this modified code reflects the programming problems encountered during development. We also collect the code snippets from 1,355,617 posts on SO. Then, we employ CCFinder to detect code clones between the modified code from commits and the code from SO, and further analyze code reuse when programmers solve programming problems during development. We compute the code reuse ratio of the modified code snippets in the commits of each project in different years. The results show that the average code reuse ratio is 6.32% and the maximum is 8.38%. The code reuse ratio in project commits has increased year by year, and the proportion of code reuse in newly established projects is higher than that in old projects. We also find that some projects reuse code snippets from many years ago. Additionally, we find that experienced developers seem more likely to reuse the knowledge on SO. Moreover, we find that the code reuse ratio in bug-related commits (6.67%) is slightly higher than that in non-bug-related commits (6.59%). Furthermore, we find that the code reuse ratio (14.44%) in Java class files that have undergone multiple modifications is more than double the overall code reuse ratio (6.32%).
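The pipeline described in the abstract (detect clones between commit snippets and SO snippets, then compute a reuse ratio per year) can be illustrated with a minimal Python sketch. This is not CCFinder and not the authors' implementation: it substitutes a much simpler technique, matching fixed-length windows of normalized tokens. The threshold `MIN_TOKENS`, the keyword list, and all function names are assumptions made for illustration only; the abstract does not state the settings actually used.

```python
import re
from collections import defaultdict

# Hypothetical minimum clone length in tokens (not taken from the paper).
MIN_TOKENS = 8

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Small illustrative subset of Java keywords that are kept verbatim.
JAVA_KEYWORDS = {"int", "for", "if", "else", "return", "new", "while",
                 "void", "public", "private", "static", "class"}


def normalize(code):
    """Tokenize code and replace identifiers and numbers with placeholders,
    so that renamed variables still produce the same token sequence."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in JAVA_KEYWORDS:
            tokens.append(tok)
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def windows(tokens, n=MIN_TOKENS):
    """Return the set of all contiguous n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_clone(snippet, so_index, n=MIN_TOKENS):
    """A snippet counts as reused if it shares any n-token window with SO."""
    return not windows(normalize(snippet), n).isdisjoint(so_index)


def reuse_ratio_by_year(commit_snippets, so_snippets, n=MIN_TOKENS):
    """commit_snippets: iterable of (year, code) pairs.
    Returns {year: reused snippets / all snippets} for each year seen."""
    so_index = set()
    for code in so_snippets:
        so_index |= windows(normalize(code), n)

    total = defaultdict(int)
    reused = defaultdict(int)
    for year, code in commit_snippets:
        total[year] += 1
        if is_clone(code, so_index, n):
            reused[year] += 1
    return {year: reused[year] / total[year] for year in total}
```

Calling `reuse_ratio_by_year` with a list of `(year, code)` pairs and a list of SO snippets returns one ratio per year. Because identifiers are replaced with placeholders, a loop copied from SO and then renamed is still counted as reused, which is the kind of near-copy a token-based detector is meant to catch.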

Cited By

  • (2024) How Do Developers Adapt Code Snippets to Their Contexts? An Empirical Study of Context-Based Code Snippet Adaptations. IEEE Transactions on Software Engineering 50:11 (2712-2731). DOI: 10.1109/TSE.2024.3395519. Online publication date: Nov-2024
  • (2024) Streamlining Serious Game Development: An Extensible System for Unifying Controls in Cross-Platform Serious Game Development. 2024 IEEE 18th International Symposium on Applied Computational Intelligence and Informatics (SACI) (000249-000254). DOI: 10.1109/SACI60582.2024.10619869. Online publication date: 23-May-2024
  • (2023) Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones. 2023 IEEE 17th International Workshop on Software Clones (IWSC) (22-28). DOI: 10.1109/IWSC60764.2023.00011. Online publication date: 1-Oct-2023

Published In

ICPC '22: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension
May 2022
698 pages
ISBN: 9781450392983
DOI: 10.1145/3524610
  • Conference Chairs: Ayushi Rastogi, Rosalia Tufano
  • General Chair: Gabriele Bavota
  • Program Chairs: Venera Arnaoudova, Sonia Haiduc
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GitHub
  2. code clone
  3. code commit
  4. code reuse
  5. software development
  6. stack overflow

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China under Grant
  • Key-Area Research and Development Program of Guangdong Province of China
  • Guangdong Basic and Applied Basic Research Foundation
