Research article | Open access

Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

Published: 12 July 2024

Abstract

Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently, there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who used ChatGPT in their jobs over the course of one week, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how the (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.



Information

Published In
Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE (July 2024), 2770 pages.
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Publisher
Association for Computing Machinery, New York, NY, United States

Publication History
Published: 12 July 2024
Published in PACMSE Volume 1, Issue FSE

Author Tags
1. Chatbots
2. Large Language Models (LLMs)
3. Software Development Bots

Qualifiers
• Research-article

Funding Sources
• WASP

