DOI: 10.1145/3658644.3690283

Research article · Free access

Using AI Assistants in Software Development: A Qualitative Study on Security Practices and Concerns

Published: 09 December 2024

Abstract

Following the recent release of AI assistants such as OpenAI's ChatGPT and GitHub Copilot, the software industry quickly adopted these tools for software development tasks, e.g., generating code or consulting the AI for advice. While recent research has demonstrated that AI-generated code can contain security issues, how software professionals balance AI assistant usage and security remains unclear. This paper investigates how software professionals use AI assistants in secure software development, what security implications and considerations arise, and what impact they foresee on security in software development. We conducted 27 semi-structured interviews with software professionals, including software engineers, team leads, and security testers. We also reviewed 190 relevant Reddit posts and comments to gain insights into the current discourse surrounding AI assistants for software development. Our analysis of the interviews and Reddit posts finds that, despite many security and quality concerns, participants widely use AI assistants for security-critical tasks, e.g., code generation, threat modeling, and vulnerability detection. Participants' overall mistrust of AI leads them to check AI suggestions much as they would human-written code. Nevertheless, they expect improvements and, therefore, heavier use of AI for security tasks in the future. We conclude with recommendations for software professionals to critically check AI suggestions, for AI creators to improve suggestion security and capabilities for ethical security tasks, and for academic researchers to consider general-purpose AI in software development.
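To make the abstract's security concern concrete, the following minimal sketch (ours, not an example from the paper) shows the kind of suggestion an AI assistant might produce and the fix a reviewer who checks AI output like human-written code would request. The function names and the users table are hypothetical.

    import sqlite3

    # Hypothetical AI-suggested lookup: the query is built by string
    # interpolation, so crafted input such as ' OR '1'='1 changes the
    # meaning of the SQL (classic SQL injection).
    def find_user_insecure(conn: sqlite3.Connection, username: str):
        query = f"SELECT id, email FROM users WHERE username = '{username}'"
        return conn.execute(query).fetchone()

    # The reviewed version: a parameterized query. The driver passes
    # username as data, never as SQL text, so the injection disappears.
    def find_user_checked(conn: sqlite3.Connection, username: str):
        query = "SELECT id, email FROM users WHERE username = ?"
        return conn.execute(query, (username,)).fetchone()

Prior work on the security of AI-generated code (e.g., Pearce et al.'s "Asleep at the Keyboard?") observed this class of injection-prone suggestion in practice, which is why checking AI output as critically as human contributions matters.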


Published In

CCS '24: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
December 2024, 5188 pages
ISBN: 9798400706363
DOI: 10.1145/3658644
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. ai assistants
  2. generative ai
  3. interviews
  4. large language models
  5. llm
  6. software development
  7. software security

Qualifiers

  • Research-article

Funding Sources

  • NWO, Dutch Research Council - Kennis- en Innovatieconvenant (KIC)
  • Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
  • EPSRC
  • NSF (National Science Foundation)
  • European Union Horizon Europe program
  • VolkswagenStiftung

Conference

CCS '24

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
