DOI: 10.1145/3663529.3663849

Rethinking Software Engineering in the Era of Foundation Models: A Curated Catalogue of Challenges in the Development of Trustworthy FMware

Published: 10 July 2024

Abstract

Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized software development by enabling new use cases and business models. We refer to software built using FMs as FMware. The unique properties of FMware (e.g., prompts, agents, and the need for orchestration), coupled with the intrinsic limitations of FMs (e.g., hallucination), lead to a completely new set of software engineering challenges. Based on our industrial experience, we identified ten key SE4FMware challenges that have made enterprise FMware development unproductive, costly, and risky. For each challenge, we state the path for innovation that we envision. We hope that disclosing these challenges will not only raise awareness but also promote deeper discussion, knowledge sharing, and innovative solutions.


Published In

FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering
July 2024, 715 pages
ISBN: 9798400706585
DOI: 10.1145/3663529
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. AIware
    2. FMware
    3. Foundation models
    4. Large Language Models

    Qualifiers

    • Research-article

    Conference

    FSE '24

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%
