Abstract
Context:
In machine learning (ML) applications, assets include not only the ML models themselves, but also the datasets, algorithms, and deployment tools that are essential in the development, training, and implementation of these models. Efficient management of ML assets is critical to ensure optimal resource utilization, consistent model performance, and a streamlined ML development lifecycle. This practice contributes to faster iterations, adaptability, reduced time from model development to deployment, and the delivery of reliable and timely outputs.
Objective:
Despite research on ML asset management, there is still a significant knowledge gap on operational challenges, such as model versioning, data traceability, and collaboration issues, faced by asset management tool users. These challenges are crucial because they could directly impact the efficiency, reproducibility, and overall success of machine learning projects. Our study aims to bridge this empirical gap by analyzing user experience, feedback, and needs from Q&A posts, shedding light on the real-world challenges they face and the solutions they have found.
Method:
We examine 15,065 Q&A posts from multiple developer discussion platforms, including Stack Overflow, tool-specific forums, and GitHub/GitLab. Using a mixed-method approach, we classify the posts into knowledge inquiries and problem inquiries. We then apply BERTopic to extract challenge topics and compare their prevalence. Finally, we use the open card sorting approach to summarize solutions from solved inquiries, then cluster them with BERTopic, and analyze the relationship between challenges and solutions.
Results:
We identify 133 distinct topics in ML asset management-related inquiries, grouped into 16 macro-topics, with software environment and dependency, model deployment and service, and model creation and training emerging as the most discussed. Additionally, we identify 79 distinct solution topics, classified under 18 macro-topics, with software environment and dependency, feature and component development, and file and directory management as the most proposed.
Conclusions:
This study highlights critical areas within ML asset management that need further exploration, particularly around prevalent macro-topics identified as pain points for ML practitioners, emphasizing the need for collaborative efforts between academia, industry, and the broader research community.
Data Availability Statement
The datasets generated and analyzed during this study are available in the replication package (https://github.com/zhimin-z/Asset-Management-Topic-Modeling, https://github.com/zhimin-z/MSR-Asset-Management, https://github.com/zhimin-z/QA-Asset-Management).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Massimiliano Di Penta.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Definition and illustration of solution macro-topics not discussed in Section 5.1
\(\hat{R}_{01}\) Code Development
Definition: Same as \(\hat{C}_{01}\). It integrates argument management (\(R_{12}\)), code modification (\(R_{13}\)), wait time management (\(R_{27}\)), command line usage (\(R_{28}\)), API integration (\(R_{47}\)), parameter update (\(R_{56}\)), character removal (\(R_{57}\)), troubleshooting guidance (\(R_{58}\)), function modification (\(R_{59}\)), syntax update (\(R_{61}\)), exception handling (\(R_{63}\)), and parameter removal (\(R_{84}\)) to facilitate a controlled, efficient, and error-resistant MLOps environment.
Example: The following example (Footnote 70) suggests verifying that “Use Action Name” is not specified for any integration request in the API methods and leaving the “Action” field blank.
\(E_{01}\): Accepted Answer: Check in all your API methods that you have not specified “Use Action Name” for any integration request and then leave the “Action” field blank. [TEXT]
\(\hat{R}_{02}\) Code Management
Definition: Same as \(\hat{C}_{02}\). It encompasses the creation or updating of Git repositories (\(R_{43}\)) for optimal version control and collaboration.
Example: The following example (Footnote 71) illustrates joining the relative path of the Git folder to the output path before writing to the output folder.
\(E_{02}\): Accepted Answer: [TEXT] Here is an example of an operation that reads from the relative path where your code exists: [CODE] You can then join the relative path of your git folder before your output. [TEXT]
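The pattern described in \(E_{02}\) can be sketched as follows. This is a minimal illustration, not the code from the original answer (which is elided as [CODE]): the helper name `output_path` and the `outputs` folder name are hypothetical. It resolves the output location relative to the directory that contains the code rather than the current working directory.

```python
from pathlib import Path


def output_path(code_file: str, filename: str, out_dir: str = "outputs") -> Path:
    """Build an output path relative to the folder that contains `code_file`.

    `output_path` and the default `outputs` folder are illustrative names,
    not taken from the original answer.
    """
    # Directory of the script, independent of the current working directory.
    base = Path(code_file).resolve().parent
    return base / out_dir / filename


if __name__ == "__main__":
    # In a real script, pass __file__ so the path follows the checked-out code.
    print(output_path(__file__, "metrics.json"))
```

Anchoring the path to the code location keeps reads and writes stable when the job is launched from a different working directory, which is a common situation on remote compute targets.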
\(\hat{R}_{03}\) Computation Management
Definition: Same as \(\hat{C}_{03}\). It includes regulating resource usage through limit adjustments (\(R_{19}\)), facilitating functionalities through service provisioning (\(R_{30}\)), overseeing the creation and handling of compute instances or clusters for task execution (\(R_{35}\)), managing event-driven programming through lambda function management (\(R_{53}\)), and improving performance by increasing resource capacities (\(R_{60}\)).
Example: The following example (Footnote 72) suggests increasing GPU memory, decreasing the batch size, and switching to a smaller model.
\(E_{03}\): Accepted Answer: [TEXT] Things that you can try: Provision an instance with more GPU memory; Decrease batch size; Use a different (smaller) model.
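One of the remedies listed in \(E_{03}\), decreasing the batch size, can be automated. The sketch below is an assumption-laden illustration rather than part of the original answer: `fit_with_backoff` and `train_step` are hypothetical names, and the built-in `MemoryError` stands in for whatever out-of-memory exception the training framework actually raises.

```python
from typing import Callable


def fit_with_backoff(
    train_step: Callable[[int], None],
    batch_size: int,
    min_batch_size: int = 1,
) -> int:
    """Run `train_step(batch_size)`, halving the batch size after each
    out-of-memory failure. Returns the batch size that succeeded.

    `MemoryError` is a stand-in (assumption) for the framework-specific
    out-of-memory exception.
    """
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be at least 1")
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except MemoryError:
            # Out of memory: retry with half the batch.
            batch_size //= 2
    raise MemoryError("no batch size >= min_batch_size fits in memory")
```

If even the smallest batch fails, the remaining options from the answer apply: provisioning more GPU memory or using a smaller model.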
\(\hat{R}_{04}\) Data Development
Definition: Same as \(\hat{C}_{04}\). It covers column manipulation (\(R_{31}\)), feature filtering (\(R_{33}\)), and data transformation (\(R_{54}\)) to enhance and refine data for an effective ML pipeline.
Example: The following example (Footnote 73) illustrates the usage of a feature store.
\(E_{04}\): Accepted Answer: [TEXT] Once a feature store is created, you will need to create an entity and then create a feature that has the labels parameter as shown in the below sample Python code. [CODE]
\(\hat{R}_{05}\) Data Management
Definition: Same as \(\hat{C}_{05}\). It encompasses the conversion of data and datatypes for compatibility and processing (\(R_{40}\), \(R_{70}\)), the creation of datasets for model training and validation (\(R_{42}\)), facilitating seamless data import/export for access and sharing (\(R_{75}\)), and the manipulation of buckets for organized storage in cloud services (\(R_{77}\)).
Example: The following example (Footnote 74) suggests checking the output location in Amazon S3.
\(E_{05}\): Accepted Answer: [TEXT] SageMaker places the model artifacts in a bucket that you own, check the S3 output location in the AWS SageMaker console. [TEXT]
\(\hat{R}_{06}\) Environment Management
Definition: Same as \(\hat{C}_{06}\). It encompasses package upgrades (\(R_{01}\)), installation (\(R_{05}\)), version management (\(R_{07}\)), SDK upgrades (\(R_{15}\)), container customization (\(R_{21}\)), Docker management (\(R_{22}\)), package additions (\(R_{23}\)), creation of environments (\(R_{25}\)), management of environment variables (\(R_{34}\)), SDK usage (\(R_{37}\)), package downgrades (\(R_{41}\)), reinstallations (\(R_{44}\)), removals (\(R_{62}\)), notebook usage (\(R_{66}\)), Python version management (\(R_{67}\)), package imports (\(R_{69}\)), workspace creation (\(R_{72}\)), region support (\(R_{73}\)), kernel restarts (\(R_{76}\)), Docker updates (\(R_{78}\)), and notebook instance management (\(R_{85}\)).
Example: The following example (Footnote 75) suggests disallowing a circular dependency.
\(E_{06}\): Merge Request: Fixes #105 by not allowing circullar dependency on mlflow.
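To make the notion of a circular dependency in \(E_{06}\) concrete, the sketch below detects a cycle in a package dependency graph with a depth-first search. It is an illustration only and does not reproduce the fix in the merge request; the function name `find_cycle` and the package name `plugin` are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of package names, or None.

    `deps` maps each package to the packages it depends on.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    stack: List[str] = []  # packages on the current DFS path

    def visit(pkg: str) -> Optional[List[str]]:
        state[pkg] = IN_PROGRESS
        stack.append(pkg)
        for dep in deps.get(pkg, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # `dep` is already on the current path: a cycle closes here.
                return stack[stack.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[pkg] = DONE
        return None

    for pkg in deps:
        if state.get(pkg, UNSEEN) == UNSEEN:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None
```

For example, a graph in which `mlflow` depends on `plugin` and `plugin` depends back on `mlflow` yields a cycle, whereas a simple chain of dependencies yields `None`.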
\(\hat{R}_{07}\) Experiment Management
Definition: Same as \(\hat{C}_{07}\). It encapsulates the concepts of specifying run settings (\(R_{11}\)), creating or updating ML experiments (\(R_{39}\)), providing templates for machine learning tasks (\(R_{71}\)), and tailoring sessions for task execution (\(R_{83}\)).
Example: The following example (Footnote 76) suggests adding information from the experiment run.
\(E_{07}\): Merge Request: Fix #6745: adds additional information about the run, as in the native API. [TEXT]
\(\hat{R}_{08}\) File Management
Definition: Same as \(\hat{C}_{08}\). It encompasses the processes of storage mounting (\(R_{16}\)), directory management (\(R_{20}\)), file deletion (\(R_{29}\)), file download (\(R_{36}\)), filepath update (\(R_{49}\)), input management (\(R_{55}\)), filepath modification (\(R_{64}\)), file load (\(R_{65}\)), tracking configuration (\(R_{68}\)), and documentation update (\(R_{86}\)).
Example: The following example (Footnote 77) suggests copying files from Amazon S3 to the local drive.
\(E_{08}\): Accepted Answer: [TEXT] The simplest option is to copy the files from S3 to the local drive (EBS or EFS) of the notebook instance: [CODE]
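One way to perform the copy described in \(E_{08}\) is the AWS CLI command `aws s3 cp` with the `--recursive` flag. The sketch below is not the code from the original answer (which is elided as [CODE]); the helper names, bucket, prefix, and local directory are hypothetical, and running the copy assumes the AWS CLI is installed with valid credentials.

```python
import subprocess
from typing import List


def s3_copy_command(bucket: str, prefix: str, local_dir: str) -> List[str]:
    """Build an `aws s3 cp --recursive` command that copies every object
    under `prefix` in `bucket` to `local_dir`."""
    source = f"s3://{bucket}/{prefix.strip('/')}/"
    return ["aws", "s3", "cp", source, local_dir, "--recursive"]


def copy_from_s3(bucket: str, prefix: str, local_dir: str) -> None:
    """Run the copy; requires the AWS CLI on PATH and valid credentials."""
    subprocess.run(s3_copy_command(bucket, prefix, local_dir), check=True)


if __name__ == "__main__":
    # Hypothetical bucket, prefix, and notebook-instance directory.
    print(" ".join(s3_copy_command("my-bucket", "data/train", "./data")))
```

Building the argument list separately from executing it keeps the command easy to inspect or log before any data is transferred.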
\(\hat{R}_{09}\) Model Deployment | |
Definition: Same as \(\hat{C}_{09}\). It involves endpoint invocation (\(R_{24}\)), deployment pipeline creation (\(R_{26}\)), model prediction (\(R_{38}\)), model deployment (\(R_{79}\)) and implementation of the inference pipeline (\(R_{80}\)) to efficiently deploy and serve machine learning models in the production environment. | |
Example: The following example (footnote 78) suggests using the undeploy_all() method. | |
\(E_{09}\): Accepted Answer: You can undeploy all the models from an endpoint by calling the method undeploy_all() [TEXT] | |
\(\hat{R}_{10}\) Model Development | |
Definition: Same as \(\hat{C}_{10}\). It encompasses the practice of distributed training (\(R_{03}\)), which focuses on the implementation and configuration of parallel or continuous training processes for model creation. | |
Example: The following example (footnote 79) suggests creating a custom Docker image and uploading it to Azure Container Registry. | |
\(E_{10}\): Solution Comment: Got this working by creating a custom Docker image and putting it to the ACR tied to Azure ML workspace. [TEXT] | |
\(\hat{R}_{11}\) Model Management | |
Definition: Same as \(\hat{C}_{11}\). It integrates the processes of model creation (\(R_{10}\)), registration (\(R_{32}\)), and file handling (\(R_{52}\)) to streamline the lifecycle of machine learning models. | |
Example: The following example (footnote 80) suggests using load_model instead of pickle. | |
\(E_{11}\): Accepted Answer: [TEXT] In particular, this answer using load_model instead of pickle seemed to work well for me: [CODE] | |
\(\hat{R}_{12}\) Network Management | |
Definition: Same as \(\hat{C}_{12}\). It encompasses both the configuration or alteration of network settings to improve system performance (\(R_{09}\)) and the process of updating or configuring hyperlinks for precise navigation (\(R_{51}\)). | |
Example: The following example (footnote 81) suggests creating an API gateway to share a model as an endpoint. | |
\(E_{12}\): Accepted Answer: To share your model as an endpoint, you should use lambda and API gateway to create your API. [TEXT] | |
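The answer above can be sketched as an AWS Lambda handler that sits behind an API Gateway proxy integration and forwards the request body to a model endpoint. This assumes the model is served from a SageMaker endpoint reached through boto3's `sagemaker-runtime` client and its `invoke_endpoint` call, which the quoted answer does not state; the `ENDPOINT_NAME` environment variable, the JSON content type, and the `make_handler` factory are hypothetical choices.

```python
import os
from typing import Any, Callable, Dict, Optional


def make_handler(runtime_client: Optional[Any] = None) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler; a client can be injected for local testing."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        client = runtime_client
        if client is None:
            import boto3  # bundled with the AWS Lambda Python runtimes
            client = boto3.client("sagemaker-runtime")
        response = client.invoke_endpoint(
            EndpointName=os.environ["ENDPOINT_NAME"],  # hypothetical variable name
            ContentType="application/json",
            Body=event.get("body") or "{}",
        )
        prediction = response["Body"].read().decode("utf-8")
        # A proxy integration expects a status code and a string body in return.
        return {"statusCode": 200, "body": prediction}

    return handler


lambda_handler = make_handler()
```

Injecting the client keeps the handler testable without network access; in a deployed function only `lambda_handler` would be referenced.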
\(\hat{R}_{13}\) Observability Management | |
Definition: Same as \(\hat{C}_{13}\). It encompasses the setup of logging systems (\(R_{06}\)), the updating or scrutiny of performance metrics (\(R_{50}\)), and the enhancement of log functions or levels for better debugging (\(R_{74}\)). | |
Example: The following example (footnote 82) suggests checking the metrics tab of the current step. | |
\(E_{13}\): Accepted Answer: [TEXT] In Studio, if you go to the step’s Metrics tab, you will be able to see a chart/table of execution progress, including remaining items, remaining mini batches, failed items, etc. [URL] | |
\(\hat{R}_{14}\) Pipeline Management | |
Definition: Same as \(\hat{C}_{14}\). It covers job processing (\(R_{14}\)), where tasks are executed potentially in parallel or on a schedule; pipeline configuration (\(R_{18}\)), which involves creating, updating, or modifying pipelines for efficient data and model workflows; and lifecycle configuration (\(R_{45}\)), which manages application states by implementing or modifying lifecycle scripts. | |
Example: The following example (footnote 83) clarifies the usage of the pipeline construct. | |
\(E_{14}\): Accepted Answer: CDK L1 Constructs correspond 1:1 to a CloudFormation resource of the same name. The construct props match the resource properties. Therefore, the go-to source is the CloudFormation documentation. | |
\(\hat{R}_{15}\) Security Management | |
Definition: Same as \(\hat{C}_{15}\). It integrates the processes of establishing, allocating, or altering access permissions (\(R_{04}\)), updating authentication credentials (\(R_{17}\)), and registering or recreating user accounts (\(R_{81}\)) to improve access control. | |
Example: The following example (footnote 84) suggests upgrading the workspace share commands to migrate from V1 to V2. | |
\(E_{15}\): Accepted Answer: [TEXT] To migrate from Azure Machine Learning V1 to V2, you need to upgrade the az ml workspace share commands to equivalent az role assignment create commands. [TEXT] | |
\(\hat{R}_{16}\) User Interface Management | |
Definition: Same as \(\hat{C}_{16}\). It includes the creation and modification of data visualizations (\(R_{48}\)) for improved data analysis and interpretation. | |
Example: The following example (footnote 85) clarifies the support for the Vega visualization language. | |
\(E_{16}\): Accepted Answer: [TEXT] currently Vega only powers custom charts and the underlying code for other panels are created in JavaScript, unfortunately. |
Appendix B
Cite this article
Zhao, Z., Chen, Y., Bangash, A. et al. An empirical study of challenges in machine learning asset management. Empir Software Eng 29, 98 (2024). https://doi.org/10.1007/s10664-024-10474-4