
An empirical study of challenges in machine learning asset management

Published in Empirical Software Engineering

Abstract

Context:

In machine learning (ML) applications, assets include not only the ML models themselves, but also the datasets, algorithms, and deployment tools that are essential in the development, training, and implementation of these models. Efficient management of ML assets is critical to ensure optimal resource utilization, consistent model performance, and a streamlined ML development lifecycle. This practice contributes to faster iterations, adaptability, reduced time from model development to deployment, and the delivery of reliable and timely outputs.

Objective:

Despite research on ML asset management, there is still a significant knowledge gap concerning the operational challenges, such as model versioning, data traceability, and collaboration issues, faced by users of asset management tools. These challenges are crucial because they can directly impact the efficiency, reproducibility, and overall success of machine learning projects. Our study aims to bridge this empirical gap by analyzing user experience, feedback, and needs from Q&A posts, shedding light on the real-world challenges practitioners face and the solutions they have found.

Method:

We examine 15,065 Q&A posts from multiple developer discussion platforms, including Stack Overflow, tool-specific forums, and GitHub/GitLab. Using a mixed-methods approach, we classify the posts into knowledge inquiries and problem inquiries. We then apply BERTopic to extract challenge topics and compare their prevalence. Finally, we use the open card sorting approach to summarize solutions from solved inquiries, cluster them with BERTopic, and analyze the relationship between challenges and solutions.
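BERTopic derives each topic's descriptive terms with a class-based TF-IDF (c-TF-IDF) computed over the documents assigned to each cluster. The following is a rough, simplified sketch of that weighting only — not the authors' exact pipeline — using toy token lists in place of the real Q&A corpus:

```python
from collections import Counter
import math

def ctfidf(classes):
    """Simplified class-based TF-IDF, after Grootendorst (2022):
    W(t, c) = tf(t, c) * log(1 + A / f(t)), where A is the average
    word count per class and f(t) the term's total frequency.

    classes: dict mapping class name -> list of tokenized posts.
    Returns: dict mapping class name -> {term: weight}.
    """
    # Merge each class's posts into one pseudo-document and count terms.
    class_tf = {c: Counter(tok for post in posts for tok in post)
                for c, posts in classes.items()}
    total_tf = Counter()
    for tf in class_tf.values():
        total_tf.update(tf)
    avg_words = sum(total_tf.values()) / len(class_tf)
    return {
        c: {t: n * math.log(1 + avg_words / total_tf[t]) for t, n in tf.items()}
        for c, tf in class_tf.items()
    }

# Toy stand-in for two clusters of Q&A posts.
posts = {
    "deployment": [["docker", "image", "deploy"], ["deploy", "endpoint"]],
    "training":   [["gpu", "epoch", "loss"], ["loss", "epoch"]],
}
weights = ctfidf(posts)
top = max(weights["training"], key=weights["training"].get)
```

Terms that are frequent within a cluster but rare elsewhere score highest, which is what makes the extracted topic labels interpretable.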

Results:

We identify 133 distinct topics in ML asset management-related inquiries, grouped into 16 macro-topics, with software environment and dependency, model deployment and service, and model creation and training emerging as the most discussed. Additionally, we identify 79 distinct solution topics, classified under 18 macro-topics, with software environment and dependency, feature and component development, and file and directory management being the most frequently proposed.

Conclusions:

This study highlights critical areas within ML asset management that need further exploration, particularly around prevalent macro-topics identified as pain points for ML practitioners, emphasizing the need for collaborative efforts between academia, industry, and the broader research community.


Data Availability Statement

The datasets generated and analyzed during this study are available in the replication package (https://github.com/zhimin-z/Asset-Management-Topic-Modeling, https://github.com/zhimin-z/MSR-Asset-Management, https://github.com/zhimin-z/QA-Asset-Management).

Notes

  1. https://aimstack.io

  2. https://clear.ml

  3. https://dvc.org

  4. https://kedro.org

  5. https://polyaxon.com

  6. https://github.com/MaartenGr/BERTopic

  7. https://aimstack.io

  8. https://clear.ml

  9. https://cnvrg.io

  10. https://comet.com

  11. https://domino.ai

  12. https://dvc.org

  13. https://kedro.org

  14. https://polyaxon.com

  15. https://wandb.ai

  16. https://github.com/topics/awesome

  17. https://scholar.google.com

  18. https://venturebeat.com/dev/facebook-details-its-company-wide-machine-learning-platform-fblearner-flow

  19. https://www.datanami.com/this-just-in/dotscience-is-shutting-down

  20. https://github.com/dolthub/dolt

  21. https://github.com/pachyderm/pachyderm

  22. https://stackoverflow.com

  23. https://data.stackexchange.com

  24. https://discuss.dvc.org/c/blog-discussions/5

  25. https://github.com/features/issues

  26. https://docs.gitlab.com/ee/user/project/issues

  27. https://support.atlassian.com/bitbucket-cloud/docs/understand-bitbucket-issues

  28. https://github.com/features/discussions

  29. https://pypi.org/project/github-dependents-info

  30. https://sourcegraph.com

  31. https://github.com/apache/airflow/discussions/categories/q-a

  32. https://github.com/awslabs/amazon-neptune-tools/issues/38

  33. https://stackoverflow.com/questions/63844663

  34. https://www.googlecloudcommunity.com/gc/AI-ML/What-you-think-about-CHATGPT/m-p/506958

  35. https://platform.openai.com/docs/models/gpt-4

  36. https://github.com/fastai/fastai/issues/3085

  37. https://github.com/MaartenGr/BERTopic

  38. https://github.com/MaartenGr/BERTopic/issues

  39. https://learn.microsoft.com/en-us/answers/

  40. https://community.wandb.ai/t/axis-scales/2892

  41. https://github.com/DagsHub/fds/issues/39

  42. https://stackoverflow.com/questions/56046428

  43. https://stackoverflow.com/questions/72641789

  44. https://stackoverflow.com/questions/65884046

  45. https://github.com/wandb/edu/issues/103

  46. https://stackoverflow.com/questions/75047065

  47. https://github.com/Azure/azureml-examples/issues/242

  48. https://github.com/aws/amazon-sagemaker-examples/issues/698

  49. https://github.com/aws-samples/sagemaker-ssh-helper/issues/28

  50. https://stackoverflow.com/questions/50441181

  51. https://stackoverflow.com/questions/60088889

  52. https://stackoverflow.com/questions/56269391

  53. https://github.com/huggingface/transformers/issues/13111

  54. https://github.com/aws/amazon-sagemaker-examples/issues/670

  55. https://stackoverflow.com/questions/74257398

  56. https://github.com/allegroai/clearml-server/issues/201

  57. https://stackoverflow.com/questions/71505796

  58. https://github.com/Azure/MachineLearningNotebooks/issues/1927

  59. https://stackoverflow.com/questions/64039980

  60. https://community.wandb.ai/t/vega-code/4605

  61. https://neptune.ai/blog/ml-model-packaging

  62. https://aws.amazon.com/blogs/startups/scaling-ai-ml-and-accelerating-ai-development-with-anyscale-and-aws/

  63. https://stackoverflow.com/questions/72408785

  64. https://stackoverflow.com/questions/73435172

  65. https://github.com/Hannibal046/Awesome-LLM

  66. https://github.com/eugeneyan/open-llms

  67. https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

  68. https://openai.com

  69. https://www.microsoft.com/en-us/edge

  70. https://stackoverflow.com/questions/71255132

  71. https://stackoverflow.com/questions/70335823

  72. https://stackoverflow.com/questions/71398882

  73. https://stackoverflow.com/questions/72106030

  74. https://stackoverflow.com/questions/56024351

  75. https://github.com/getindata/kedro-kubeflow/issues/105

  76. https://github.com/Lightning-AI/lightning/issues/6745

  77. https://stackoverflow.com/questions/57126765

  78. https://stackoverflow.com/questions/73811793

  79. https://github.com/MicrosoftDocs/pipelines-azureml/issues/12

  80. https://stackoverflow.com/questions/72068059

  81. https://stackoverflow.com/questions/58802366

  82. https://stackoverflow.com/questions/67258917

  83. https://stackoverflow.com/questions/72203674

  84. https://stackoverflow.com/questions/74406041

  85. https://community.wandb.ai/t/vega-code/4605

References

  • Agrawal N, Bolosky WJ, Douceur JR, Lorch JR (2007) A five-year study of file-system metadata. ACM Trans Storage (TOS) 3(3):9–es

  • Aguilar Melgar, L., Dao, D., Gan, S., Gürel, N.M., Hollenstein, N., Jiang, J., Karlaš, B., Lemmin, T., Li, T., Li, Y., et al.: Ease. ml: a lifecycle management system for machine learning. In: Proceedings of the Annual Conference on Innovative Data Systems Research (CIDR), 2021. CIDR (2021)

  • Ahmed S, Bagherzadeh M (2018) What do concurrency developers ask about?: a large-scale study using stack overflow. Proceedings of the 12th ACM/IEEE international symposium on empirical software engineering and measurement (2018)

  • Alberti M, Pondenkandath V, Würsch M, Ingold R, Liwicki M (2018) Deepdiva: a highly-functional python framework for reproducible experiments. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR). IEEE, pp 423–428

  • Amershi S, Begel A, Bird C, DeLine R, Gall H, Kamar E, Nagappan N, Nushi B, Zimmermann T (2019) Software engineering for machine learning: A case study. In: 2019 IEEE/ACM 41st International conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 291–300

  • Bagherzadeh M, Khatchadourian R (2019) Going big: a large-scale study on what big data developers ask. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 432–442

  • Bahrampour S, Ramakrishnan N, Schott L, Shah M (2015) Comparative study of deep learning software frameworks. arXiv:1511.06435

  • Baier L, Jöhren F, Seebacher S (2019) Challenges in the deployment and operation of machine learning in practice. In: ECIS, vol. 1

  • Barde BV, Bainwad AM (2017) An overview of topic modeling methods and tools. In: 2017 International conference on intelligent computing and control systems (ICICCS). IEEE, pp 745–750

  • Barrak A, Eghan EE, Adams B (2021) On the co-evolution of ml pipelines and source code-empirical study of dvc projects. In: 2021 IEEE International conference on software analysis, evolution and reengineering (SANER). IEEE, pp 422–433

  • Belguidoum M, Dagnat F (2007) Dependency management in software component deployment. Electron Notes Theor Comput Sci 182:17–32

    Google Scholar 

  • Benítez-Hidalgo A, Barba-González C, García-Nieto J, Gutiérrez-Moncayo P, Paneque M, Nebro AJ, del Mar Roldán-García M, Aldana-Montes JF, Navas-Delgado I (2021) Titan: A knowledge-based platform for big data workflow management. Knowledge-Based Systems 232:107489

    Google Scholar 

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc: Ser B (Methodol) 57(1):289–300

    MathSciNet  Google Scholar 

  • Bhattacharjee A, Barve Y, Khare S, Bao S, Gokhale A, Damiano T (2019) Stratum: A serverless framework for the lifecycle management of machine learning-based data analytics tasks. In: 2019 USENIX Conference on Operational Machine Learning (OpML 19), pp 59–61

  • Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E et al (2021) On the opportunities and risks of foundation models. arXiv:2108.07258

  • Borges H, Valente MT (2018) What’s in a github star? understanding repository starring practices in a social coding platform. J Syst Softw 146:112–129

    Google Scholar 

  • Bravo-Rocca G, Liu P, Guitart J, Dholakia A, Ellison D, Falkanger J, Hodak M (2022) Scanflow: A multi-graph framework for machine learning workflow management, supervision, and debugging. Expert Syst Appl 202:117232

    Google Scholar 

  • Campbell JL, Quincy C, Osserman J, Pedersen OK (2013) Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociol Methods Res 42(3):294–320

    MathSciNet  Google Scholar 

  • Chard R, Li Z, Chard K, Ward L, Babuji Y, Woodard A, Tuecke S, Blaiszik B, Franklin MJ, Foster I (2019) Dlhub: Model and data serving for science. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, pp 283–292

  • Chen Z, Cao Y, Liu Y, Wang H, Xie T, Liu X (2020) A comprehensive study on challenges in deploying deep learning based software. In: Proceedings of the 28th ACM Joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 750–762

  • Chen A, Chow A, Davidson A, DCunha A, Ghodsi A, Hong SA, Konwinski A, Mewald C, Murching S, Nykodym T et al (2020) Developments in mlflow: A system to accelerate the machine learning lifecycle. In: Proceedings of the fourth international workshop on data management for end-to-end machine learning, pp 1–4

  • Chen Y, Fernandes E, Adams B, Hassan AE (2023) On practitioners’ concerns when adopting service mesh frameworks. Empir Softw Eng

  • Cheng L, Li X, Bing L (2023) Is gpt-4 a good data analyst? arXiv:2305.15038

  • Coelho J, Valente MT (2017) Why modern open source projects fail. In: Proceedings of the 2017 11th Joint meeting on foundations of software engineering, pp 186–196

  • Cramér H (1999) Mathematical methods of statistics, vol. 43. Princeton university press

  • Diamantopoulos T, Nastos DN, Symeonidis A (2023) Semantically-enriched jira issue tracking data. In: 2023 IEEE/ACM 20th International conference on mining software repositories (MSR). IEEE, pp 218–222

  • do Prado KS (2020) Kelvins: awesome-mlops: A curated list of awesome mlops tools. https://github.com/kelvins/awesome-mlops

  • Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64

    MathSciNet  Google Scholar 

  • Enck W, Williams L (2022) Top five challenges in software supply chain security: Observations from 30 industry and government organizations. IEEE Secur Privacy 20(2):96–100

    Google Scholar 

  • Esparrachiari S, Reilly T, Rentz A (2018) Tracking and controlling microservice dependencies: Dependency management is a crucial part of system and software design. Queue 16(4):44–65

    Google Scholar 

  • Ferenc R, Viszkok T, Aladics T, Jász J, Hegedűs P (2020) Deep-water framework: The swiss army knife of humans working with machine learning models. SoftwareX 12:100551

    Google Scholar 

  • Françoise J, Caramiaux B, Sanchez T (2021) Marcelle: composing interactive machine learning workflows and interfaces. In: The 34th Annual ACM symposium on user interface software and technology, pp 39–53

  • Garcia R, Sreekanti V, Yadwadkar N, Crankshaw D, Gonzalez JE, Hellerstein JM (2018) Context: The missing piece in the machine learning lifecycle. In: KDD CMI Workshop, vol. 114, pp 1–4

  • Gao C (2022) Tensorchord: awesome-llmops: An awesome curated list of best llmops tools for developers. https://github.com/tensorchord/Awesome-LLMOps

  • Gharibi G, Walunj V, Alanazi R, Rella S, Lee Y (2019) Automated management of deep learning experiments. In: Proceedings of the 3rd International workshop on data management for end-to-end machine learning, pp 1–4

  • Gilardi F, Alizadeh M, Kubli M (2023) Chatgpt outperforms crowd-workers for text-annotation tasks. arXiv:2303.15056

  • Giray G (2021) A software engineering perspective on engineering machine learning systems: State of the art and challenges. J Syst Softw 180:111031

    Google Scholar 

  • Goniwada SR, Goniwada SR (2022) Observability. Cloud native architecture and design: a handbook for modern day architecture and design with enterprise-grade examples pp 661–676

  • Gou J, Yu B, Maybank SJ, Tao D (2021) Knowledge distillation: A survey. Int J Comput Vision 129:1789–1819

    Google Scholar 

  • Groeneveld D, Beltagy I, Walsh P, Bhagia A, Kinney R, Tafjord O, Jha AH, Ivison H, Magnusson I, Wang Y et al (2024) Olmo: Accelerating the science of language models. arXiv:2402.00838

  • Grootendorst M (2022) Bertopic: Neural topic modeling with a class-based tf-idf procedure. arXiv:2203.05794

  • Grubb P, Takang AA (2003) Software maintenance: concepts and practice. World Scientific

  • Gu H, He H, Zhou M (2023) Self-admitted library migrations in java, javascript, and python packaging ecosystems: A comparative study. In: 2023 IEEE international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 627–638

  • Hartley M, Olsson TS (2020) dtoolai: Reproducibility for deep learning. Patterns 1(5)

  • Hastie T, Tibshirani R, Friedman JH, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer

  • Hewage N, Meedeniya D (2022) Machine learning operations: A survey on mlops tool support. arXiv:2202.10169

  • Hummer W, Muthusamy V, Rausch T, Dube P, El Maghraoui K, Murthi A, Oum P (2019) Modelops: Cloud-based lifecycle management for reliable and trusted ai. In: 2019 IEEE International Conference on Cloud Engineering (IC2E). IEEE, pp 113–120

  • Idowu S, Strüber D, Berger T (2022) Asset management in machine learning: State-of-research and state-of-practice. ACM Comput Surv. https://doi.org/10.1145/3543847. Just Accepted

  • Idowu S, Strüber D, Berger T (2022) Emmm: A unified meta-model for tracking machine learning experiments. In: 2022 48th Euromicro conference on software engineering and advanced applications (SEAA). IEEE, pp 48–55

  • Isah H, Abughofa T, Mahfuz S, Ajerla D, Zulkernine F, Khan S (2019) A survey of distributed data stream processing frameworks. IEEE Access 7:154300–154316

    Google Scholar 

  • Izquierdo JLC, Cosentino V, Cabot J (2017) An empirical study on the maturity of the eclipse modeling ecosystem. In: 2017 ACM/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (MODELS). IEEE, pp 292–302

  • Jalali S, Wohlin C (2012) Systematic literature studies: database searches vs. backward snowballing. In: Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement, pp 29–38

  • Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, Casas Ddl, Bressand F, Lengyel G, Lample G, Saulnier L et al (2023) Mistral 7b. arXiv:2310.06825

  • Jiang W, Synovic N, Hyatt M, Schorlemmer TR, Sethi R, Lu YH, Thiruvathukal GK, Davis JC (2023) An empirical study of pre-trained model reuse in the hugging face deep learning model registry. arXiv:2303.02552

  • Khondhu J, Capiluppi A, Stol KJ (2013) Is it all lost? a study of inactive open source projects. In: Open source software: quality verification: 9th IFIP WG 2.13 International conference, OSS 2013, Koper-Capodistria, Slovenia, June 25-28, 2013. Proceedings 9. Springer, pp 61–79

  • Kitchenham BA, Travassos GH, Von Mayrhauser A, Niessink F, Schneidewind NF, Singer J, Takada S, Vehvilainen R, Yang H (1999) Towards an ontology of software maintenance. J Softw Maintenance: Res Pract 11(6):365–389

    Google Scholar 

  • Klaise J, Van Looveren A, Cox C, Vacanti G, Coca A (2020) Monitoring and explainability of models in production. arXiv:2007.06299

  • Kreutz D, Ramos FM, Verissimo PE, Rothenberg CE, Azodolmolky S, Uhlig S (2014) Software-defined networking: A comprehensive survey. Proc of the IEEE 103(1):14–76

    Google Scholar 

  • Kumar A, Boehm M, Yang J (2017) Data management in machine learning: Challenges, techniques, and systems. In: Proceedings of the 2017 ACM International conference on management of data, pp 1717–1722

  • Lapan M (2018) Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients. Packt Publishing Ltd, AlphaGo Zero and more, TRPO

    Google Scholar 

  • Le VD (2023) Veml: An end-to-end machine learning lifecycle for large-scale and high-dimensional data. arXiv:2304.13037

  • Liu A, Han X, Wang Y, Tsvetkov Y, Choi Y, Smith NA (2024) Tuning language models by proxy. arXiv:2401.08565

  • Liu Y, Iter D, Xu Y, Wang S, Xu R, Zhu C (2023) Gpteval: Nlg evaluation using gpt-4 with better human alignment. arXiv:2303.16634

  • Loeliger J, McCullough M (2012) Version Control with Git: Powerful tools and techniques for collaborative software development. " O’Reilly Media, Inc."

  • Lu L, Arpaci-Dusseau AC, Arpaci-Dusseau RH, Lu S (2013) A study of linux file system evolution. In: 11th USENIX Conference on file and storage technologies (FAST 13), pp 31–44

  • Manvi SS, Shyam GK (2014) Resource management for infrastructure as a service (iaas) in cloud computing: A survey. J Netw Comput Appl 41:424–440

    Google Scholar 

  • McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Med 22(3):276–282

    MathSciNet  Google Scholar 

  • McInnes L, Healy J, Astels S (2017) hdbscan: Hierarchical density based clustering. J Open Source Softw 2(11):205

    Google Scholar 

  • McKinney W et al (2011) pandas: a foundational python library for data analysis and statistics. Python high Perform Sci Comput 14(9):1–9

    Google Scholar 

  • Melin PD (2023) Tackling version management and reproducibility in mlops

  • Mens T, Goeminne M, Raja U, Serebrenik A (2014) Survivability of software projects in gnome–a replication study. In: 7th International seminar series on advanced techniques & tools for software evolution (SATToSE), pp 79–82

  • Miao H, Chavan A, Deshpande A (2017) Provdb: Lifecycle management of collaborative analysis workflows. In: Proceedings of the 2nd workshop on human-in-the-loop data analytics, pp 1–6

  • Miao H, Li A, Davis LS, Deshpande A (2017) Modelhub: Deep learning lifecycle management. In: 2017 IEEE 33rd International conference on data engineering (ICDE). IEEE, pp 1393–1394

  • Miao H, Li A, Davis LS, Deshpande A (2017) Towards unified data and lifecycle management for deep learning. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE). IEEE, pp 571–582

  • Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2018) Deep learning for healthcare: review, opportunities and challenges. Briefings Bioinf 19(6):1236–1246

    Google Scholar 

  • Moreno M, Lourenço V, Fiorini SR, Costa P, Brandão R, Civitarese D, Cerqueira R (2020) Managing machine learning workflow components. Int J Sem Comput 14(02):295–309

    Google Scholar 

  • Moreschi S, Recupito G, Lenarduzzi V, Palomba F, Hastbacka D, Taibi D (2023) Toward end-to-end mlops tools map: A preliminary study based on a multivocal literature review. arXiv:2304.03254

  • Munappy AR, Bosch J, Olsson HH, Arpteg A, Brinne B (2022) Data management for production quality deep learning models: Challenges and solutions. J Syst Softw 191:111359

  • Mustafa S, Nazir B, Hayat A, Madani SA et al (2015) Resource management in cloud computing: Taxonomy, prospects, and challenges. Comput Electr Eng 47:186–203

    Google Scholar 

  • Nagy AM, Simon V (2018) Survey on traffic prediction in smart cities. Pervasive Mobile Comput 50:148–163

    Google Scholar 

  • Namaki MH, Floratou A, Psallidas F, Krishnan S, Agrawal A, Wu Y (2020) Vamsa: Tracking provenance in data science scripts. arXiv:2001.01861

  • Nguyen G, Dlugolinsky S, Bobák M, Tran V, López García Á, Heredia I, Malík P, Hluchỳ L (2019) Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 52:77–124

    Google Scholar 

  • Openja M, Adams B, Khomh F (2020) Analysis of modern release engineering topics: A large-scale study using stackoverflow. In: Proceedings of the 36th International conference on software maintenance and evolution (ICSME), pp 104–114

  • Paleyes A, Urma RG, Lawrence ND (2022) Challenges in deploying machine learning: a survey of case studies. ACM Comput Surv 55(6):1–29

    Google Scholar 

  • Parra E, Alahmadi M, Ellis A, Haiduc S (2022) A comparative study and analysis of developer communications on slack and gitter. Empir Softw Eng 27(2):40

    Google Scholar 

  • Pavao A, Guyon I, Letournel AC, Baró X, Escalante H, Escalera S, Thomas T, Xu Z (2022) Codalab competitions: An open source platform to organize scientific challenges. Ph.D. thesis, Université Paris-Saclay, FRA. (2022)

  • Pearson K (1900) X. on the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philos Mag J Sci 50(302):157–175

  • Peili Y, Xuezhen Y, Jian Y, Lingfeng Y, Hui Z, Jimin L (2018) Deep learning model management for coronary heart disease early warning research. In: 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA). IEEE, pp 552–557

  • Polyzotis N, Roy S, Whang SE, Zinkevich M (2018) Data lifecycle challenges in production machine learning: a survey. ACM SIGMOD Record 47(2):17–28

    Google Scholar 

  • Recupito G, Pecorelli F, Catolino G, Moreschini S, Di Nucci D, Palomba F, Tamburri DA (2022) A multivocal literature review of mlops tools and features. In: 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, pp 84–91

  • Rigby PC, Barr ET, Bird C, German DM, Devanbu P (2009) Collaboration and governance with distributed version control. ACM Trans Software Engineering and Methodology, Submission number TOSEM-2009-0087 p 33

  • Rochkind MJ (1975) The source code control system. IEEE Trans Softw Eng 4:364–370

    Google Scholar 

  • Rosen C, Shihab E (2016) What are mobile developers asking about? a large scale study using stack overflow. Empir Softw Eng 21:1192–1223

    Google Scholar 

  • Ruf P, Madan M, Reich C, Ould-Abdeslam D (2021) Demystifying mlops and presenting a recipe for the selection of open-source tools. Appl Sci 11(19):8861

    Google Scholar 

  • Sallou J, Durieux T, Panichella A (2024) Breaking the silence: the threats of using llms in software engineering. In: ACM/IEEE 46th International conference on software engineering. ACM/IEEE

  • Saucedo A (2018) EthicalML: awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning. https://github.com/EthicalML/awesome-production-machine-learning

  • Schelter S, Biessmann F, Januschowski T, Salinas D, Seufert S, Szarvas G (2015) On challenges in machine learning model management

  • Schelter S, Böse JH, Kirschnick J, Klein T, Seufert S (2018) Declarative metadata management: A missing piece in end-to-end machine learning

  • Schick T, Schütze H (2020) It’s not just size that matters: Small language models are also few-shot learners. arXiv:2009.07118

  • Schlegel M, Sattler KU (2023) Management of machine learning lifecycle artifacts: A survey. ACM SIGMOD Record 51(4):18–35

    Google Scholar 

  • Sculley D, Holt G, Golovin D, Davydov E, Phillips T, Ebner D, Chaudhary V, Young M, Crespo JF, Dennison D (2015) Hidden technical debt in machine learning systems. Advances in neural information processing systems 28

  • Soomro ZA, Shah MH, Ahmed J (2016) Information security management needs more holistic approach: A literature review. Int J Inf Manag 36(2):215–225

    Google Scholar 

  • Sorokin A, Forsyth D (2008) Utility data annotation with amazon mechanical turk. In: 2008 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 1–8

  • Squire M (2015) "should we move to stack overflow?" measuring the utility of social media for developer support. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, pp 219–228

  • Storey JD (2002) A direct approach to false discovery rates. J Royal Stat Soc Ser B: Stat Methodol 64(3):479–498

    MathSciNet  Google Scholar 

  • Sun C, Azari N, Turakhia C (2020) Gallery: A machine learning model management system at uber. In: EDBT, vol. 20, pp 474–485

  • Sung N, Kim M, Jo H, Yang Y, Kim J, Lausen L, Kim Y, Lee G, Kwak D, Ha JW et al (2017) Nsml: A machine learning platform that enables you to focus on your models. arXiv:1712.05902

  • Syed S, Spruit M (2017) Full-text or abstract? examining topic coherence scores using latent dirichlet allocation. In: 2017 IEEE International conference on data science and advanced analytics (DSAA). IEEE, pp 165–174

  • Symeonidis G, Nerantzis E, Kazakis A, Papakostas GA (2022) Mlops-definitions, tools and challenges. In: 2022 IEEE 12th Annual computing and communication workshop and conference (CCWC). IEEE, pp 0453–0460

  • Tao L, Cazan AP, Ibraimoski S, Moran S (2023) Code librarian: A software package recommendation system. In: 2023 IEEE/ACM 45th International conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 196–198

  • Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava P, Bhosale S et al (2023) Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288

  • Treude C, Barzilay O, Storey MA (2011) How do programmers ask and answer questions on the web?(nier track). In: Proceedings of the 33rd international conference on software engineering, pp 804–807

  • Tsay J, Mummert T, Bobroff N, Braz A, Westerink P, Hirzel M (2018) Runway: machine learning model experiment management tool. In: Conference on systems and machine learning (sysML)

  • Vadlamani SL, Baysal O (2020) Studying software developer expertise and contributions in stack overflow and github. In: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp 312–323

  • Vartak M, Madden S (2018) Modeldb: Opportunities and challenges in managing machine learning models. IEEE Data Eng Bull 41(4):16–25

    Google Scholar 

  • Vasilescu B, Filkov V, Serebrenik A (2013) Stackoverflow and github: Associations between software development and crowdsourced knowledge. In: 2013 International conference on social computing. IEEE, pp 188–195

  • Venkatesh PK, Wang S, Zhang F, Zou Y, Hassan AE (2016) What do client developers concern when using web apis? an empirical study on developer forums and stack overflow. In: 2016 IEEE International Conference on Web Services (ICWS). IEEE, pp 131–138

  • Wang Z, Liu K, Li J, Zhu Y, Zhang Y (2019) Various frameworks and libraries of machine learning and deep learning: a survey. Archives of computational methods in engineering pp 1–24

  • Werlinger R, Hawkey K, Beznosov K (2009) An integrated view of human, organizational, and technological challenges of it security management. Inf Manag Comput Secur 17(1):4–19

    Google Scholar 

  • Wood JR, Wood LE (2008) Card sorting: current practices and beyond. J Usability Studies 4(1):1–6

    Google Scholar 

  • Wozniak JM, Jain R, Balaprakash P, Ozik J, Collier NT, Bauer J, Xia F, Brettin T, Stevens R, Mohd-Yusof J et al (2018) Candle/supervisor: A workflow framework for machine learning applied to cancer research. BMC Bioinf 19(18):59–69

    Google Scholar 

  • Xia W, Wen Y, Foh CH, Niyato D, Xie H (2014) A survey on software-defined networking. IEEE Commun Surv Tutor 17(1):27–51

    Google Scholar 

  • Xin D, Miao H, Parameswaran A, Polyzotis N (2021) Production machine learning pipelines: Empirical analysis and optimization opportunities. In: Proceedings of the 2021 international conference on management of data, pp 2639–2652

  • Xiu M, Jiang ZMJ, Adams B (2020) An exploratory study of machine learning model stores. IEEE Software 38(1):114–122

    Google Scholar 

  • Yang X, Lo D, Xia X, Wan Z, Sun J (2016) What security questions do developers ask? a large-scale study of stack overflow posts. J Comput Sci Technol 31:910–924

    Google Scholar 

  • Yang C, Wang W, Zhang Y, Zhang Z, Shen L, Li Y, See J (2021) Mlife: A lite framework for machine learning lifecycle initialization. Mach Learn 110:2993–3013

    MathSciNet  Google Scholar 

  • Yao Y, Duan J, Xu K, Cai Y, Sun E, Zhang Y (2023) A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. arXiv:2312.02003

  • Zaharia M, Chen A, Davidson A, Ghodsi A, Hong SA, Konwinski A, Murching S, Nykodym T, Ogilvie P, Parkhe M et al (2018) Accelerating the machine learning lifecycle with mlflow. IEEE Data Eng Bull 41(4):39–45

    Google Scholar 

  • Zhang S, Dong L, Li X, Zhang S, Sun X, Wang S, Li J, Hu R, Zhang T, Wu F et al (2023) Instruction tuning for large language models: A survey. arXiv:2308.10792


Author information

Corresponding author

Correspondence to Zhimin Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Massimiliano Di Penta.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Definition and illustration of solution macro-topics not discussed in Section 5.1

\(\hat{R}_{01}\) Code Development

 

Definition: Same as \(\hat{C}_{01}\). It integrates argument management (\(R_{12}\)), code modification (\(R_{13}\)), wait time management (\(R_{27}\)), command line usage (\(R_{28}\)), API integration (\(R_{47}\)), parameter update (\(R_{56}\)), character removal (\(R_{57}\)), troubleshooting guidance (\(R_{58}\)), function modification (\(R_{59}\)), syntax update (\(R_{61}\)), exception handling (\(R_{63}\)) and parameter removal (\(R_{84}\)) to facilitate a controlled, efficient, and error-resistant MLOps environment.

 

Example: The following exampleFootnote 70 suggests leaving the “Action” field blank and ensuring that “Use Action Name” is not specified for any integration request in API methods.

 

\(E_{01}\): Accepted Answer: Check in all your API methods that you have not specified “Use Action Name” for any integration request and then leave the “Action” field blank. [TEXT]

 

\(\hat{R}_{02}\) Code Management

 

Definition: Same as \(\hat{C}_{02}\). It encompasses the creation or updating of Git repositories (\(R_{43}\)) for optimal version control and collaboration.

 

Example: The following exampleFootnote 71 illustrates adding the relative path to write to the output folder.

 

\(E_{02}\): Accepted Answer: [TEXT] Here is an example of an operation that reads from the relative path where your code exists: [CODE] You can then join the relative path of your git folder before your output. [TEXT]
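The relative-path approach in \(E_{02}\) can be sketched as follows. This is an illustrative snippet rather than the answer's elided code; the `outputs` folder name and base directory are hypothetical.

```python
from pathlib import Path
import tempfile

def resolve_output_path(base_dir: str, *parts: str) -> Path:
    """Join relative path parts under a base directory, creating parent folders."""
    path = Path(base_dir).joinpath(*parts)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# A temporary directory stands in for the folder where the code/git repo lives
base = tempfile.mkdtemp()
out = resolve_output_path(base, "outputs", "result.txt")
out.write_text("model metrics go here\n")
```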

 

\(\hat{R}_{03}\) Computation Management

 

Definition: Same as \(\hat{C}_{03}\). It includes regulating resource usage through limit adjustments (\(R_{19}\)), facilitating functionalities through service provisioning (\(R_{30}\)), overseeing the creation and handling of compute instances or clusters for task execution (\(R_{35}\)), managing event-driven programming through lambda function management (\(R_{53}\)), and improving performance by increasing resource capacities (\(R_{60}\)).

 

Example: The following exampleFootnote 72 suggests increasing GPU memory, decreasing batch size, and changing to a smaller model.

 

\(E_{03}\): Accepted Answer: [TEXT] Things that you can try: Provision an instance with more GPU memory; Decrease batch size; Use a different (smaller) model.
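The "decrease batch size" advice in \(E_{03}\) is often automated as a retry loop that halves the batch until training fits in memory. The sketch below is a minimal illustration: `train_step` is a hypothetical callable standing in for one framework-specific training iteration (e.g., a PyTorch step raising a CUDA out-of-memory error).

```python
def find_workable_batch_size(train_step, start=256, floor=1):
    """Halve the batch size until train_step stops raising a memory error."""
    batch = start
    while batch >= floor:
        try:
            train_step(batch)
            return batch
        except MemoryError:
            batch //= 2
    raise RuntimeError("even the smallest batch does not fit in memory")

# Simulated step: pretend any batch above 64 samples exhausts GPU memory
def fake_step(batch):
    if batch > 64:
        raise MemoryError

fits = find_workable_batch_size(fake_step)
```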

 

\(\hat{R}_{04}\) Data Development

 

Definition: Same as \(\hat{C}_{04}\). It covers column manipulation (\(R_{31}\)), feature filtering (\(R_{33}\)), and data transformation (\(R_{54}\)) to enhance and refine data for an effective ML pipeline.

 

Example: The following exampleFootnote 73 illustrates the usage of a feature store.

 

\(E_{04}\): Accepted Answer: [TEXT] Once a feature store is created, you will need to create an entity and then create a feature that has the labels parameter as shown in the below sample Python code. [CODE]

 

\(\hat{R}_{05}\) Data Management

 

Definition: Same as \(\hat{C}_{05}\). It encompasses the conversion of data and datatypes for compatibility and processing (\(R_{40}\), \(R_{70}\)), the creation of datasets for model training and validation (\(R_{42}\)), facilitating seamless data import/export for access and sharing (\(R_{75}\)), and the manipulation of buckets for organized storage in cloud services (\(R_{77}\)).

 

Example: The following exampleFootnote 74 suggests examination of the output location in Amazon S3.

 

\(E_{05}\): Accepted Answer: [TEXT] SageMaker places the model artifacts in a bucket that you own, check the S3 output location in the AWS SageMaker console. [TEXT]
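The data/datatype conversion facet of this macro-topic (\(R_{40}\), \(R_{70}\)) can be illustrated with a short, hypothetical pandas sketch in which a raw export arrives as strings and is cast to the types a training pipeline expects:

```python
import pandas as pd

# Hypothetical raw export where every column arrives as strings
raw = pd.DataFrame({"user_id": ["1", "2"], "score": ["0.5", "0.75"]})

# Convert columns to the dtypes the downstream pipeline expects
typed = raw.astype({"user_id": "int64", "score": "float64"})
```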

 

\(\hat{R}_{06}\) Environment Management

 

Definition: Same as \(\hat{C}_{06}\). It encompasses package upgrades (\(R_{01}\)), installation (\(R_{05}\)), version management (\(R_{07}\)), SDK upgrades (\(R_{15}\)), container customization (\(R_{21}\)), Docker management (\(R_{22}\)), package additions (\(R_{23}\)), creation of environments (\(R_{25}\)), management of environment variables (\(R_{34}\)), SDK usage (\(R_{37}\)), package downgrades (\(R_{41}\)), reinstallations (\(R_{44}\)), removals (\(R_{62}\)), notebook usage (\(R_{66}\)), Python version management (\(R_{67}\)), package imports (\(R_{69}\)), workspace creation (\(R_{72}\)), region support (\(R_{73}\)), kernel restarts (\(R_{76}\)), Docker updates (\(R_{78}\)) and notebook instance management (\(R_{85}\)).

 

Example: The following exampleFootnote 75 suggests the prohibition of circular dependency.

 

\(E_{06}\): Merge Request: Fixes #105 by not allowing circular dependency on mlflow.
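The circular-dependency prohibition in \(E_{06}\) amounts to a cycle check over a dependency graph. The sketch below is self-contained and illustrative; the graphs are hypothetical, not mlflow's actual dependency tree.

```python
def has_cycle(deps):
    """Detect a cycle in a package-dependency mapping via DFS coloring."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {pkg: WHITE for pkg in deps}

    def visit(pkg):
        color[pkg] = GRAY
        for dep in deps.get(pkg, ()):
            if color.get(dep, WHITE) == GRAY:
                return True  # back edge: dependency cycle found
            if dep in deps and color[dep] == WHITE and visit(dep):
                return True
        color[pkg] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in deps)

# Hypothetical graphs: the second contains mlflow -> plugin -> mlflow
acyclic = {"mlflow": ["numpy"], "numpy": []}
cyclic = {"mlflow": ["plugin"], "plugin": ["mlflow"]}
```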

 

\(\hat{R}_{07}\) Experiment Management

 

Definition: Same as \(\hat{C}_{07}\). It encapsulates the concepts of specifying run settings (\(R_{11}\)), creating or updating ML experiments (\(R_{39}\)), providing templates for machine learning tasks (\(R_{71}\)) and tailoring sessions for task execution (\(R_{83}\)).

 

Example: The following exampleFootnote 76 suggests adding information from the experiment run.

 

\(E_{07}\): Merge Request: Fix #6745: adds additional information about the run, as in the native API. [TEXT]

 

\(\hat{R}_{08}\) File Management

 

Definition: Same as \(\hat{C}_{08}\). It encompasses the processes of storage mounting (\(R_{16}\)), directory management (\(R_{20}\)), file deletion (\(R_{29}\)), file download (\(R_{36}\)), filepath update (\(R_{49}\)), input management (\(R_{55}\)), filepath modification (\(R_{64}\)), file load (\(R_{65}\)), tracking configuration (\(R_{68}\)) and documentation update (\(R_{86}\)).

 

Example: The following exampleFootnote 77 suggests copying files from Amazon S3 to the local drive.

 

\(E_{08}\): Accepted Answer: [TEXT] The simplest option is to copy the files from S3 to the local drive (EBS or EFS) of the notebook instance: [CODE]

 

\(\hat{R}_{09}\) Model Deployment

 

Definition: Same as \(\hat{C}_{09}\). It involves endpoint invocation (\(R_{24}\)), deployment pipeline creation (\(R_{26}\)), model prediction (\(R_{38}\)), model deployment (\(R_{79}\)) and implementation of the inference pipeline (\(R_{80}\)) to efficiently deploy and serve machine learning models in the production environment.

 

Example: The following exampleFootnote 78 suggests the usage of the undeploy_all function.

 

\(E_{09}\): Accepted Answer: You can undeploy all the models from an endpoint by calling the method undeploy_all() [TEXT]

 

\(\hat{R}_{10}\) Model Development

 

Definition: Same as \(\hat{C}_{10}\). It encompasses the practice of distributed training (\(R_{03}\)), which focuses on the implementation and configuration of parallel or continuous training processes for model creation.

 

Example: The following exampleFootnote 79 suggests creating a custom Docker image and uploading it to Azure Container Registry.

 

\(E_{10}\): Solution Comment: Got this working by creating a custom Docker image and putting it to the ACR tied to Azure ML workspace. [TEXT]

 

\(\hat{R}_{11}\) Model Management

 

Definition: Same as \(\hat{C}_{11}\). It integrates the processes of model creation (\(R_{10}\)), registration (\(R_{32}\)), and file handling (\(R_{52}\)) to streamline the lifecycle of machine learning models.

 

Example: The following exampleFootnote 80 suggests replacing load_model with pickle.

 

\(E_{11}\): Accepted Answer: [TEXT] In particular, this answer using load_model instead of pickle seemed to work well for me: [CODE]

 

\(\hat{R}_{12}\) Network Management

 

Definition: Same as \(\hat{C}_{12}\). It encompasses both the configuration or alteration of network settings to improve system performance (\(R_{09}\)) and the process of updating or configuring hyperlinks for precise navigation (\(R_{51}\)).

 

Example: The following exampleFootnote 81 suggests creating an API gateway to share a model as an endpoint.

 

\(E_{12}\): Accepted Answer: To share your model as an endpoint, you should use lambda and API gateway to create your API. [TEXT]

 

\(\hat{R}_{13}\) Observability Management

 

Definition: Same as \(\hat{C}_{13}\). It encompasses the setup of logging systems (\(R_{06}\)), the updating or scrutiny of performance metrics (\(R_{50}\)), and the enhancement of log functions or levels for better debugging (\(R_{74}\)).

 

Example: The following exampleFootnote 82 suggests checking the metrics tab of the current step.

 

\(E_{13}\): Accepted Answer: [TEXT] In Studio, if you go to the step’s Metrics tab, you will be able to see a chart/table of execution progress, including remaining items, remaining mini batches, failed items, etc. [URL]
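The log-level tuning covered by this macro-topic (\(R_{74}\)) can be sketched with the standard Python logging module. The logger name, messages, and in-memory sink below are hypothetical placeholders for a real training pipeline's log configuration.

```python
import io
import logging

# An in-memory stream stands in for a real log sink (file, stdout, etc.)
stream = io.StringIO()
logger = logging.getLogger("ml.training")
logger.setLevel(logging.DEBUG)  # raise verbosity for debugging

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)

logger.debug("epoch=1 loss=0.42")   # captured now that level is DEBUG
logger.info("checkpoint saved")
log_output = stream.getvalue()
```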

 

\(\hat{R}_{14}\) Pipeline Management

 

Definition: Same as \(\hat{C}_{14}\). It encompasses the cohesive administration and orchestration of job processing (\(R_{14}\)), where tasks are executed potentially in parallel or on a schedule, combined with pipeline configuration (\(R_{18}\)) that involves the creation, updating or modification of pipelines for efficient data and model workflow, and lifecycle configuration (\(R_{45}\)), which denotes the management of application states through the implementation or modification of lifecycle scripts.

 

Example: The following exampleFootnote 83 clarifies the usage of the pipeline construct.

 

\(E_{14}\): Accepted Answer: CDK L1 Constructs correspond 1:1 to a CloudFormation resource of the same name. The construct props match the resource properties. Therefore, the go-to source is the CloudFormation documentation.

 

\(\hat{R}_{15}\) Security Management

 

Definition: Same as \(\hat{C}_{15}\). It integrates the processes of establishing, allocating, or altering access permissions (\(R_{04}\)), updating authentication credentials (\(R_{17}\)), and registering or recreating user accounts (\(R_{81}\)) to improve access control.

 

Example: The following exampleFootnote 84 suggests upgrading the workspace commands to migrate from Azure Machine Learning V1 to V2.

 

\(E_{15}\): Accepted Answer: [TEXT] To migrate from Azure Machine Learning V1 to V2, you need to upgrade the az ml workspace share commands to equivalent az role assignment create commands. [TEXT]

 

\(\hat{R}_{16}\) User Interface Management

 

Definition: Same as \(\hat{C}_{16}\). It includes the creation and modification of data visualizations (\(R_{48}\)) for improved data analysis and interpretation.

 

Example: The following exampleFootnote 85 clarifies the support for the Vega visualization language.

 

\(E_{16}\): Accepted Answer: [TEXT] currently Vega only powers custom charts and the underlying code for other panels are created in JavaScript, unfortunately.

 

Appendix B

Table 6 Retrieved literature related to ML asset management tools
Table 7 Information of ML asset management tools
Table 8 Stack Overflow tag list for each curated tool
Table 9 Information of tool-specific forums
Table 10 Usage pattern of curated tools
Table 11 Post title keywords for each curated tool
Table 12 Parameters for GPT-4 prompting
Table 13 Hyperparameter search space for topic modeling of post inquiries
Table 14 Challenge topic information
Table 15 Solution topic information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, Z., Chen, Y., Bangash, A. et al. An empirical study of challenges in machine learning asset management. Empir Software Eng 29, 98 (2024). https://doi.org/10.1007/s10664-024-10474-4

