MLXP: A framework for conducting replicable experiments in Python

Published: 11 July 2024
DOI: 10.1145/3641525.3663648

Abstract

Replicability in machine learning (ML) research is an increasing concern due to the use of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet it often requires significant technical effort to conduct the systematic, well-organized experiments that yield robust conclusions. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce a complexity that, while manageable in industrial settings, hinders adoption within the research community. To address this low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool for Python, available at https://github.com/inria-thoth/mlxp. MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
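As a rough illustration of the workflow the abstract describes, here is a minimal sketch of a decorator-driven MLXP run. It is based on the project's README at the GitHub URL above rather than on this page, so the names used (mlxp.launch, mlxp.Context, ctx.config, ctx.logger, log_metrics) should be treated as assumptions to be checked against the current documentation:

```python
# Minimal MLXP sketch. The decorator-based API below follows the project's
# README (https://github.com/inria-thoth/mlxp); names such as mlxp.launch,
# mlxp.Context, and ctx.logger are assumptions, not taken from this page.
import mlxp


@mlxp.launch(config_path='./configs')  # loads ./configs/config.yaml for this run
def train(ctx: mlxp.Context) -> None:
    cfg = ctx.config      # hierarchical run configuration (hyper-parameters, etc.)
    logger = ctx.logger   # per-run logger writing into a dedicated run directory

    for epoch in range(cfg.num_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        logger.log_metrics({'epoch': epoch, 'loss': loss}, log_name='train')


if __name__ == "__main__":
    train()
```

In this style of tool, hyper-parameters live in versioned configuration files rather than in code, and each launch gets its own directory of configs, logs, and metrics, which is what makes runs easy to reproduce and compare with minimal overhead.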

Published In

ACM REP '24: Proceedings of the 2nd ACM Conference on Reproducibility and Replicability
June 2024, 151 pages
ISBN: 979-8-4007-0530-4
DOI: 10.1145/3641525

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Machine learning
  2. Replicability
  3. Reproducibility
