MLXP: A framework for conducting replicable experiments in Python

Published: 11 July 2024
DOI: 10.1145/3641525.3663648

Abstract

Replicability in machine learning (ML) research is an increasing concern due to the use of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet it often requires significant technical effort to conduct the systematic, well-organized experiments that yield robust conclusions. Several tools have been developed to facilitate experiment management and enhance reproducibility; however, they often introduce a complexity that, while manageable in industrial settings, hinders adoption within the research community. To address this low adoption, we propose MLXP, an open-source, simple, and lightweight experiment management tool for Python, available at https://github.com/inria-thoth/mlxp. MLXP streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
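As a rough illustration of the workflow the abstract describes, here is a minimal sketch of a decorator-driven MLXP run. It is based on the project's README at the GitHub URL above rather than on this page, so the names used (mlxp.launch, mlxp.Context, ctx.config, ctx.logger, log_metrics) should be treated as assumptions to be checked against the current documentation:

```python
# Minimal MLXP sketch. The decorator-based API below follows the project's
# README (https://github.com/inria-thoth/mlxp); names such as mlxp.launch,
# mlxp.Context, and ctx.logger are assumptions, not taken from this page.
import mlxp


@mlxp.launch(config_path='./configs')  # loads ./configs/config.yaml for this run
def train(ctx: mlxp.Context) -> None:
    cfg = ctx.config      # hierarchical run configuration (hyper-parameters, etc.)
    logger = ctx.logger   # per-run logger writing into a dedicated run directory

    for epoch in range(cfg.num_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        logger.log_metrics({'epoch': epoch, 'loss': loss}, log_name='train')


if __name__ == "__main__":
    train()
```

In this style of tool, hyper-parameters live in versioned configuration files rather than in code, and each launch gets its own directory of configs, logs, and metrics, which is what makes runs easy to reproduce and compare with minimal overhead.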

Published In

ACM REP '24: Proceedings of the 2nd ACM Conference on Reproducibility and Replicability
June 2024, 151 pages
ISBN: 979-8-4007-0530-4
DOI: 10.1145/3641525

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Machine learning
  2. Replicability
  3. Reproducibility
