Research article
DOI: 10.1145/3589806.3600032

KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

Published: 28 June 2023

Abstract

Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such a complex Edge-to-Cloud Continuum is challenging. To achieve this, one needs to perform experiments systematically, enable their reproducibility, and allow other researchers to replicate the study and its conclusions on different infrastructures. This boils down to the tedious process of reconciling the numerous experimental requirements and constraints with low-level infrastructure design choices.
To address the limitations of the main state-of-the-art approaches for distributed, collaborative experimentation, such as Google Colab, Kaggle, and Code Ocean, we propose KheOps, a collaborative environment specifically designed to enable cost-effective reproducibility and replicability of Edge-to-Cloud experiments. KheOps is composed of three core elements: (1) an experiment repository; (2) a notebook environment; and (3) a multi-platform experiment methodology.
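
The abstract only names the three building blocks; the sketch below is a minimal, hypothetical Python illustration of the idea behind a multi-platform experiment methodology: describe the experiment once, independently of any testbed, and bind its logical layers to concrete platforms at deployment time. The `Layer` and `Experiment` classes and the `map_to` method are invented for illustration and are not the actual KheOps API.

```python
# Hypothetical sketch (not the actual KheOps API): one abstract experiment
# description whose logical layers are bound to concrete testbeds, so the
# same notebook cell can drive a repetition on Grid'5000 + FIT IoT-LAB or a
# replication on Chameleon Cloud + CHI@Edge.

from dataclasses import dataclass, field


@dataclass
class Layer:
    """A logical tier of the experiment (e.g. Edge, Cloud)."""
    name: str
    services: list[str]            # e.g. ["mqtt-broker", "trainer"]
    quantity: int = 1


@dataclass
class Experiment:
    """Abstract, infrastructure-agnostic experiment description."""
    name: str
    layers: list[Layer] = field(default_factory=list)

    def map_to(self, platforms: dict[str, str]) -> dict[str, str]:
        """Bind each logical layer to a concrete testbed name."""
        missing = [l.name for l in self.layers if l.name not in platforms]
        if missing:
            raise ValueError(f"No platform mapping for layers: {missing}")
        return {l.name: platforms[l.name] for l in self.layers}


# The same description, bound to two different Edge-to-Cloud infrastructures.
exp = Experiment(
    name="edge-to-cloud-ml-pipeline",
    layers=[
        Layer("edge", services=["camera-producer"], quantity=16),
        Layer("cloud", services=["mqtt-broker", "trainer"]),
    ],
)

original = exp.map_to({"edge": "FIT IoT-LAB", "cloud": "Grid'5000"})
replica = exp.map_to({"edge": "CHI@Edge", "cloud": "Chameleon Cloud"})
print(original, replica, sep="\n")
```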
We illustrate KheOps with a real-life Edge-to-Cloud application. The evaluations explore the point of view of the authors of an experiment described in an article (who aim to make their experiments reproducible) and the perspective of their readers (who aim to replicate the experiment). The results show how KheOps helps authors systematically perform repeatable and reproducible experiments on the Grid'5000 + FIT IoT-LAB testbeds. Furthermore, KheOps helps readers cost-effectively replicate the authors' experiments on different infrastructures, such as the Chameleon Cloud + CHI@Edge testbeds, and reach the same conclusions with high accuracy (> 88% for all performance metrics).
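
The abstract does not define how the > 88% agreement between the original and the replicated runs is computed. As one plausible reading, a replicated run can be scored per metric as one minus its relative deviation from the original run; the snippet below is a minimal sketch under that assumption, with illustrative numbers only.

```python
# Hypothetical sketch: quantify how closely a replicated run matches the
# original one, per performance metric. The accuracy definition below
# (1 - relative deviation) is an assumption for illustration only.

def replication_accuracy(original: dict[str, float],
                         replica: dict[str, float]) -> dict[str, float]:
    """Return, per metric, 1 - |replica - original| / |original|, clamped at 0."""
    acc = {}
    for metric, ref in original.items():
        rep = replica[metric]
        acc[metric] = max(0.0, 1.0 - abs(rep - ref) / abs(ref))
    return acc


# Illustrative numbers only (not taken from the paper).
original = {"throughput_msg_s": 1200.0, "latency_ms": 85.0, "cpu_util": 0.62}
replica = {"throughput_msg_s": 1110.0, "latency_ms": 92.0, "cpu_util": 0.66}

for metric, a in replication_accuracy(original, replica).items():
    print(f"{metric}: {a:.1%}")   # e.g. throughput_msg_s: 92.5%
```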

Supplemental Material

ZIP File
Experiment artifacts for the Artifact Evaluation.

Cited By

  • (2024) Longevity of Artifacts in Leading Parallel and Distributed Systems Conferences: A Review of the State of the Practice in 2023. In Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, 121-133. https://doi.org/10.1145/3641525.3663631. Online publication date: 18 June 2024.


        Published In

        ACM REP '23: Proceedings of the 2023 ACM Conference on Reproducibility and Replicability
        June 2023
        127 pages
        ISBN:9798400701764
        DOI:10.1145/3589806

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Cloud Computing
        2. Computing Continuum
        3. Edge Computing
        4. Repeatability
        5. Replicability
        6. Reproducibility
        7. Workflows

        Qualifiers

        • Research-article
        • Research
        • Refereed limited
