DOI: 10.1145/3332186.3332246

Publishing and Serving Machine Learning Models with DLHub

Published: 28 July 2019

Abstract

In this paper we introduce the Data and Learning Hub for Science (DLHub). DLHub serves as a nexus for publishing, sharing, discovering, and reusing machine learning models. It provides a flexible publication platform that enables researchers to describe and deposit models by associating publication and model-specific metadata and assigning a persistent identifier for subsequent citation. DLHub also supports scalable model inference, allowing researchers to execute inference tasks using a distributed execution engine, containerized models, and Kubernetes. Here we describe DLHub and present four scientific use cases that illustrate how DLHub can be used to reliably, efficiently, and scalably integrate ML into scientific processes.
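
As a rough illustration of the publish, discover, and run cycle described in the abstract, the sketch below models a repository that pairs publication metadata with model-specific metadata and assigns an identifier at deposit time. Everything in it is an assumption made for illustration: `ModelRecord`, `ToyHub`, and the `example-id:` prefix are hypothetical names, the store is an in-memory dictionary, and none of it reflects the actual DLHub API, its identifier scheme, or its containerized execution engine.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
import uuid


@dataclass
class ModelRecord:
    """Hypothetical record pairing publication and model-specific metadata."""
    title: str                      # publication metadata
    authors: List[str]              # publication metadata
    model_type: str                 # model-specific metadata
    predict: Callable[[Any], Any]   # the callable that performs inference
    # Placeholder identifier; a real service would mint a persistent one.
    identifier: str = field(
        default_factory=lambda: f"example-id:{uuid.uuid4()}"
    )


class ToyHub:
    """In-memory stand-in for a model repository (not the DLHub API)."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def publish(self, record: ModelRecord) -> str:
        """Deposit a record and return its identifier for later citation."""
        self._records[record.identifier] = record
        return record.identifier

    def search(self, text: str) -> List[ModelRecord]:
        """Discover records whose title contains the given text."""
        needle = text.lower()
        return [r for r in self._records.values() if needle in r.title.lower()]

    def run(self, identifier: str, inputs: Any) -> Any:
        """Run inference locally; a real hub would dispatch to remote workers."""
        return self._records[identifier].predict(inputs)


hub = ToyHub()
model_id = hub.publish(ModelRecord(
    title="Toy doubling model",
    authors=["A. Researcher"],
    model_type="python-function",
    predict=lambda xs: [2 * x for x in xs],
))
print(hub.run(model_id, [1, 2, 3]))  # prints [2, 4, 6]
```

The point of the sketch is only the separation of concerns: metadata and identifier are fixed at publication, and callers later invoke the model by identifier without knowing how or where it executes.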

Published In

PEARC '19: Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)
July 2019
775 pages
ISBN:9781450372275
DOI:10.1145/3332186
  • General Chair: Tom Furlani
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '19

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Cited By

  • (2022) Scalable Multi-Versioning Ordered Key-Value Stores with Persistent Memory Support. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 93-103. DOI: 10.1109/IPDPS53621.2022.00018. Online publication date: May 2022.
  • (2022) Machine learning-based image processing in materials science and engineering: A review. Materials Today: Proceedings 62, 7341-7347. DOI: 10.1016/j.matpr.2022.01.200. Online publication date: 2022.
  • (2020) DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training. 2020 IEEE International Conference on Cluster Computing (CLUSTER), 226-236. DOI: 10.1109/CLUSTER49012.2020.00033. Online publication date: Sep 2020.
  • (2020) DataStates: Towards Lightweight Data Models for Deep Learning. Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 117-129. DOI: 10.1007/978-3-030-63393-6_8. Online publication date: 18 Dec 2020.
