Deep learning at 15PF: supervised and semi-supervised classification for scientific data

Published: 12 November 2017

Abstract

This paper presents the first 15-PetaFLOP deep learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data, as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our IntelCaffe-based implementation obtains ~2 TFLOP/s on a single Cori Phase-II Xeon Phi node. We use a hybrid strategy: synchronous training within node-groups and asynchronous communication across groups. With this strategy we scale training of a single model to ~9600 Xeon Phi nodes, obtaining peak performance of 11.73-15.07 PFLOP/s and sustained performance of 11.41-13.27 PFLOP/s. At scale, our HEP architecture produces state-of-the-art classification accuracy on a dataset of 10M images, exceeding that achieved by selections on high-level physics-motivated features. Our semi-supervised architecture successfully extracts weather patterns from a 15 TB climate dataset. Our results demonstrate that deep learning can be optimized and scaled effectively on many-core HPC systems.
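
To make the hybrid strategy concrete, the sketch below simulates synchronous gradient averaging within a node-group combined with barrier-free updates across groups, on a toy linear-regression problem. It is a minimal Python/NumPy illustration under stated assumptions, not the paper's IntelCaffe implementation: each thread stands in for one synchronous node-group, and the ParameterServer class and run_group function are hypothetical names introduced here purely to mimic the asynchronous cross-group update path.

# Toy simulation (not the paper's implementation) of hybrid synchronous/
# asynchronous data-parallel SGD on a linear-regression problem. Each thread
# models one synchronous node-group: it averages the gradients of its
# "workers" (the synchronous phase, standing in for an intra-group all-reduce)
# and then pushes the averaged gradient to a shared parameter store without
# any cross-group barrier (the asynchronous phase).
import threading
import numpy as np

DIM, N = 8, 4096
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(N, DIM))
w_true = data_rng.normal(size=DIM)
y = X @ w_true + 0.01 * data_rng.normal(size=N)

class ParameterServer:
    """Global weights shared by all groups; updates are applied as they arrive."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w.copy()

    def push(self, grad):
        with self._lock:
            self.w -= self.lr * grad

def run_group(ps, shard, seed, workers_per_group=4, steps=200, batch=32):
    rng = np.random.default_rng(seed)
    Xs, ys = shard
    for _ in range(steps):
        w = ps.pull()                       # weights may be stale across groups
        grads = []
        for _ in range(workers_per_group):  # synchronous phase within the group
            idx = rng.integers(0, len(ys), size=batch)
            xb, yb = Xs[idx], ys[idx]
            grads.append(2.0 * xb.T @ (xb @ w - yb) / batch)
        ps.push(np.mean(grads, axis=0))     # asynchronous push across groups

ps = ParameterServer(DIM)
n_groups = 4
shards = [(X[g::n_groups], y[g::n_groups]) for g in range(n_groups)]
threads = [threading.Thread(target=run_group, args=(ps, shard, g))
           for g, shard in enumerate(shards)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final mean-squared error:", float(np.mean((X @ ps.pull() - y) ** 2)))

The toy mirrors only the update pattern, not the communication machinery: groups never wait on one another, at the price of occasionally computing gradients against slightly stale weights, which is the trade-off a hybrid scheme accepts in exchange for better scalability.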

Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017
801 pages
ISBN: 9781450351140
DOI: 10.1145/3126908
  • General Chair: Bernd Mohr
  • Program Chair: Padma Raghavan
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Cited By

  • (2024) SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems. Journal of Computer Science and Technology 39(2), 384-400. DOI: 10.1007/s11390-023-1840-y. Online publication date: 1-Mar-2024.
  • (2022) GeoAI for Large-Scale Image Analysis and Machine Vision: Recent Progress of Artificial Intelligence in Geography. ISPRS International Journal of Geo-Information 11(7), 385. DOI: 10.3390/ijgi11070385. Online publication date: 11-Jul-2022.
  • (2022) BaGuaLu. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 192-204. DOI: 10.1145/3503221.3508417. Online publication date: 2-Apr-2022.
  • (2022) Using Multi-Resolution Data to Accelerate Neural Network Training in Scientific Applications. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 404-413. DOI: 10.1109/CCGrid54584.2022.00050. Online publication date: May-2022.
  • (2022) A deep learning based classification of atmospheric circulation types over Europe: projection of future changes in a CMIP6 large ensemble. Environmental Research Letters 17(8), 084021. DOI: 10.1088/1748-9326/ac8068. Online publication date: 27-Jul-2022.
  • (2022) Approximate Computing for Scientific Applications. Approximate Computing Techniques, 415-465. DOI: 10.1007/978-3-030-94705-7_14. Online publication date: 3-Jan-2022.
  • (2022) Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability. Concurrency and Computation: Practice and Experience 35(15). DOI: 10.1002/cpe.7508. Online publication date: 19-Dec-2022.
  • (2021) Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems 32(5), 1030-1043. DOI: 10.1109/TPDS.2020.3040601. Online publication date: 1-May-2021.
  • (2021) Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence: Invited Paper. 2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC), 34-43. DOI: 10.1109/PEHC54839.2021.00010. Online publication date: Nov-2021.
  • (2021) Asynchronous I/O Strategy for Large-Scale Deep Learning Applications. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), 322-331. DOI: 10.1109/HiPC53243.2021.00046. Online publication date: Dec-2021.