Deep learning at 15PF: supervised and semi-supervised classification for scientific data

Published: 12 November 2017

Abstract

This paper presents the first 15-PetaFLOP deep learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data, as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our IntelCaffe-based implementation obtains ~2 TFLOP/s on a single Cori Phase-II Xeon Phi node. We use a hybrid strategy: synchronous training within node-groups and asynchronous communication across groups. With this strategy we scale training of a single model to ~9600 Xeon Phi nodes, obtaining peak performance of 11.73-15.07 PFLOP/s and sustained performance of 11.41-13.27 PFLOP/s. At scale, our HEP architecture produces state-of-the-art classification accuracy on a dataset of 10M images, exceeding that achieved by selections on high-level physics-motivated features. Our semi-supervised architecture successfully extracts weather patterns from a 15 TB climate dataset. Our results demonstrate that deep learning can be optimized and scaled effectively on many-core HPC systems.
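
To make the hybrid strategy concrete, the sketch below simulates synchronous gradient averaging within a node-group combined with barrier-free updates across groups, on a toy linear-regression problem. It is a minimal Python/NumPy illustration under stated assumptions, not the paper's IntelCaffe implementation: each thread stands in for one synchronous node-group, and the ParameterServer class and run_group function are hypothetical names introduced here purely to mimic the asynchronous cross-group update path.

# Toy simulation (not the paper's implementation) of hybrid synchronous/
# asynchronous data-parallel SGD on a linear-regression problem. Each thread
# models one synchronous node-group: it averages the gradients of its
# "workers" (the synchronous phase, standing in for an intra-group all-reduce)
# and then pushes the averaged gradient to a shared parameter store without
# any cross-group barrier (the asynchronous phase).
import threading
import numpy as np

DIM, N = 8, 4096
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(N, DIM))
w_true = data_rng.normal(size=DIM)
y = X @ w_true + 0.01 * data_rng.normal(size=N)

class ParameterServer:
    """Global weights shared by all groups; updates are applied as they arrive."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w.copy()

    def push(self, grad):
        with self._lock:
            self.w -= self.lr * grad

def run_group(ps, shard, seed, workers_per_group=4, steps=200, batch=32):
    rng = np.random.default_rng(seed)
    Xs, ys = shard
    for _ in range(steps):
        w = ps.pull()                       # weights may be stale across groups
        grads = []
        for _ in range(workers_per_group):  # synchronous phase within the group
            idx = rng.integers(0, len(ys), size=batch)
            xb, yb = Xs[idx], ys[idx]
            grads.append(2.0 * xb.T @ (xb @ w - yb) / batch)
        ps.push(np.mean(grads, axis=0))     # asynchronous push across groups

ps = ParameterServer(DIM)
n_groups = 4
shards = [(X[g::n_groups], y[g::n_groups]) for g in range(n_groups)]
threads = [threading.Thread(target=run_group, args=(ps, shard, g))
           for g, shard in enumerate(shards)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final mean-squared error:", float(np.mean((X @ ps.pull() - y) ** 2)))

The toy mirrors only the update pattern, not the communication machinery: groups never wait on one another, at the price of occasionally computing gradients against slightly stale weights, which is the trade-off a hybrid scheme accepts in exchange for better scalability.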

Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017
801 pages
ISBN: 9781450351140
DOI: 10.1145/3126908
  • General Chair: Bernd Mohr
  • Program Chair: Padma Raghavan
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Cited By

  • (2024) SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems. Journal of Computer Science and Technology 39(2), 384-400. DOI: 10.1007/s11390-023-1840-y. Online publication date: 1-Mar-2024.
  • (2022) GeoAI for Large-Scale Image Analysis and Machine Vision: Recent Progress of Artificial Intelligence in Geography. ISPRS International Journal of Geo-Information 11(7), 385. DOI: 10.3390/ijgi11070385. Online publication date: 11-Jul-2022.
  • (2022) BaGuaLu. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 192-204. DOI: 10.1145/3503221.3508417. Online publication date: 2-Apr-2022.
  • (2022) Using Multi-Resolution Data to Accelerate Neural Network Training in Scientific Applications. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 404-413. DOI: 10.1109/CCGrid54584.2022.00050. Online publication date: May-2022.
  • (2022) A deep learning based classification of atmospheric circulation types over Europe: projection of future changes in a CMIP6 large ensemble. Environmental Research Letters 17(8), 084021. DOI: 10.1088/1748-9326/ac8068. Online publication date: 27-Jul-2022.
  • (2022) Approximate Computing for Scientific Applications. Approximate Computing Techniques, 415-465. DOI: 10.1007/978-3-030-94705-7_14. Online publication date: 3-Jan-2022.
  • (2022) Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability. Concurrency and Computation: Practice and Experience 35(15). DOI: 10.1002/cpe.7508. Online publication date: 19-Dec-2022.
  • (2021) Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems 32(5), 1030-1043. DOI: 10.1109/TPDS.2020.3040601. Online publication date: 1-May-2021.
  • (2021) Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence: Invited Paper. 2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC), 34-43. DOI: 10.1109/PEHC54839.2021.00010. Online publication date: Nov-2021.
  • (2021) Asynchronous I/O Strategy for Large-Scale Deep Learning Applications. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), 322-331. DOI: 10.1109/HiPC53243.2021.00046. Online publication date: Dec-2021.