Open access

Deep Learning at Scale and at Ease

Published: 02 November 2016

Abstract

Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multimodal data analysis. Large deep learning models are developed for learning rich representations of complex data. Two challenges must be overcome before deep learning can be widely adopted in multimedia and other applications. One is usability: nonexperts must be able to implement different models and training algorithms without much effort, especially when the model is large and complex. The other is scalability: the deep learning system must be able to provision the huge amount of computing resources needed to train large models on massive datasets. To address these two challenges, in this article we design a distributed deep learning platform called SINGA, which has an intuitive programming model based on the layer abstraction common to deep learning models. Good scalability is achieved through a flexible distributed training architecture and specific optimization techniques. SINGA runs on both GPUs and CPUs, and we show that it outperforms many other state-of-the-art deep learning systems. Our experience developing and training deep learning models for real-life multimedia applications in SINGA shows that the platform is both usable and scalable.
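The "layer abstraction" that the abstract describes can be illustrated with a minimal sketch: a model is an ordered stack of layers, and each layer only needs to implement a forward transformation. The `Layer`, `Scale`, `ReLU`, and `Net` classes below are hypothetical names invented for illustration, not SINGA's actual API.

```python
class Layer:
    """Base of the layer abstraction: every layer transforms its input."""
    def forward(self, x):
        raise NotImplementedError

class Scale(Layer):
    """Toy parameterized layer: multiplies every element by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return [v * self.factor for v in x]

class ReLU(Layer):
    """Toy activation layer: clamps negative values to zero."""
    def forward(self, x):
        return [max(v, 0.0) for v in x]

class Net:
    """A model is an ordered list of layers; forward() chains them."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

net = Net([Scale(2.0), ReLU()])
print(net.forward([-1.0, 0.5, 3.0]))  # [0.0, 1.0, 6.0]
```

Because users compose models from such layers rather than writing training loops from scratch, a platform built on this abstraction can take over the distribution of computation across workers, which is the usability/scalability combination the abstract argues for.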




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 4s: Special Section on Trust Management for Multimedia Big Data and Special Section on Best Papers of ACM Multimedia 2015
November 2016, 242 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2997658
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2016
Accepted: 01 August 2016
Revised: 01 June 2016
Received: 01 February 2016
Published in TOMM Volume 12, Issue 4s


Author Tags

  1. multimedia
  2. deep learning
  3. distributed training

Qualifiers

  • Announcement
  • Research
  • Refereed

Funding Sources

  • A*STAR
  • National Natural Science Foundation of China
  • National Research Foundation, Prime Minister's Office, Singapore under its Competitive Research Programme
  • National Research Foundation, Energy Innovation Programme Office, Singapore under Energy Innovation Research Programme
