Abstract
Artificial Intelligence has regained research interest, primarily because of big data. The expansion of the Internet, social networks and online sensors leads to the generation of an enormous amount of information every day. This unprecedented data availability has boosted Machine Learning, and Deep Neural Networks have benefited from it in particular. Many use cases nowadays require huge models with millions of parameters, and big data has proven essential to training them properly. The scientific community has proposed several methods to generate more accurate models. These methods usually require high-performance infrastructure, which limits their applicability to large organizations and institutions with the necessary funds. Another source of concern is privacy: anyone using the leased processing power of a remote data center must trust another entity with their data, and in many cases sensitive data have been leaked, either for financial exploitation or due to security flaws. However, there is a lack of research on open communities of individuals with commodity hardware who wish to join forces in a non-binding way and without a central authority. Our work on LEARNAE attempts to fill this gap by providing a way to train Artificial Neural Networks that features decentralization, data ownership and fault tolerance. This article adds some important pieces to the puzzle: it studies the resilience of LEARNAE under network disruptions and proposes a novel way of embedding low-energy sensors from the Internet of Things domain, while retaining the established distributed philosophy.
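To make the decentralized setting concrete, the following minimal sketch illustrates the general idea of peer-to-peer weight averaging, where no central server ever holds the canonical model. It is an illustration only, not the LEARNAE implementation; the function names, the two-peer topology and the blending rule are assumptions made for the example.

import numpy as np

def init_model(n_params, seed):
    # Each peer starts from its own randomly initialized weight vector.
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_params)

def local_step(weights, gradient, lr=0.01):
    # One gradient-descent update on a peer's own, locally held data.
    return weights - lr * gradient

def merge(own, received, alpha=0.5):
    # Blend locally trained weights with weights received from a neighbour peer.
    return alpha * own + (1.0 - alpha) * received

# Two peers train independently and then exchange and merge weights;
# no central authority coordinates the process.
peer_a = local_step(init_model(4, seed=1), gradient=np.ones(4))
peer_b = local_step(init_model(4, seed=2), gradient=-np.ones(4))
peer_a, peer_b = merge(peer_a, peer_b), merge(peer_b, peer_a)
print(peer_a, peer_b)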
Availability of data and material
For the experiments, the following publicly available dataset was used: HEPMASS, a dataset for training systems on exotic particle detection.
Code Availability
Code and deployment instructions are available from the authors upon request.
Notes
Apache Spark website, https://spark.apache.org
Bitswap webpage, https://github.com/ipfs/specs/tree/master/bitswap
Project whitepaper, https://iota.org/IOTA_Whitepaper.pdf
Device webpage, https://www.raspberrypi.org/products/raspberry-pi-3-model-b
Funding
This research did not receive funding from any source.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Nikolaidis, S., Refanidis, I. Using distributed ledger technology to democratize neural network training. Appl Intell 51, 8288–8304 (2021). https://doi.org/10.1007/s10489-021-02340-3