Research article · Open access
DOI: 10.1145/3647750.3647762

TensAIR: Real-Time Training of Neural Networks from Data-streams

Published: 12 April 2024

Abstract

Online learning (OL) from data streams is an emerging research area that encompasses numerous challenges from stream processing, machine learning, and networking. Stream-processing platforms such as Apache Kafka and Flink offer basic extensions for training Artificial Neural Networks (ANNs) in a stream-processing pipeline. However, these extensions were not designed to train ANNs in real time, and they suffer from performance and scalability issues when doing so.
This paper presents TensAIR, the first OL system for training ANNs in real time. TensAIR achieves remarkable performance and scalability by using a decentralized and asynchronous architecture to train ANN models (either freshly initialized or pre-trained) via DASGD (decentralized and asynchronous stochastic gradient descent). We empirically demonstrate that TensAIR scales out nearly linearly with respect to (1) the number of worker nodes deployed in the network and (2) the throughput at which data batches arrive at the dataflow operators. We demonstrate the versatility of TensAIR on both a sparse (word embedding) and a dense (image classification) use case, for which TensAIR achieved 6 to 116 times higher sustainable throughput rates than state-of-the-art systems for training ANNs in a stream-processing pipeline.
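
To make the training pattern concrete, here is a minimal, self-contained sketch of decentralized and asynchronous SGD in the spirit of the DASGD scheme the abstract names. It is not TensAIR's implementation; the least-squares model (standing in for backprop on a real ANN), the thread-based workers, the queue-based gradient broadcast, and all constants are illustrative assumptions. Each worker holds its own model replica, consumes mini-batches from its shard of the stream, applies its local gradient immediately, broadcasts that gradient to its peers, and folds in whatever peer gradients have arrived so far, without ever waiting at a synchronization barrier.

```python
import queue
import threading

import numpy as np

# --- Illustrative setup; all constants are assumptions, not TensAIR's ---
D, N_WORKERS, N_BATCHES, LR = 5, 4, 200, 0.05
w_true = np.random.default_rng(0).normal(size=D)   # ground-truth weights

def stream(seed, batch_size=32):
    """Simulate one worker's shard of an unbounded data stream."""
    rng = np.random.default_rng(seed)
    for _ in range(N_BATCHES):
        X = rng.normal(size=(batch_size, D))
        yield X, X @ w_true + 0.01 * rng.normal(size=batch_size)

def grad(w, X, y):
    """Least-squares gradient; a stand-in for backprop on a real ANN."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# One model replica and one gradient inbox per worker (no parameter server).
replicas = [np.zeros(D) for _ in range(N_WORKERS)]
inboxes = [queue.Queue() for _ in range(N_WORKERS)]

def worker(rank):
    peers = [q for i, q in enumerate(inboxes) if i != rank]
    for X, y in stream(seed=rank):
        g = grad(replicas[rank], X, y)
        replicas[rank] -= LR * g        # apply the local gradient at once
        for q in peers:                 # broadcast it; no barrier, no locks
            q.put(g)
        while True:                     # fold in peer gradients as they
            try:                        # arrive, however stale they may be
                replicas[rank] -= LR * inboxes[rank].get_nowait()
            except queue.Empty:
                break

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("replica-0 distance to optimum:", np.linalg.norm(replicas[0] - w_true))
```

The absence of any barrier is what makes near-linear scale-out plausible: a slow worker delays no one, at the cost of occasionally applying stale gradients, whose effect on convergence is the central concern of asynchronous-SGD analyses.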



Published In

ICMLSC '24: Proceedings of the 2024 8th International Conference on Machine Learning and Soft Computing
January 2024, 210 pages
ISBN: 9798400716546
DOI: 10.1145/3647750
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Asynchronous Stream Processing
  2. Neural Networks
  3. Online Learning
