Research article · Open access
DOI: 10.1145/3647750.3647762

TensAIR: Real-Time Training of Neural Networks from Data-streams

Published: 12 April 2024

Abstract

Online learning (OL) from data streams is an emerging research area that encompasses numerous challenges from stream processing, machine learning, and networking. Stream-processing platforms such as Apache Kafka and Flink offer basic extensions for training Artificial Neural Networks (ANNs) in a stream-processing pipeline. However, these extensions were not designed to train ANNs in real time, and they suffer from performance and scalability issues when doing so.
This paper presents TensAIR, the first OL system for training ANNs in real time. TensAIR achieves remarkable performance and scalability by using a decentralized and asynchronous architecture to train ANN models (either freshly initialized or pre-trained) via DASGD (decentralized and asynchronous stochastic gradient descent). We empirically demonstrate that TensAIR scales out nearly linearly with respect to (1) the number of worker nodes deployed in the network and (2) the throughput at which data batches arrive at the dataflow operators. We demonstrate the versatility of TensAIR on both a sparse (word embedding) and a dense (image classification) use case, for which TensAIR achieved 6 to 116 times higher sustainable throughput rates than state-of-the-art systems for training ANNs in a stream-processing pipeline.
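
To make the training pattern concrete, here is a minimal, self-contained sketch of decentralized and asynchronous SGD in the spirit of the DASGD scheme the abstract names. It is not TensAIR's implementation; the least-squares model (standing in for backprop on a real ANN), the thread-based workers, the queue-based gradient broadcast, and all constants are illustrative assumptions. Each worker holds its own model replica, consumes mini-batches from its shard of the stream, applies its local gradient immediately, broadcasts that gradient to its peers, and folds in whatever peer gradients have arrived so far, without ever waiting at a synchronization barrier.

```python
import queue
import threading

import numpy as np

# --- Illustrative setup; all constants are assumptions, not TensAIR's ---
D, N_WORKERS, N_BATCHES, LR = 5, 4, 200, 0.05
w_true = np.random.default_rng(0).normal(size=D)   # ground-truth weights

def stream(seed, batch_size=32):
    """Simulate one worker's shard of an unbounded data stream."""
    rng = np.random.default_rng(seed)
    for _ in range(N_BATCHES):
        X = rng.normal(size=(batch_size, D))
        yield X, X @ w_true + 0.01 * rng.normal(size=batch_size)

def grad(w, X, y):
    """Least-squares gradient; a stand-in for backprop on a real ANN."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# One model replica and one gradient inbox per worker (no parameter server).
replicas = [np.zeros(D) for _ in range(N_WORKERS)]
inboxes = [queue.Queue() for _ in range(N_WORKERS)]

def worker(rank):
    peers = [q for i, q in enumerate(inboxes) if i != rank]
    for X, y in stream(seed=rank):
        g = grad(replicas[rank], X, y)
        replicas[rank] -= LR * g        # apply the local gradient at once
        for q in peers:                 # broadcast it; no barrier, no locks
            q.put(g)
        while True:                     # fold in peer gradients as they
            try:                        # arrive, however stale they may be
                replicas[rank] -= LR * inboxes[rank].get_nowait()
            except queue.Empty:
                break

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("replica-0 distance to optimum:", np.linalg.norm(replicas[0] - w_true))
```

The absence of any barrier is what makes near-linear scale-out plausible: a slow worker delays no one, at the cost of occasionally applying stale gradients, whose effect on convergence is the central concern of asynchronous-SGD analyses.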



Published In

ICMLSC '24: Proceedings of the 2024 8th International Conference on Machine Learning and Soft Computing
January 2024, 210 pages
ISBN: 9798400716546
DOI: 10.1145/3647750
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Asynchronous Stream Processing
  2. Neural Networks
  3. Online Learning
