
Walk for Learning: A Random Walk Approach for Federated Learning From Heterogeneous Data

Published: 01 April 2023

Abstract

We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to allow each node to do several local updates on the model in each iteration before sending it back to the PS. While this mitigates the communication bottleneck, the statistical heterogeneity of the data owned by the different nodes has proven to delay convergence and bias the model. In this work, we study random walk (RW) learning algorithms for tackling the communication and data heterogeneity problems. The main idea is to leverage available direct connections among the nodes themselves, which are typically “cheaper” than communication to the PS. In a random walk, the model is thought of as a “baton” that is passed from a node to one of its neighbors after being updated in each iteration. The challenge in designing the RW is the data heterogeneity and the uncertainty about the data distributions. Ideally, we would want to visit nodes that hold more informative data more often. We cast this problem as a sleeping multi-armed bandit (MAB) to design a near-optimal node sampling strategy that achieves variance-reduced gradient estimates and approaches the optimal sampling strategy sublinearly. Based on this framework, we present an adaptive random walk learning algorithm. We provide theoretical guarantees on its convergence. Our numerical results validate our theoretical findings and show that our algorithm outperforms existing random walk algorithms.
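
To make the random-walk learning idea concrete, below is a minimal Python sketch, not the paper's exact algorithm: the model is passed as a “baton” along a graph, and at each step the next node is drawn from the current node's neighbors by an EXP3-style sleeping-bandit rule whose reward is a bounded proxy for how informative that node's gradient is. The graph, the local least-squares data, the step sizes, and the reward definition are all illustrative assumptions.

```python
import numpy as np
import networkx as nx

# Toy decentralized setup (illustrative): each node holds a local
# least-squares dataset; the global objective is the average local loss.
rng = np.random.default_rng(0)
G = nx.connected_watts_strogatz_graph(n=20, k=4, p=0.3, seed=0)
d = 5
data = {v: (rng.normal(size=(10, d)), rng.normal(size=10)) for v in G.nodes}

def local_grad(v, w):
    """Gradient of node v's local least-squares loss at model w."""
    A, b = data[v]
    return A.T @ (A @ w - b) / len(b)

# Random-walk SGD with an EXP3-style "sleeping bandit" neighbor selector:
# at each step only the current node's neighbors are awake; one of them is
# sampled in proportion to its bandit weight (mixed with uniform exploration),
# the model is updated with an importance-weighted local gradient, and the
# "baton" (the model) is handed to that neighbor.
weights = {v: 1.0 for v in G.nodes}  # bandit weights (assumed initialization)
eta_w, gamma, lr = 0.1, 0.1, 0.05    # bandit and SGD step sizes (assumed)
w = np.zeros(d)
node = next(iter(G.nodes))           # arbitrary starting node

for t in range(2000):
    nbrs = list(G.neighbors(node))
    raw = np.array([weights[u] for u in nbrs])
    probs = (1 - gamma) * raw / raw.sum() + gamma / len(nbrs)  # exploration mix
    i = rng.choice(len(nbrs), p=probs)
    nxt = nbrs[i]

    g = local_grad(nxt, w)
    # Importance weighting debiases the update toward the uniform average of
    # the neighborhood's gradients (stationary-distribution corrections for
    # the full graph are omitted in this sketch).
    w -= lr * g / (len(nbrs) * probs[i])

    # Reward: a bounded proxy for gradient informativeness, so nodes with
    # larger gradients end up being visited more often.
    reward = np.linalg.norm(g) / (1.0 + np.linalg.norm(g))
    weights[nxt] *= np.exp(eta_w * reward / probs[i])  # update only the played, awake arm

    node = nxt

print("final model:", w)
```

The exploration mixing term (gamma) keeps every awake neighbor reachable, which is what lets the bandit weights adapt even when the walk would otherwise concentrate on a few nodes.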



Published In

IEEE Journal on Selected Areas in Communications, Volume 41, Issue 4
April 2023
414 pages
Publisher

IEEE Press
