
Walk for Learning: A Random Walk Approach for Federated Learning From Heterogeneous Data

Published: 01 April 2023

Abstract

We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to allow each node to do several local updates on the model in each iteration before sending it back to the PS. While this mitigates the communication bottleneck, the statistical heterogeneity of the data owned by the different nodes has proven to delay convergence and bias the model. In this work, we study random walk (RW) learning algorithms for tackling the communication and data heterogeneity problems. The main idea is to leverage available direct connections among the nodes themselves, which are typically “cheaper” than communication to the PS. In a random walk, the model is thought of as a “baton” that is passed from a node to one of its neighbors after being updated in each iteration. The challenge in designing the RW is the data heterogeneity and the uncertainty about the data distributions. Ideally, we would want to visit nodes that hold more informative data more often. We cast this problem as a sleeping multi-armed bandit (MAB) to design a near-optimal node sampling strategy that achieves variance-reduced gradient estimates and approaches the optimal sampling strategy sublinearly. Based on this framework, we present an adaptive random walk learning algorithm. We provide theoretical guarantees on its convergence. Our numerical results validate our theoretical findings and show that our algorithm outperforms existing random walk algorithms.
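
To make the random-walk learning idea concrete, below is a minimal Python sketch, not the paper's exact algorithm: the model is passed as a “baton” along a graph, and at each step the next node is drawn from the current node's neighbors by an EXP3-style sleeping-bandit rule whose reward is a bounded proxy for how informative that node's gradient is. The graph, the local least-squares data, the step sizes, and the reward definition are all illustrative assumptions.

```python
import numpy as np
import networkx as nx

# Toy decentralized setup (illustrative): each node holds a local
# least-squares dataset; the global objective is the average local loss.
rng = np.random.default_rng(0)
G = nx.connected_watts_strogatz_graph(n=20, k=4, p=0.3, seed=0)
d = 5
data = {v: (rng.normal(size=(10, d)), rng.normal(size=10)) for v in G.nodes}

def local_grad(v, w):
    """Gradient of node v's local least-squares loss at model w."""
    A, b = data[v]
    return A.T @ (A @ w - b) / len(b)

# Random-walk SGD with an EXP3-style "sleeping bandit" neighbor selector:
# at each step only the current node's neighbors are awake; one of them is
# sampled in proportion to its bandit weight (mixed with uniform exploration),
# the model is updated with an importance-weighted local gradient, and the
# "baton" (the model) is handed to that neighbor.
weights = {v: 1.0 for v in G.nodes}  # bandit weights (assumed initialization)
eta_w, gamma, lr = 0.1, 0.1, 0.05    # bandit and SGD step sizes (assumed)
w = np.zeros(d)
node = next(iter(G.nodes))           # arbitrary starting node

for t in range(2000):
    nbrs = list(G.neighbors(node))
    raw = np.array([weights[u] for u in nbrs])
    probs = (1 - gamma) * raw / raw.sum() + gamma / len(nbrs)  # exploration mix
    i = rng.choice(len(nbrs), p=probs)
    nxt = nbrs[i]

    g = local_grad(nxt, w)
    # Importance weighting debiases the update toward the uniform average of
    # the neighborhood's gradients (stationary-distribution corrections for
    # the full graph are omitted in this sketch).
    w -= lr * g / (len(nbrs) * probs[i])

    # Reward: a bounded proxy for gradient informativeness, so nodes with
    # larger gradients end up being visited more often.
    reward = np.linalg.norm(g) / (1.0 + np.linalg.norm(g))
    weights[nxt] *= np.exp(eta_w * reward / probs[i])  # update only the played, awake arm

    node = nxt

print("final model:", w)
```

The exploration mixing term (gamma) keeps every awake neighbor reachable, which is what lets the bandit weights adapt even when the walk would otherwise concentrate on a few nodes.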



Published In

IEEE Journal on Selected Areas in Communications, Volume 41, Issue 4
April 2023
414 pages
Publisher

IEEE Press
