
Contributions Estimation in Federated Learning: A Comprehensive Experimental Evaluation

Published: 31 May 2024

Abstract

Federated Learning (FL) provides a privacy-preserving, decentralized approach to collaborative machine learning across multiple FL clients. Contribution estimation in FL, which is extensively studied in the database community, aims to compute fair and reasonable contribution scores that serve as incentives to motivate FL clients. However, designing such methods involves challenges in three aspects: effectiveness, robustness, and efficiency. First, to be effective, contribution estimation methods should use the data utility of various client coalitions rather than that of individual clients alone. Second, they must be robust against adversarial clients who may exploit tactics such as data replication or label flipping. Third, estimating contributions in FL can be time-consuming because it requires enumerating many client coalitions.
Numerous methods have been proposed to address these challenges, each with distinct advantages and limitations that depend on the specific setting. However, existing methods have yet to be thoroughly evaluated and compared within the same experimental framework, so a unified and comprehensive evaluation framework is needed. This paper conducts an extensive survey of contribution estimation methods in FL and introduces a comprehensive framework to evaluate their effectiveness, robustness, and efficiency. Based on empirical results, we present extensive observations, valuable discoveries, and an adaptable testing framework that can facilitate future research on designing and evaluating contribution estimation methods in FL.
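
To make the coalition-enumeration cost concrete, here is a minimal sketch of one well-known coalition-based scheme, the exact Shapley value. The abstract does not name a specific method, so the choice of Shapley value is an assumption for illustration. The client names, the toy data, and the `coverage_utility` function (number of distinct samples a coalition holds) are also hypothetical stand-ins; in FL the utility would typically be the performance of a model trained by that coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley value: weighted average marginal gain over all coalitions."""
    n = len(clients)
    scores = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):  # k = size of the coalition that c joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                scores[c] += weight * (utility(s | {c}) - utility(s))
    return scores


# Hypothetical toy data: each client holds a set of sample ids.
client_data = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {6},
}


def coverage_utility(coalition):
    """Toy utility (an assumption): number of distinct samples in the coalition."""
    covered = set()
    for c in coalition:
        covered |= client_data[c]
    return len(covered)


scores = shapley_values(list(client_data), coverage_utility)
print(scores)
```

Each client's score requires visiting every subset of the other clients, so the number of utility evaluations grows exponentially with the number of clients. When each evaluation means training a model, this is the efficiency challenge the abstract describes.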


Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 8
April 2024
335 pages

Publisher

VLDB Endowment
