DOI: 10.1145/3461702.3462574

Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data

Published: 30 July 2021

Abstract

A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible, if, for example, training A on a more representative dataset D' would have improved the performance. But it can similarly be argued that A itself is at fault, if training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
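To make the credit-assignment question concrete, here is a minimal sketch assuming a standard two-player cooperative game: the "players" are the algorithm swap (A to A') and the dataset swap (D to D'), and the value of a coalition is accuracy on the affected subpopulation after retraining with the swapped-in pieces. This is ordinary two-player Shapley credit assignment, not the Extended Shapley formulation the paper proposes; the estimator classes, datasets, and the make_value helper below are hypothetical placeholders.

# Minimal sketch (an assumption-laden illustration, not the paper's Extended Shapley):
# treat the algorithm swap (A -> A') and the data swap (D -> D') as two players in a
# cooperative game whose value is accuracy on the affected subpopulation, and compute
# their ordinary two-player Shapley contributions.
from itertools import permutations

import numpy as np


def shapley_two_players(value):
    """value(coalition) maps a frozenset drawn from {'algo', 'data'} to a score."""
    players = ('algo', 'data')
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of swapping in this player, given the swaps made so far.
            contrib[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: c / len(orderings) for p, c in contrib.items()}


def make_value(base_algo, alt_algo, base_data, alt_data, X_test, y_test, group_mask):
    """Build the coalition value function: retrain with the chosen algorithm/dataset and
    score accuracy on the subgroup selected by group_mask. All arguments are hypothetical
    placeholders supplied by the caller."""
    def value(swaps):
        algo = alt_algo if 'algo' in swaps else base_algo
        X_tr, y_tr = alt_data if 'data' in swaps else base_data
        model = algo().fit(X_tr, y_tr)
        preds = model.predict(X_test[group_mask])
        return float(np.mean(preds == y_test[group_mask]))
    return value

For instance, shapley_two_players(make_value(LogisticRegression, RandomForestClassifier, (X_old, y_old), (X_new, y_new), X_test, y_test, mask)) would split the subgroup-accuracy change between switching the estimator and switching the training set (the two scikit-learn classes and the datasets here are stand-ins); by the efficiency property, the two shares sum to the total change from training A on D to training A' on D'.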

Supplementary Material

ZIP File (aiespp088aux.zip)
Files: 1. supp.pdf: a PDF containing supplementary material.


Cited By

  • (2023) Shapley based residual decomposition for instance analysis. Proceedings of the 40th International Conference on Machine Learning, 21375-21387. DOI: 10.5555/3618408.3619290. Online publication date: 23-Jul-2023.
  • (2022) Failure Sources in Machine Learning for Medicine—A Study. 2022 IEEE 18th International Conference on e-Science (e-Science), 501-506. DOI: 10.1109/eScience55777.2022.00089. Online publication date: Oct-2022.
  • (2022) Measuring Reproducibility of Machine Learning Methods for Medical Diagnosis. 2022 Fourth International Conference on Transdisciplinary AI (TransAI), 9-16. DOI: 10.1109/TransAI54797.2022.00008. Online publication date: Sep-2022.
  • (2022) Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11882-11891. DOI: 10.1109/CVPR52688.2022.01159. Online publication date: Jun-2022.


      Published In

      AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
      July 2021, 1077 pages
      ISBN: 9781450384735
      DOI: 10.1145/3461702

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. accountability
      2. data valuation
      3. fairness
      4. machine learning

      Qualifiers

      • Research-article

      Funding Sources

      • ERC
      • NSF CAREER

      Conference

      AIES '21

      Acceptance Rates

      Overall Acceptance Rate 61 of 162 submissions, 38%

      Article Metrics

      • Downloads (last 12 months): 32
      • Downloads (last 6 weeks): 0

      Reflects downloads up to 21 Dec 2024

