DOI: 10.1145/3461702.3462574

Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data

Published: 30 July 2021

Abstract

A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible, if, for example, training A on a more representative dataset D' would have improved the performance. But it can similarly be argued that A itself is at fault, if training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
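To make the credit-assignment question concrete, here is a minimal sketch assuming a standard two-player cooperative game: the "players" are the algorithm swap (A to A') and the dataset swap (D to D'), and the value of a coalition is accuracy on the affected subpopulation after retraining with the swapped-in pieces. This is ordinary two-player Shapley credit assignment, not the Extended Shapley formulation the paper proposes; the estimator classes, datasets, and the make_value helper below are hypothetical placeholders.

# Minimal sketch (an assumption-laden illustration, not the paper's Extended Shapley):
# treat the algorithm swap (A -> A') and the data swap (D -> D') as two players in a
# cooperative game whose value is accuracy on the affected subpopulation, and compute
# their ordinary two-player Shapley contributions.
from itertools import permutations

import numpy as np


def shapley_two_players(value):
    """value(coalition) maps a frozenset drawn from {'algo', 'data'} to a score."""
    players = ('algo', 'data')
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of swapping in this player, given the swaps made so far.
            contrib[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: c / len(orderings) for p, c in contrib.items()}


def make_value(base_algo, alt_algo, base_data, alt_data, X_test, y_test, group_mask):
    """Build the coalition value function: retrain with the chosen algorithm/dataset and
    score accuracy on the subgroup selected by group_mask. All arguments are hypothetical
    placeholders supplied by the caller."""
    def value(swaps):
        algo = alt_algo if 'algo' in swaps else base_algo
        X_tr, y_tr = alt_data if 'data' in swaps else base_data
        model = algo().fit(X_tr, y_tr)
        preds = model.predict(X_test[group_mask])
        return float(np.mean(preds == y_test[group_mask]))
    return value

For instance, shapley_two_players(make_value(LogisticRegression, RandomForestClassifier, (X_old, y_old), (X_new, y_new), X_test, y_test, mask)) would split the subgroup-accuracy change between switching the estimator and switching the training set (the two scikit-learn classes and the datasets here are stand-ins); by the efficiency property, the two shares sum to the total change from training A on D to training A' on D'.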

Supplementary Material

ZIP File (aiespp088aux.zip)
Files: 1. supp.pdf: a PDF containing supplementary material.


Cited By

  • (2023) Shapley based residual decomposition for instance analysis. Proceedings of the 40th International Conference on Machine Learning, 21375-21387. DOI: 10.5555/3618408.3619290. Online publication date: 23-Jul-2023.
  • (2022) Failure Sources in Machine Learning for Medicine—A Study. 2022 IEEE 18th International Conference on e-Science (e-Science), 501-506. DOI: 10.1109/eScience55777.2022.00089. Online publication date: Oct-2022.
  • (2022) Measuring Reproducibility of Machine Learning Methods for Medical Diagnosis. 2022 Fourth International Conference on Transdisciplinary AI (TransAI), 9-16. DOI: 10.1109/TransAI54797.2022.00008. Online publication date: Sep-2022.
  • (2022) Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11882-11891. DOI: 10.1109/CVPR52688.2022.01159. Online publication date: Jun-2022.


      Published In

      AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
      July 2021, 1077 pages
      ISBN: 9781450384735
      DOI: 10.1145/3461702

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. accountability
      2. data valuation
      3. fairness
      4. machine learning

      Qualifiers

      • Research-article

      Funding Sources

      • ERC
      • NSF CAREER

      Conference

      AIES '21

      Acceptance Rates

      Overall Acceptance Rate 61 of 162 submissions, 38%

      Article Metrics

      • Downloads (last 12 months): 32
      • Downloads (last 6 weeks): 0

      Reflects downloads up to 21 Dec 2024

