Research Article · DOI: 10.1145/3531146.3533225

Fairness-aware Model-agnostic Positive and Unlabeled Learning

Published: 20 June 2022

Abstract

With the increasing application of machine learning to high-stakes decision-making, potential algorithmic bias against people from certain social groups can harm both individuals and society at large. Many such real-world problems involve positive and unlabeled data, as in medical diagnosis, criminal risk assessment, and recommender systems. For instance, in medical diagnosis, only diagnosed diseases are recorded (positive), while the rest go unrecorded (unlabeled). Despite the large body of existing work on fairness-aware machine learning in the (semi-)supervised and unsupervised settings, fairness remains largely under-explored in this Positive and Unlabeled Learning (PUL) context, where the issue is usually more severe. In this paper, to alleviate this tension, we propose a fairness-aware PUL method named FairPUL. In particular, for binary classification over individuals from two populations, our fairness metric asks for similar true positive rates and false positive rates in both populations. Based on an analysis of the optimal fair classifier for PUL, we design a model-agnostic post-processing framework that leverages both the positive and the unlabeled examples. Our framework is proven to be statistically consistent with respect to both the classification error and the fairness metric. Experiments on synthetic and real-world data sets demonstrate that our framework outperforms state-of-the-art methods in both PUL and fair classification.
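To make the setting concrete, below is a minimal, self-contained sketch in Python (NumPy only). It illustrates the two ingredients the abstract describes, PU learning and fairness-aware post-processing, under simplifying assumptions, and is not the authors' FairPUL procedure: positives are assumed to be labeled completely at random (the classic Elkan-Noto setting), a class-posterior score is recovered by rescaling by the label frequency, and a per-group decision threshold is then tuned so that both groups reach roughly the same true positive rate. All variable names, the binned score estimator, and the target rate are hypothetical. On real PU data the hidden labels y used below for evaluation would be unavailable; the paper's framework instead estimates the relevant quantities from positive and unlabeled samples.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: binary sensitive attribute s, one feature x,
# and a hidden true label y.
n = 20_000
s = rng.integers(0, 2, n)                                   # group membership
x = rng.normal(loc=0.5 * s, scale=1.0, size=n)              # group-shifted feature
y = (x + rng.normal(scale=1.0, size=n) > 0.8).astype(int)   # true label (hidden in PU)

# PU observation: only a random fraction of positives is ever labeled.
label_freq = 0.3                                            # P(labeled | y = 1)
labeled = (y == 1) & (rng.random(n) < label_freq)

# Crude plug-in score: estimate P(labeled | x) by binning x, then rescale by
# the label frequency, so that score approximates P(y = 1 | x) under the
# labeled-completely-at-random assumption.
edges = np.quantile(x, np.linspace(0, 1, 21))
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, 19)
p_labeled = np.array([labeled[bin_idx == b].mean() for b in range(20)])
score = p_labeled[bin_idx] / label_freq

def group_rates(threshold, g):
    """TPR and FPR of the thresholded score within group g (uses hidden y)."""
    m = s == g
    pred = score[m] >= threshold
    return pred[y[m] == 1].mean(), pred[y[m] == 0].mean()

def pick_threshold(g, target_tpr):
    """Largest threshold whose TPR in group g still meets the target."""
    grid = np.linspace(0.0, 1.0, 101)
    feasible = [t for t in grid if group_rates(t, g)[0] >= target_tpr]
    return max(feasible) if feasible else 0.0

# Model-agnostic post-processing: one threshold per group, chosen so the
# groups' TPRs (approximately) match.
target = 0.80
for g in (0, 1):
    t = pick_threshold(g, target)
    tpr, fpr = group_rates(t, g)
    print(f"group {g}: threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

Note that deterministic per-group thresholds can align true positive rates but in general cannot match false positive rates at the same time; satisfying both halves of the equalized-odds-style metric above typically requires randomized decisions, in the spirit of Hardt et al. (2016).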


Published In

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Fairness
  2. Machine Learning
  3. Positive and Unlabeled Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAccT '22
Cited By

  • Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. ACM Journal on Responsible Computing 1, 2 (2024), 1–52. https://doi.org/10.1145/3631326
  • A systematic review of fairness in machine learning. AI and Ethics (2024). https://doi.org/10.1007/s43681-024-00577-5
  • When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2023), 345–357. https://doi.org/10.1109/ASE56229.2023.00144
  • Challenges for AI in Healthcare Systems. In Bridging the Gap Between AI and Reality (2023), 165–186. https://doi.org/10.1007/978-3-031-73741-1_11
