Research Article · DOI: 10.1145/3531146.3533225

Fairness-aware Model-agnostic Positive and Unlabeled Learning

Published: 20 June 2022

Abstract

With the increasing application of machine learning to high-stakes decision-making, potential algorithmic bias against people from certain social groups can harm both individuals and society at large. Many such real-world problems involve positive and unlabeled data, as in medical diagnosis, criminal risk assessment, and recommender systems. For instance, in medical diagnosis, only diagnosed diseases are recorded (positive), while the rest go unrecorded (unlabeled). Despite the large body of existing work on fairness-aware machine learning in the (semi-)supervised and unsupervised settings, fairness remains largely under-explored in this Positive and Unlabeled Learning (PUL) context, where the issue is usually more severe. In this paper, to alleviate this tension, we propose a fairness-aware PUL method named FairPUL. In particular, for binary classification over individuals from two populations, our fairness metric asks for similar true positive rates and false positive rates in both populations. Based on an analysis of the optimal fair classifier for PUL, we design a model-agnostic post-processing framework that leverages both the positive and the unlabeled examples. Our framework is proven to be statistically consistent with respect to both the classification error and the fairness metric. Experiments on synthetic and real-world data sets demonstrate that our framework outperforms state-of-the-art methods in both PUL and fair classification.
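To make the setting concrete, below is a minimal, self-contained sketch in Python (NumPy only). It illustrates the two ingredients the abstract describes, PU learning and fairness-aware post-processing, under simplifying assumptions, and is not the authors' FairPUL procedure: positives are assumed to be labeled completely at random (the classic Elkan-Noto setting), a class-posterior score is recovered by rescaling by the label frequency, and a per-group decision threshold is then tuned so that both groups reach roughly the same true positive rate. All variable names, the binned score estimator, and the target rate are hypothetical. On real PU data the hidden labels y used below for evaluation would be unavailable; the paper's framework instead estimates the relevant quantities from positive and unlabeled samples.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: binary sensitive attribute s, one feature x,
# and a hidden true label y.
n = 20_000
s = rng.integers(0, 2, n)                                   # group membership
x = rng.normal(loc=0.5 * s, scale=1.0, size=n)              # group-shifted feature
y = (x + rng.normal(scale=1.0, size=n) > 0.8).astype(int)   # true label (hidden in PU)

# PU observation: only a random fraction of positives is ever labeled.
label_freq = 0.3                                            # P(labeled | y = 1)
labeled = (y == 1) & (rng.random(n) < label_freq)

# Crude plug-in score: estimate P(labeled | x) by binning x, then rescale by
# the label frequency, so that score approximates P(y = 1 | x) under the
# labeled-completely-at-random assumption.
edges = np.quantile(x, np.linspace(0, 1, 21))
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, 19)
p_labeled = np.array([labeled[bin_idx == b].mean() for b in range(20)])
score = p_labeled[bin_idx] / label_freq

def group_rates(threshold, g):
    """TPR and FPR of the thresholded score within group g (uses hidden y)."""
    m = s == g
    pred = score[m] >= threshold
    return pred[y[m] == 1].mean(), pred[y[m] == 0].mean()

def pick_threshold(g, target_tpr):
    """Largest threshold whose TPR in group g still meets the target."""
    grid = np.linspace(0.0, 1.0, 101)
    feasible = [t for t in grid if group_rates(t, g)[0] >= target_tpr]
    return max(feasible) if feasible else 0.0

# Model-agnostic post-processing: one threshold per group, chosen so the
# groups' TPRs (approximately) match.
target = 0.80
for g in (0, 1):
    t = pick_threshold(g, target)
    tpr, fpr = group_rates(t, g)
    print(f"group {g}: threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

Note that deterministic per-group thresholds can align true positive rates but in general cannot match false positive rates at the same time; satisfying both halves of the equalized-odds-style metric above typically requires randomized decisions, in the spirit of Hardt et al. (2016).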


Published In

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Fairness
  2. Machine Learning
  3. Positive and Unlabeled Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAccT '22
Cited By

  • Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. ACM Journal on Responsible Computing 1, 2 (2024), 1–52. https://doi.org/10.1145/3631326
  • A systematic review of fairness in machine learning. AI and Ethics (2024). https://doi.org/10.1007/s43681-024-00577-5
  • When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2023), 345–357. https://doi.org/10.1109/ASE56229.2023.00144
  • Challenges for AI in Healthcare Systems. In Bridging the Gap Between AI and Reality (2023), 165–186. https://doi.org/10.1007/978-3-031-73741-1_11
