DOI: 10.5555/3535850.3535910

Long-Term Resource Allocation Fairness in Average Markov Decision Process (AMDP) Environment

Published: 09 May 2022

Abstract

Fairness has emerged as an important concern in automated decision-making in recent years, especially when these decisions affect human welfare. In this work, we study fairness in temporally extended decision-making settings, specifically those formulated as Markov Decision Processes (MDPs). Our proposed notion of fairness ensures that each state's long-term visitation frequency is at least a specified fraction. This quota-based notion of fairness is natural in many resource-allocation settings where the dynamics of a single allocated resource are governed by an MDP and the distribution of the shared resource is captured by its state-visitation frequency. In an average-reward MDP (AMDP) setting, we formulate the problem as a bilinear saddle point program and, given a generative model, solve it using a Stochastic Mirror Descent (SMD) based algorithm. The proposed solution guarantees a simultaneous approximation of both the expected average reward and the fairness requirement. We give sample complexity bounds for the proposed algorithm and validate our theoretical results with experiments on simulated data.
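The quota-based fairness notion above can be illustrated with a minimal sketch (not the paper's SMD algorithm): for a policy-induced Markov chain, the long-term visitation frequency of each state is its stationary probability, and the fairness condition asks that this probability be at least a specified fraction for every state. All names here (`stationary`, `is_fair`, `quotas`) are hypothetical.

```python
# Sketch of the quota-based fairness check from the abstract, assuming the
# policy-induced chain is ergodic so a unique stationary distribution exists.

def stationary(P, iters=10_000):
    """Approximate the stationary distribution of transition matrix P
    by power iteration, starting from the uniform distribution."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d

def is_fair(P, quotas, tol=1e-6):
    """Check the quota condition: each state's long-term visitation
    frequency must be at least its specified fraction."""
    d = stationary(P)
    return all(d_s + tol >= q for d_s, q in zip(d, quotas))

# Symmetric two-state chain: stationary distribution is (0.5, 0.5).
P = [[0.9, 0.1],
     [0.1, 0.9]]
print(is_fair(P, quotas=[0.3, 0.3]))  # True: each state is visited >= 30%
print(is_fair(P, quotas=[0.6, 0.2]))  # False: state 0 cannot reach 60%
```

The paper's contribution is to find a policy that meets such quotas while maximizing average reward, by solving a bilinear saddle point program with stochastic mirror descent; the sketch only verifies the constraint for a fixed chain.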


Cited By

  • (2023) Mitigating Disparity While Maximizing Reward. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI). 10.24963/ijcai.2023/456, 4100--4108. Online publication date: 19-Aug-2023.

Published In

AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
May 2022
1990 pages
ISBN: 9781450392136

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. Markov decision process
  2. fairness
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '22

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 21 Dec 2024.

