DOI: 10.5555/3535850.3535910

Long-Term Resource Allocation Fairness in Average Markov Decision Process (AMDP) Environment

Published: 09 May 2022

Abstract

Fairness has emerged as an important concern in automated decision-making in recent years, especially when these decisions affect human welfare. In this work, we study fairness in temporally extended decision-making settings, specifically those formulated as Markov Decision Processes (MDPs). Our proposed notion of fairness ensures that each state's long-term visitation frequency is at least a specified fraction. This quota-based notion of fairness is natural in many resource-allocation settings where the dynamics of a single allocated resource are governed by an MDP and the distribution of the shared resource is captured by its state-visitation frequency. In an average-reward MDP (AMDP) setting, we formulate the problem as a bilinear saddle point program and, given a generative model, solve it using a Stochastic Mirror Descent (SMD) based algorithm. The proposed solution guarantees a simultaneous approximation of both the expected average reward and the fairness requirement. We give sample complexity bounds for the proposed algorithm and validate our theoretical results with experiments on simulated data.
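The quota-based fairness notion above can be illustrated with a minimal sketch (not the paper's SMD algorithm): for a policy-induced Markov chain, the long-term visitation frequency of each state is its stationary probability, and the fairness condition asks that this probability be at least a specified fraction for every state. All names here (`stationary`, `is_fair`, `quotas`) are hypothetical.

```python
# Sketch of the quota-based fairness check from the abstract, assuming the
# policy-induced chain is ergodic so a unique stationary distribution exists.

def stationary(P, iters=10_000):
    """Approximate the stationary distribution of transition matrix P
    by power iteration, starting from the uniform distribution."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d

def is_fair(P, quotas, tol=1e-6):
    """Check the quota condition: each state's long-term visitation
    frequency must be at least its specified fraction."""
    d = stationary(P)
    return all(d_s + tol >= q for d_s, q in zip(d, quotas))

# Symmetric two-state chain: stationary distribution is (0.5, 0.5).
P = [[0.9, 0.1],
     [0.1, 0.9]]
print(is_fair(P, quotas=[0.3, 0.3]))  # True: each state is visited >= 30%
print(is_fair(P, quotas=[0.6, 0.2]))  # False: state 0 cannot reach 60%
```

The paper's contribution is to find a policy that meets such quotas while maximizing average reward, by solving a bilinear saddle point program with stochastic mirror descent; the sketch only verifies the constraint for a fixed chain.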


Cited By

  • (2023) Mitigating Disparity While Maximizing Reward. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI). 10.24963/ijcai.2023/456, 4100--4108. Online publication date: 19-Aug-2023.

Published In

AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
May 2022
1990 pages
ISBN: 9781450392136

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. Markov decision process
  2. fairness
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '22

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 21 Dec 2024.

