DOI: 10.5555/3171642.3171832
Article

Transfer learning in multi-armed bandits: a causal approach

Published: 19 August 2017

Abstract

Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' distributions based on structural knowledge; second, incorporating these bounds into a dynamic allocation procedure that guides the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms and achieves convergence rates that are orders of magnitude faster. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
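The abstract describes the allocation step only at a high level. As a rough sketch of the idea, and not the paper's actual algorithm, the Python snippet below assumes a two-armed Bernoulli bandit and hypothetical causal bounds on each arm's mean reward, and clips Thompson-sampling posterior draws to those bounds so that the search concentrates on arms the bounds do not rule out. The interface (`pull`, `bounds`) is invented for illustration.

```python
import numpy as np


def bounded_thompson_sampling(pull, bounds, horizon, rng=None):
    """Thompson sampling for Bernoulli arms, with posterior samples clipped
    to externally supplied bounds on each arm's mean reward.

    pull(k) -> observed 0/1 reward from pulling arm k (the environment).
    bounds  -> list of (low, high) intervals on each arm's expected reward,
               e.g. produced by a causal bounding step (hypothetical input).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(bounds)
    successes = np.ones(n_arms)  # Beta(1, 1) prior on each arm's mean
    failures = np.ones(n_arms)
    total_reward = 0.0

    lows = np.array([lo for lo, _ in bounds])
    highs = np.array([hi for _, hi in bounds])

    for _ in range(horizon):
        # Draw a candidate mean per arm, then clip it to the given bounds so
        # that arms whose entire interval is dominated are rarely selected.
        samples = rng.beta(successes, failures)
        clipped = np.clip(samples, lows, highs)
        arm = int(np.argmax(clipped))
        r = pull(arm)
        successes[arm] += r
        failures[arm] += 1 - r
        total_reward += r
    return total_reward


if __name__ == "__main__":
    true_means = [0.3, 0.6]              # unknown to the learner
    bounds = [(0.0, 0.45), (0.50, 1.0)]  # illustrative causal bounds
    rng = np.random.default_rng(0)
    pull = lambda k: float(rng.random() < true_means[k])
    print(bounded_thompson_sampling(pull, bounds, horizon=2000, rng=rng))
```

With the illustrative bounds above, the dominated arm is pulled only rarely, which mimics the faster convergence the bounding step is meant to provide; the paper's actual procedure and its guarantees are developed in the full text.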

References

[1] S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed bandit problem. CoRR, abs/1111.1797, 2011.
[2] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469-483, 2009.
[3] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
[4] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In Nando de Freitas and Kevin Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pages 113-120. AUAI Press, Corvallis, OR, 2012.
[5] E. Bareinboim and J. Pearl. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113:7345-7352, 2016.
[6] Elias Bareinboim, Andrew Forney, and Judea Pearl. Bandits with unobserved confounders: A causal approach. In Advances in Neural Information Processing Systems, pages 1342-1350, 2015.
[7] Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz, et al. Kullback-Leibler upper confidence bounds for optimal sequential allocation. The Annals of Statistics, 41(3):1516-1541, 2013.
[8] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249-2257, 2011.
[9] D. Heckerman and R. Shachter. A definition and graphical representation for causality. In P. Besnard and S. Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pages 262-273. Morgan Kaufmann, San Francisco, 1995.
[10] Y. Huang and M. Valtorta. Pearl's calculus of intervention is complete. In R. Dechter and T. S. Richardson, editors, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 217-224. AUAI Press, Corvallis, OR, 2006.
[11] George Konidaris and Andrew G. Barto. Building portable options: Skill transfer in reinforcement learning. In IJCAI, volume 7, pages 895-900, 2007.
[12] John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems, pages 817-824, 2008.
[13] Alessandro Lazaric. Transfer in reinforcement learning: a framework and a survey. In Reinforcement Learning, pages 143-173. Springer, 2012.
[14] Yaxin Liu and Peter Stone. Value-function-based transfer for reinforcement learning using structure mapping. 2006.
[15] Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In Proceedings of the 25th International Conference on Machine Learning, pages 648-655. ACM, 2008.
[16] Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich. Automatic discovery and transfer of task hierarchies in reinforcement learning. AI Magazine, 32(1):35-51, 2011.
[17] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-710, 1995.
[18] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000; 2nd edition, 2009.
[19] Judea Pearl. Is scientific knowledge useful for policy analysis? A peculiar theorem says: No. Journal of Causal Inference, 2(1):109-112, 2014.
[20] I. Shpitser and J. Pearl. Identification of conditional interventional distributions. In R. Dechter and T. S. Richardson, editors, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 437-444. AUAI Press, Corvallis, OR, 2006.
[21] Alex Strehl, John Langford, Lihong Li, and Sham M. Kakade. Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems, pages 2217-2225, 2010.
[22] Adith Swaminathan and Thorsten Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In ICML, pages 814-823, 2015.
[23] Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(Jul):1633-1685, 2009.
[24] William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285-294, 1933.
[25] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567-573. AAAI Press/The MIT Press, Menlo Park, CA, 2002.
[26] Junzhe Zhang and Elias Bareinboim. Transfer learning in multi-armed bandit: a causal approach. Technical Report R-25, Purdue AI Lab, 2017.




    Published In

    IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
    August 2017
    5253 pages
ISBN: 9780999241103

    Sponsors

    • Australian Computer Society
    • National Science Foundation
    • Griffith University
    • University of Technology Sydney
    • AI Journal

    Publisher

    AAAI Press

    Publication History

    Published: 19 August 2017

    Qualifiers

    • Article


    Cited By

    • (2019) The seven tools of causal inference, with reflections on machine learning. Communications of the ACM, 62(3):54-60. DOI: 10.1145/3241036. Online publication date: 21-Feb-2019.
    • (2018) Confounding-robust policy improvement. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 9289-9299. DOI: 10.5555/3327546.3327600. Online publication date: 3-Dec-2018.
    • (2018) Structural causal bandits. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2573-2583. DOI: 10.5555/3327144.3327182. Online publication date: 3-Dec-2018.
    • (2018) Characterizing the limits of autonomous systems. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 2165-2167. DOI: 10.5555/3237383.3238107. Online publication date: 9-Jul-2018.
    • (2017) Counterfactual data-fusion for online reinforcement learners. Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1156-1164. DOI: 10.5555/3305381.3305501. Online publication date: 6-Aug-2017.
