DOI: 10.5555/3171642.3171832
Article

Transfer learning in multi-armed bandits: a causal approach

Published: 19 August 2017

Abstract

Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' distributions based on structural knowledge; second, incorporating these bounds into a dynamic allocation procedure that guides the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms and achieves convergence rates that are orders of magnitude faster. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
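The abstract describes the allocation step only at a high level. As a rough sketch of the idea, and not the paper's actual algorithm, the Python snippet below assumes a two-armed Bernoulli bandit and hypothetical causal bounds on each arm's mean reward, and clips Thompson-sampling posterior draws to those bounds so that the search concentrates on arms the bounds do not rule out. The interface (`pull`, `bounds`) is invented for illustration.

```python
import numpy as np


def bounded_thompson_sampling(pull, bounds, horizon, rng=None):
    """Thompson sampling for Bernoulli arms, with posterior samples clipped
    to externally supplied bounds on each arm's mean reward.

    pull(k) -> observed 0/1 reward from pulling arm k (the environment).
    bounds  -> list of (low, high) intervals on each arm's expected reward,
               e.g. produced by a causal bounding step (hypothetical input).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(bounds)
    successes = np.ones(n_arms)  # Beta(1, 1) prior on each arm's mean
    failures = np.ones(n_arms)
    total_reward = 0.0

    lows = np.array([lo for lo, _ in bounds])
    highs = np.array([hi for _, hi in bounds])

    for _ in range(horizon):
        # Draw a candidate mean per arm, then clip it to the given bounds so
        # that arms whose entire interval is dominated are rarely selected.
        samples = rng.beta(successes, failures)
        clipped = np.clip(samples, lows, highs)
        arm = int(np.argmax(clipped))
        r = pull(arm)
        successes[arm] += r
        failures[arm] += 1 - r
        total_reward += r
    return total_reward


if __name__ == "__main__":
    true_means = [0.3, 0.6]              # unknown to the learner
    bounds = [(0.0, 0.45), (0.50, 1.0)]  # illustrative causal bounds
    rng = np.random.default_rng(0)
    pull = lambda k: float(rng.random() < true_means[k])
    print(bounded_thompson_sampling(pull, bounds, horizon=2000, rng=rng))
```

With the illustrative bounds above, the dominated arm is pulled only rarely, which mimics the faster convergence the bounding step is meant to provide; the paper's actual procedure and its guarantees are developed in the full text.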

References

[1] S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed bandit problem. CoRR, abs/1111.1797, 2011.
[2] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469-483, 2009.
[3] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
[4] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In Nando de Freitas and Kevin Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pages 113-120. AUAI Press, Corvallis, OR, 2012.
[5] E. Bareinboim and J. Pearl. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113:7345-7352, 2016.
[6] Elias Bareinboim, Andrew Forney, and Judea Pearl. Bandits with unobserved confounders: A causal approach. In Advances in Neural Information Processing Systems, pages 1342-1350, 2015.
[7] Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz, et al. Kullback-Leibler upper confidence bounds for optimal sequential allocation. The Annals of Statistics, 41(3):1516-1541, 2013.
[8] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249-2257, 2011.
[9] D. Heckerman and R. Shachter. A definition and graphical representation for causality. In P. Besnard and S. Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pages 262-273. Morgan Kaufmann, San Francisco, 1995.
[10] Y. Huang and M. Valtorta. Pearl's calculus of intervention is complete. In R. Dechter and T. S. Richardson, editors, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 217-224. AUAI Press, Corvallis, OR, 2006.
[11] George Konidaris and Andrew G. Barto. Building portable options: Skill transfer in reinforcement learning. In IJCAI, volume 7, pages 895-900, 2007.
[12] John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems, pages 817-824, 2008.
[13] Alessandro Lazaric. Transfer in reinforcement learning: a framework and a survey. In Reinforcement Learning, pages 143-173. Springer, 2012.
[14] Yaxin Liu and Peter Stone. Value-function-based transfer for reinforcement learning using structure mapping. 2006.
[15] Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In Proceedings of the 25th International Conference on Machine Learning, pages 648-655. ACM, 2008.
[16] Neville Mehta, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich. Automatic discovery and transfer of task hierarchies in reinforcement learning. AI Magazine, 32(1):35-51, 2011.
[17] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-710, 1995.
[18] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000; 2nd edition, 2009.
[19] Judea Pearl. Is scientific knowledge useful for policy analysis? A peculiar theorem says: No. Journal of Causal Inference, 2(1):109-112, 2014.
[20] I. Shpitser and J. Pearl. Identification of conditional interventional distributions. In R. Dechter and T. S. Richardson, editors, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 437-444. AUAI Press, Corvallis, OR, 2006.
[21] Alex Strehl, John Langford, Lihong Li, and Sham M. Kakade. Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems, pages 2217-2225, 2010.
[22] Adith Swaminathan and Thorsten Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In ICML, pages 814-823, 2015.
[23] Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(Jul):1633-1685, 2009.
[24] William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285-294, 1933.
[25] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567-573. AAAI Press/The MIT Press, Menlo Park, CA, 2002.
[26] Junzhe Zhang and Elias Bareinboim. Transfer learning in multi-armed bandit: a causal approach. Technical Report R-25, Purdue AI Lab, 2017.




    Published In

    IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
    August 2017
    5253 pages
ISBN: 9780999241103

    Sponsors

    • Australian Computer Society
    • National Science Foundation
    • Griffith University
    • University of Technology Sydney
    • AI Journal

    Publisher

    AAAI Press

    Publication History

    Published: 19 August 2017

    Qualifiers

    • Article


    Cited By

    • (2019) The seven tools of causal inference, with reflections on machine learning. Communications of the ACM, 62(3):54-60. DOI: 10.1145/3241036. Online publication date: 21-Feb-2019.
    • (2018) Confounding-robust policy improvement. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 9289-9299. DOI: 10.5555/3327546.3327600. Online publication date: 3-Dec-2018.
    • (2018) Structural causal bandits. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2573-2583. DOI: 10.5555/3327144.3327182. Online publication date: 3-Dec-2018.
    • (2018) Characterizing the limits of autonomous systems. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 2165-2167. DOI: 10.5555/3237383.3238107. Online publication date: 9-Jul-2018.
    • (2017) Counterfactual data-fusion for online reinforcement learners. Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1156-1164. DOI: 10.5555/3305381.3305501. Online publication date: 6-Aug-2017.
