Research article · RecSys Conference Proceedings · DOI: 10.1145/2645710.2645732

Ensemble contextual bandits for personalized recommendation

Published: 06 October 2014

Abstract

The cold-start problem has attracted extensive attention among various online services that provide personalized recommendation. Many online vendors employ contextual bandit strategies to tackle the so-called exploration/exploitation dilemma rooted in the cold-start problem. However, due to high-dimensional user/item features and the underlying characteristics of bandit policies, it is often difficult for service providers to obtain and deploy an appropriate algorithm that achieves acceptable and robust economic profit.
In this paper, we explore ensemble strategies over contextual bandit algorithms to obtain robust predicted click-through rates (CTR) of web objects. The ensemble is acquired by aggregating the different pulling policies of the bandit algorithms, rather than by forcing agreement among prediction results or learning a unified predictive model. To this end, we employ a meta-bandit paradigm that places a hyper bandit over the base bandits to explicitly explore/exploit the relative importance of the base bandits based on user feedback. Extensive empirical experiments on two real-world data sets (news recommendation and online advertising) demonstrate the effectiveness of our proposed approach in terms of CTR.
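The meta-bandit idea in the abstract — a hyper bandit that learns which base bandit to trust from click feedback — can be illustrated with a minimal sketch. Everything below is our own assumption for illustration, not the paper's algorithm: the class names are invented, an EXP3-style exponential-weights rule stands in for the hyper bandit, and simple (non-contextual) epsilon-greedy policies stand in for the contextual base bandits.

```python
import math
import random

class EpsilonGreedyBandit:
    """Toy base bandit: epsilon-greedy over running per-arm reward means.

    A stand-in for the paper's base contextual-bandit policies, which
    would additionally condition on user/item features.
    """
    def __init__(self, n_arms, epsilon):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.means))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

class HyperBandit:
    """Hyper bandit over base bandits, with EXP3-style weighting.

    Each round it samples one base bandit, lets that bandit pick the arm,
    and reweights the chosen bandit by its importance-weighted reward, so
    feedback gradually shifts pulls toward the better-performing policy.
    """
    def __init__(self, base_bandits, gamma=0.1):
        self.bases = base_bandits
        self.gamma = gamma
        self.weights = [1.0] * len(base_bandits)

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.bases)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def play(self):
        probs = self.probabilities()
        i = random.choices(range(len(self.bases)), weights=probs)[0]
        return i, self.bases[i].select(), probs[i]

    def feedback(self, i, arm, reward, prob):
        self.bases[i].update(arm, reward)
        # EXP3 exponential reweighting with an unbiased reward estimate.
        self.weights[i] *= math.exp(
            self.gamma * (reward / prob) / len(self.bases))
        # Rescale for numerical stability; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

# Toy run: three epsilon-greedy bandits with different exploration rates
# compete under the hyper bandit on simulated Bernoulli click rates.
random.seed(7)
ctrs = [0.02, 0.05, 0.10, 0.04]          # hidden per-arm click rates
hyper = HyperBandit([EpsilonGreedyBandit(len(ctrs), e)
                     for e in (0.05, 0.2, 0.5)])
clicks = 0
for t in range(5000):
    i, arm, p = hyper.play()
    r = 1.0 if random.random() < ctrs[arm] else 0.0
    clicks += r
    hyper.feedback(i, arm, r, p)
print(round(clicks / 5000, 3))  # empirical CTR of the ensemble
```

Note how this matches the abstract's framing: the hyper bandit never forces the base bandits to agree on a prediction or fits a unified model; it only reallocates pull probability toward whichever base policy earns more clicks.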

Supplementary Material

JPG File (p73-sidebyside.jpg)
MP4 File (p73-sidebyside.mp4)




Published In

RecSys '14: Proceedings of the 8th ACM Conference on Recommender systems
October 2014
458 pages
ISBN:9781450326681
DOI:10.1145/2645710

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. contextual bandit
  2. ctr prediction
  3. ensemble recommendation
  4. meta learning
  5. personalized recommendation

Qualifiers

  • Research-article

Conference

RecSys '14: Eighth ACM Conference on Recommender Systems
October 6-10, 2014
Foster City, Silicon Valley, California, USA

Acceptance Rates

RecSys '14 Paper Acceptance Rate: 35 of 234 submissions, 15%
Overall Acceptance Rate: 254 of 1,295 submissions, 20%


Cited By

  • (2025) Reinforcement learning-based optimal control of wearable alarms for consistent roadway workers’ reactions to traffic hazards. Journal of Transportation Safety & Security, 10.1080/19439962.2024.2449119, pages 1-25. Online publication date: 9-Jan-2025.
  • (2025) FareIQ: Intelligent Fare Optimization for Cab Drivers Using Reinforcement Learning. Innovations in Electrical and Electronics Engineering, 10.1007/978-981-97-9112-5_34, pages 573-588. Online publication date: 31-Jan-2025.
  • (2024) Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in Recommenders. ACM Transactions on Spatial Algorithms and Systems, 10.1145/3656641, 10(2), pages 1-23. Online publication date: 21-May-2024.
  • (2024) How Are Machine Learning and Artificial Intelligence Used in Digital Behavior Change Interventions? A Scoping Review. Mayo Clinic Proceedings: Digital Health, 10.1016/j.mcpdig.2024.05.007. Online publication date: May-2024.
  • (2024) Unpacking the exploration–exploitation tradeoff on Snapchat. Computers in Human Behavior, 10.1016/j.chb.2023.108014, 150(C). Online publication date: 1-Jan-2024.
  • (2024) Federated Constrastive Learning and Visual Transformers for Personal Recommendation. Cognitive Computation, 10.1007/s12559-024-10286-0, 16(5), pages 2551-2565. Online publication date: 8-May-2024.
  • (2024) A systematic literature review of recent advances on context-aware recommender systems. Artificial Intelligence Review, 10.1007/s10462-024-10939-4, 58(1). Online publication date: 16-Nov-2024.
  • (2023) Exploration for free. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 10.5555/3625834.3626039, pages 2192-2202. Online publication date: 31-Jul-2023.
  • (2023) STAN: Stage-Adaptive Network for Multi-Task Recommendation by Learning User Lifecycle-Based Representation. Proceedings of the 17th ACM Conference on Recommender Systems, 10.1145/3604915.3608796, pages 602-612. Online publication date: 14-Sep-2023.
  • (2023) User Tampering in Reinforcement Learning Recommender Systems. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 10.1145/3600211.3604669, pages 58-69. Online publication date: 8-Aug-2023.
