Research article · RecSys Conference Proceedings · DOI: 10.1145/2645710.2645732

Ensemble contextual bandits for personalized recommendation

Published: 06 October 2014

Abstract

The cold-start problem has attracted extensive attention among various online services that provide personalized recommendation. Many online vendors employ contextual bandit strategies to tackle the so-called exploration/exploitation dilemma rooted in the cold-start problem. However, due to high-dimensional user/item features and the underlying characteristics of bandit policies, it is often difficult for service providers to obtain and deploy an appropriate algorithm that achieves acceptable and robust economic profit.
In this paper, we explore ensemble strategies over contextual bandit algorithms to obtain robust predicted click-through rates (CTR) of web objects. The ensemble is acquired by aggregating the different pulling policies of the bandit algorithms, rather than by forcing agreement among prediction results or learning a unified predictive model. To this end, we employ a meta-bandit paradigm that places a hyper bandit over the base bandits to explicitly explore/exploit the relative importance of the base bandits based on user feedback. Extensive empirical experiments on two real-world data sets (news recommendation and online advertising) demonstrate the effectiveness of our proposed approach in terms of CTR.
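The meta-bandit idea in the abstract — a hyper bandit that learns which base bandit to trust from click feedback — can be illustrated with a minimal sketch. Everything below is our own assumption for illustration, not the paper's algorithm: the class names are invented, an EXP3-style exponential-weights rule stands in for the hyper bandit, and simple (non-contextual) epsilon-greedy policies stand in for the contextual base bandits.

```python
import math
import random

class EpsilonGreedyBandit:
    """Toy base bandit: epsilon-greedy over running per-arm reward means.

    A stand-in for the paper's base contextual-bandit policies, which
    would additionally condition on user/item features.
    """
    def __init__(self, n_arms, epsilon):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.means))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

class HyperBandit:
    """Hyper bandit over base bandits, with EXP3-style weighting.

    Each round it samples one base bandit, lets that bandit pick the arm,
    and reweights the chosen bandit by its importance-weighted reward, so
    feedback gradually shifts pulls toward the better-performing policy.
    """
    def __init__(self, base_bandits, gamma=0.1):
        self.bases = base_bandits
        self.gamma = gamma
        self.weights = [1.0] * len(base_bandits)

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.bases)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def play(self):
        probs = self.probabilities()
        i = random.choices(range(len(self.bases)), weights=probs)[0]
        return i, self.bases[i].select(), probs[i]

    def feedback(self, i, arm, reward, prob):
        self.bases[i].update(arm, reward)
        # EXP3 exponential reweighting with an unbiased reward estimate.
        self.weights[i] *= math.exp(
            self.gamma * (reward / prob) / len(self.bases))
        # Rescale for numerical stability; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

# Toy run: three epsilon-greedy bandits with different exploration rates
# compete under the hyper bandit on simulated Bernoulli click rates.
random.seed(7)
ctrs = [0.02, 0.05, 0.10, 0.04]          # hidden per-arm click rates
hyper = HyperBandit([EpsilonGreedyBandit(len(ctrs), e)
                     for e in (0.05, 0.2, 0.5)])
clicks = 0
for t in range(5000):
    i, arm, p = hyper.play()
    r = 1.0 if random.random() < ctrs[arm] else 0.0
    clicks += r
    hyper.feedback(i, arm, r, p)
print(round(clicks / 5000, 3))  # empirical CTR of the ensemble
```

Note how this matches the abstract's framing: the hyper bandit never forces the base bandits to agree on a prediction or fits a unified model; it only reallocates pull probability toward whichever base policy earns more clicks.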

Supplementary Material

JPG File (p73-sidebyside.jpg)
MP4 File (p73-sidebyside.mp4)




Published In

RecSys '14: Proceedings of the 8th ACM Conference on Recommender systems
October 2014
458 pages
ISBN:9781450326681
DOI:10.1145/2645710

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. contextual bandit
  2. ctr prediction
  3. ensemble recommendation
  4. meta learning
  5. personalized recommendation

Qualifiers

  • Research-article

Conference

RecSys '14: Eighth ACM Conference on Recommender Systems
October 6-10, 2014
Foster City, Silicon Valley, California, USA

Acceptance Rates

RecSys '14 Paper Acceptance Rate: 35 of 234 submissions, 15%
Overall Acceptance Rate: 254 of 1,295 submissions, 20%


Cited By

  • (2025) Reinforcement learning-based optimal control of wearable alarms for consistent roadway workers’ reactions to traffic hazards. Journal of Transportation Safety & Security, 10.1080/19439962.2024.2449119, pages 1-25. Online publication date: 9-Jan-2025.
  • (2025) FareIQ: Intelligent Fare Optimization for Cab Drivers Using Reinforcement Learning. Innovations in Electrical and Electronics Engineering, 10.1007/978-981-97-9112-5_34, pages 573-588. Online publication date: 31-Jan-2025.
  • (2024) Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in Recommenders. ACM Transactions on Spatial Algorithms and Systems, 10.1145/3656641, 10(2), pages 1-23. Online publication date: 21-May-2024.
  • (2024) How Are Machine Learning and Artificial Intelligence Used in Digital Behavior Change Interventions? A Scoping Review. Mayo Clinic Proceedings: Digital Health, 10.1016/j.mcpdig.2024.05.007. Online publication date: May-2024.
  • (2024) Unpacking the exploration–exploitation tradeoff on Snapchat. Computers in Human Behavior, 10.1016/j.chb.2023.108014, 150(C). Online publication date: 1-Jan-2024.
  • (2024) Federated Constrastive Learning and Visual Transformers for Personal Recommendation. Cognitive Computation, 10.1007/s12559-024-10286-0, 16(5), pages 2551-2565. Online publication date: 8-May-2024.
  • (2024) A systematic literature review of recent advances on context-aware recommender systems. Artificial Intelligence Review, 10.1007/s10462-024-10939-4, 58(1). Online publication date: 16-Nov-2024.
  • (2023) Exploration for free. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 10.5555/3625834.3626039, pages 2192-2202. Online publication date: 31-Jul-2023.
  • (2023) STAN: Stage-Adaptive Network for Multi-Task Recommendation by Learning User Lifecycle-Based Representation. Proceedings of the 17th ACM Conference on Recommender Systems, 10.1145/3604915.3608796, pages 602-612. Online publication date: 14-Sep-2023.
  • (2023) User Tampering in Reinforcement Learning Recommender Systems. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 10.1145/3600211.3604669, pages 58-69. Online publication date: 8-Aug-2023.
