Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration

Abstract

Abstract: In recent years, the application of deep reinforcement learning to recommendation systems has attracted growing attention. Building on existing research, this paper proposes a new recommendation model, RP-Dueling, which adds a regret exploration mechanism to the deep reinforcement learning algorithm Dueling-DQN so that the algorithm adaptively and dynamically adjusts the exploration-exploitation ratio according to the degree of training. The algorithm captures users' dynamic interests and fully explores the action space in recommendation systems with large-scale state spaces. Tested on multiple datasets, the proposed algorithm achieves best average MAE and RMSE values of 0.16 and 0.43 respectively, which are 0.48 and 0.56 lower than the best previously reported results. The experimental results show that the proposed model outperforms both existing traditional recommendation models and recommendation models based on deep reinforcement learning.

Key words: Deep reinforcement learning, Dueling-DQN, Dynamic interest, Recommendation system, Regret exploration, RP-Dueling
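As a rough illustration of two ingredients the abstract names, the sketch below shows the standard Dueling-DQN aggregation of a state value and per-action advantages into Q-values, and the MAE and RMSE metrics used for evaluation. The function names and the toy numbers are hypothetical and chosen only for illustration; the regret exploration rule of RP-Dueling is not specified in the abstract and is not reproduced here.

```python
import math


def dueling_q_values(state_value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = sum((p - y) ** 2 for p, y in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy values (assumed, not taken from the paper's experiments).
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
predicted_ratings = [4.0, 3.0, 5.0, 2.0]
actual_ratings = [5.0, 3.0, 4.0, 2.0]

print(q_values)
print(mae(predicted_ratings, actual_ratings))
print(rmse(predicted_ratings, actual_ratings))
```

Lower MAE and RMSE both indicate predictions closer to the observed ratings; RMSE penalizes large individual errors more heavily because the errors are squared before averaging.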

CLC number: TP181

HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong. Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration[J]. Computer Science, 2022, 49(6): 149-157. https://doi.org/10.11896/jsjkx.210600226


