Computer Science (计算机科学), 2019, Vol. 46, Issue (11A): 595-598.
LIU Hua-ling (刘华玲)¹, LIN Bei (林蓓)¹, YUN Wen-jing (恽文婧)¹, DING Yu-jie (丁宇杰)²
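Abstract: With the rapid development of Internet finance, P2P lending has become an innovative financial model, and identifying the potential risks in online lending has become a research hotspot. Online lending transaction data are often severely imbalanced, which leads to a low risk-identification rate. To address this problem, this paper applies random undersampling, SMOTE and Bagging to balance the classes, and evaluates the results with logistic regression and support vector classification (SVC). Experiments show that, in P2P risk identification and with recall as the criterion, Bagging balances the classes more effectively than random undersampling and SMOTE. Logistic regression also shows no obvious overfitting, so it is better suited than SVC to identifying P2P overdue-loan risk.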
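Two of the balancing methods named in the abstract, together with the recall criterion used to compare them, can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code. The function names (`random_undersample`, `smote`, `recall`), the label convention (1 = overdue loan, the minority class; 0 = normal loan) and the toy data are hypothetical. The `smote` function uses the SMOTE interpolation rule (a synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours), but it picks seed points at random rather than generating a fixed number per minority sample as the original SMOTE paper does. Bagging is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def random_undersample(X: Sequence[Point], y: Sequence[int], seed: int = 0):
    """Randomly drop rows of the larger class until both classes have equal size.

    Hypothetical helper; assumes binary labels 0/1.
    """
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = (idx0, idx1) if len(idx0) < len(idx1) else (idx1, idx0)
    # Keep every minority row plus an equally sized random subset of majority rows.
    kept_majority = rng.sample(majority, len(minority))
    keep = sorted(minority + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    A random minority point x is chosen, one of its k nearest minority neighbours
    nb is chosen, and x + gap * (nb - x) with gap drawn from [0, 1) is returned.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority points (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of truly positive (overdue) cases that the classifier flags: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 8 normal loans (0) and 2 overdue loans (1).
    X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.4, 0.2), (0.2, 0.4),
         (0.5, 0.5), (0.3, 0.1), (0.1, 0.5), (0.9, 0.8), (0.8, 0.9)]
    y = [0] * 8 + [1] * 2
    Xu, yu = random_undersample(X, y)
    print(sorted(yu))  # [0, 0, 1, 1]
    overdue = [x for x, label in zip(X, y) if label == 1]
    print(len(smote(overdue, n_new=6, k=1)))  # 6
    print(recall([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Recall is a natural criterion here because, when overdue loans are rare, a classifier that labels every loan as normal can still reach high accuracy while its recall on the overdue class is 0.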