Indian Journal of Science and Technology
DOI: 10.17485/ijst/2016/v9i20/92299
Year: 2016, Volume: 9, Issue: 20, Pages: 1-6
Original Article
Ha Van Sang1*, Nguyen Ha Nam2, Nguyen Duc Nhan3
1 Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam; [email protected]
2 Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam; [email protected]
3 Department, Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam; [email protected]
*Author for correspondence
Ha Van Sang
Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
Email: [email protected]
Background/Objectives: This article presents a feature selection method that improves the accuracy and computation speed of credit scoring models.
Methods/Analysis: We propose a credit scoring model that combines a parallel Random Forest classifier with a feature selection method to evaluate the credit risk of applicants. Integrating Random Forest into the feature selection process allows the importance of each feature to be evaluated accurately, so that irrelevant and redundant features can be removed.
Findings: We developed an algorithm that selects the best features using the best average and median scores and the lowest standard deviation as the feature scoring rules. The number of features can thereby be reduced to the smallest possible number, which yields a remarkable reduction in runtime. The proposed model can thus perform feature selection and model parameter optimization at the same time, improving its efficiency. The performance of the proposed model was assessed experimentally on two public datasets, the Australian and German credit datasets. The results showed that the proposed model achieves improved accuracy compared with other commonly used feature selection methods. In particular, our method attains an average accuracy of 76.2% with a significantly reduced running time of 72 minutes on the German credit dataset, and the highest average accuracy of 89.4% with a running time of only 50 minutes on the Australian credit dataset.
Applications/Improvements: This method can be applied in credit scoring models to improve accuracy with a significantly reduced runtime.
Keywords: Credit Scoring, Feature Selection, Machine Learning, Parallel Random Forest
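The feature scoring rule described in the abstract (best average, best median, lowest standard deviation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the three statistics are combined, so the lexicographic ordering below is an assumption, as is the convention that a higher score is better. The function names `rank_features` and `select_top` and the sample scores are hypothetical; in practice the per-run scores would come from repeated Random Forest importance estimates.

```python
from statistics import mean, median, pstdev


def rank_features(scores_by_feature):
    """Rank features from repeated importance scores.

    scores_by_feature maps a feature name to a list of importance
    scores, one per run (for example, one per cross-validation fold).
    Assumed ordering: higher mean first, ties broken by higher median,
    then by lower standard deviation.
    """
    summary = {
        name: (mean(s), median(s), pstdev(s))
        for name, s in scores_by_feature.items()
    }
    return sorted(
        summary,
        key=lambda n: (-summary[n][0], -summary[n][1], summary[n][2]),
    )


def select_top(scores_by_feature, k):
    """Keep the k best-ranked features."""
    return rank_features(scores_by_feature)[:k]


# Hypothetical importance scores from three runs (illustrative only).
scores = {
    "duration": [0.30, 0.32, 0.31],
    "amount": [0.25, 0.27, 0.26],
    "age": [0.10, 0.12, 0.11],
    "telephone": [0.01, 0.02, 0.00],
}

print(select_top(scores, 2))
```

Using the standard deviation as the last tie-breaker favors features whose importance is stable across runs, which is one plausible reading of the "lowest standard deviation" rule.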