Decisions to extend credit to potential customers are complex and risky, and they can even be catastrophic for the credit-granting institution and the broader economy, as the credit failures of the late 2000s underscored. The ability to accurately assess the likelihood of default is therefore an important issue. In this paper the authors compare the classification accuracy of six computational intelligence methods on five real-world datasets drawn from five different decision contexts. The methods considered are logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), k-nearest neighbor (kNN), and decision tree (DT). The datasets differ in the number of cases, the number and type of attributes, the extent of missing values, and the ratio of bad loans to good loans. Using the area under the ROC curve together with the overall, bad-loan, and good-loan classification accuracy rates, the authors examine the performance of the six methods across the five datasets, and of the five datasets across the six methods, to determine whether there are significant differences between methods and between datasets. The results include several findings that may be useful to practitioners. Although no method consistently outperformed any other method on all datasets under these metrics, the study offers guidelines on which methods are most appropriate for each specific dataset. The study also finds that customers' financial attributes are much more relevant to predictive accuracy than their personal, social, or employment attributes.
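The abstract relies on four evaluation measures: the area under the ROC curve (AUC) and the overall, bad-loan, and good-loan accuracy rates. The Python sketch below is a hedged illustration of how these could be computed, not the authors' code. It assumes that bad loans are coded 1 and good loans 0, that a model outputs a score where higher means "more likely bad", and that predictions use a hypothetical 0.5 cutoff; the names `accuracy_rates`, `roc_auc`, and `predict` are illustrative. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
# Hypothetical sketch of the evaluation measures named in the abstract.
# Assumptions (not from the paper): bad loans are coded 1, good loans 0,
# higher scores mean "more likely bad", and 0.5 is the decision cutoff.
from typing import Dict, List, Sequence

BAD, GOOD = 1, 0


def accuracy_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return overall, bad-loan, and good-loan classification accuracy rates."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    n_bad = sum(1 for t, _ in pairs if t == BAD)
    n_good = sum(1 for t, _ in pairs if t == GOOD)
    hit_bad = sum(1 for t, p in pairs if t == BAD and p == BAD)
    hit_good = sum(1 for t, p in pairs if t == GOOD and p == GOOD)
    return {
        "overall": (hit_bad + hit_good) / len(pairs),
        "bad": hit_bad / n_bad,     # share of bad loans correctly flagged
        "good": hit_good / n_good,  # share of good loans correctly accepted
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney formulation: the probability that a random bad
    loan scores higher than a random good loan, counting ties as one half.
    O(n_bad * n_good), which is acceptable for a small illustration."""
    bad = [s for t, s in zip(y_true, scores) if t == BAD]
    good = [s for t, s in zip(y_true, scores) if t == GOOD]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def predict(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Label a case as bad when its score reaches the (hypothetical) cutoff."""
    return [BAD if s >= cutoff else GOOD for s in scores]


if __name__ == "__main__":
    # Made-up toy data for illustration only; not from any of the five datasets.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35]
    rates = accuracy_rates(y_true, predict(scores))
    print({k: round(v, 4) for k, v in rates.items()})
    # prints {'overall': 0.75, 'bad': 0.6667, 'good': 0.8}
    print("AUC:", round(roc_auc(y_true, scores), 4))
    # prints AUC: 0.9333
```

The three accuracy rates depend on the chosen cutoff, while AUC summarizes ranking quality over all cutoffs. Reporting bad-loan and good-loan rates separately matters when the bad/good ratio is skewed, since overall accuracy alone can mask weak performance on the minority class.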
Zurada, Jozef; Kunene, Niki; and Guan, Jian, "The Classification Performance of Multiple Methods and Datasets: Cases from the Loan Credit Scoring Domain," Journal of International Technology and Information Management: Vol. 23, Article 5.
Available at: http://scholarworks.lib.csusb.edu/jitim/vol23/iss1/5