Conclusion – Support Vector Machines for Credit Scoring

With the advance of new technologies and investment possibilities, statistical and machine learning methods, once reserved exclusively for professional financial institutions, can also benefit amateur investors.

Support vector machines (SVM) were studied as an alternative to the conservative logistic regression (LR) models, and the performance of the two approaches was compared on real credit data sets. Especially in combination with a non-linear kernel, SVM proved to be a competitive approach and provided a slight edge over the logistic regression model.

The cost of this edge is a much higher computational time, needed in particular to find the optimal parameters of the kernel function. The process of model development was time-consuming as well, partly because the subject had to be studied thoroughly, since SVM is not as well known a method as LR, and partly because support vector machines are less well supported in the environment I chose for developing my models.
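To give a rough sense of where the extra computation comes from, the sketch below counts the number of model fits required by an exhaustive grid search over the two parameters of an RBF kernel SVM (the penalty C and the kernel width gamma). The exponent ranges, the number of folds, and the function names `rbf_grid` and `count_fits` are illustrative assumptions, not the settings used in this work.

```python
from itertools import product


def rbf_grid(c_exponents, gamma_exponents, base=2.0):
    """Return every (C, gamma) pair on a logarithmic grid."""
    return [(base ** c, base ** g)
            for c, g in product(c_exponents, gamma_exponents)]


def count_fits(candidates, n_folds):
    """Each candidate is trained once per cross-validation fold."""
    return len(candidates) * n_folds


# Hypothetical coarse grid: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3
grid = rbf_grid(range(-5, 16, 2), range(-15, 4, 2))

# Assumed 5-fold cross-validation for both models
svm_fits = count_fits(grid, n_folds=5)

# Logistic regression without a hyperparameter search: one candidate
lr_fits = count_fits([None], n_folds=5)

print(f"SVM fits: {svm_fits}, LR fits: {lr_fits}")
```

Under these assumptions the SVM needs two orders of magnitude more fits than the logistic regression, before even accounting for the fact that a single kernel SVM fit is typically slower than a single LR fit.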

The extra performance brought by support vector machines cannot be considered an argument for replacing the well-established logistic regression. Professional institutions are bound by strict regulatory rules, and the extra performance is not high enough to outweigh the potential model risk: there is simply not enough incentive for regulators and traditional institutions to replace a model that has worked so well for decades.

Since amateur investors and private funds are not bound by the regulatory rules, my results could indicate that SVM may be an interesting alternative for them. Nevertheless, as the background research from the cited scientific papers shows, the performance of support vector machines tends to be less stable and reliable in general. While they perform well on some data sets (as in my case), they are beaten by logistic regression on average. [42]

There are other arguments against the real-life application of support vector machines in credit scoring. The widely cited comparison paper of Baesens et al. from 2013 clearly shows that individual classifiers (a group to which both LR and SVM belong) reached their performance limits years ago and are no longer at the center of researchers’ attention. Baesens’ team goes even further and recommends that future scientific research stop using logistic regression as the benchmark and replace it with a new generation of algorithms (namely Random Forests).

One can therefore argue that private investors will tend to adopt the current state-of-the-art classification algorithms. Some of them were briefly discussed in Chapter 4 as potentially interesting topics for further research.

Another, more serious argument against SVM is that it is very hard to use as a standalone method. Despite serious effort, I was not able to implement a reliable feature selection process based on the techniques recommended in the literature (summarized in Section Feature selection). Eventually, I had to resort to an alternative approach, using rigorous statistical tests in connection with logistic regression to select the most statistically significant explanatory variables for the model. More importantly, this is the only method explicitly mentioned in the literature available to me for quantifying the effect of each variable in a support vector machine model. This disadvantage is partly offset by the fact that SVM tends to perform better as an unrestricted model.

Despite all the arguments against them, support vector machines remain an important concept from the educational and theoretical point of view. They also shaped the history of machine learning, as the first method able to compete with humans in recognizing handwritten digits, and they inspired much subsequent research. Their use in credit scoring specifically is not without problems, however, and cannot be recommended in real applications unless another major breakthrough further increases their performance or reliability.
