Credit scoring is a standard method of assessing the creditworthiness of loan applicants in the banking industry. Over time, objective quantitative tools have been developed and adopted to suppress subjective, ad hoc elements in the decision process, to strengthen accountability and substitutability, and to reduce the opportunities for corruption.
Various methods from statistics and computer science can be used to build an objective model that differentiates between loan applicants and estimates their probability of default. The support vector machine, introduced by Vapnik, is one such method.
One of the principal goals of my thesis is to examine the performance of support vector machines (SVMs) in credit scoring and to compare them with logistic regression (LR), which remains the industry standard in banking. In the first chapter, I will show that SVMs and LR share some common properties and that both belong to the wide family of linear classifiers.
The second chapter is dedicated to the mathematical derivation of support vector machines as they developed over 30 years of evolution: from the hard maximum-margin classifier, which works only for linearly separable data, to the current form, which uses the soft margin and the kernel trick to handle non-linear and noisy data.
The third chapter focuses on the practical problems of developing credit scoring models: how to measure performance, how to prepare the data set, and how to evaluate different models and choose the best one among them.
The fourth chapter summarizes the current state of knowledge in credit scoring, based on the peer-reviewed literature, with a special focus on support vector machines.
Because of understandable caution and model risk, the banking industry is reluctant to adopt new approaches to credit scoring unless they bring a major performance advantage. Therefore, I concentrate on the application of SVMs in a newly emerging sector of the loan business: peer-to-peer (P2P) lending, a new phenomenon that is described in detail in the fifth chapter.
P2P lending is characterized by the prevalence of amateur investors, whose strategy is based either on blind diversification or on naïve, rather empirical credit scoring that takes only one or two factors into account. Under such conditions, credit scoring based on objective, quantitative methods could lead to excess returns when applied by an interested investor.
The last chapter describes the procedures and methods used to build a support vector machine for credit scoring on a real data set obtained from one of the leading P2P platforms in Europe. The performance of the model is discussed and compared with that of the industry-standard logistic regression.
As the chosen P2P platform is open to small investors from all over Europe, my findings could be applied directly in practice by anyone willing to build an investment strategy on top of them. This thesis is, however, written solely for educational purposes, and such use is therefore not recommended.