Appendix A

Logistic regression: dummy variables

Variable               DF    Wald χ²        p     Gini
Credit score            6   108.8197   0.0000   0.3205
Interest                1    57.1851   0.0000   0.4137
PreviousRepayments      1    24.3019   0.0000   0.1604
Age                     1    23.3031   0.0000   0.1534
AmountOfPreviousLoans   1    22.3896   0.0000   0.0207
Marital status          4    27.6078   0.0000   0.0941
VerificationType        2    20.4638   0.0000   0.2248
NewLoanMonthlyPayment   1    15.5639   0.0001   0.2772
NewPaymentToIncome      1    11.4037   0.0007   0.2615
nr_of_dependants        1     8.7477   0.0031   0.0058
UseOfLoan               8    21.1369   0.0068   0.0658
Occupation             19    36.4364   0.0093   0.1249
Employment              5    14.5231   0.0126   0.1214
AppliedAmount           1     5.7889   0.0161   0.1952
ApplicationType         1     4.2583   0.0391   0.0773

Table 9. The final logistic regression model built from 15 explanatory variables. Categorical variables were encoded as dummy variables. The 3rd and 4th columns give the Wald χ² statistic and the p-value for the total effect of each variable. The last column contains the Gini coefficient of each variable from the univariate analysis.
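The Gini column measures how well each variable on its own separates the two outcome classes. In credit scoring it is commonly computed as Gini = 2 × AUC − 1, where AUC is the area under the ROC curve. The sketch below is a minimal stdlib-only illustration, not necessarily the exact procedure behind Table 9. It assumes a binary target with 1 marking the bad outcome and one numeric value per observation (for a categorical variable, for example its WoE or bad rate); because the appendix does not say how the direction of each variable was handled, the function (a hypothetical helper) reports the absolute value.

```python
def univariate_gini(values, targets):
    """Gini = |2 * AUC - 1| for one variable; targets coded 1 = bad, 0 = good (assumed)."""
    bads = [v for v, y in zip(values, targets) if y == 1]
    goods = [v for v, y in zip(values, targets) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good observation")
    # AUC as the Mann-Whitney probability that a random bad scores above a random good;
    # ties count as half. O(n_bad * n_good), which is fine for a sketch.
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    auc = wins / (len(bads) * len(goods))
    # Absolute value so the result does not depend on the variable's direction.
    return abs(2.0 * auc - 1.0)


# Hypothetical toy data: every bad lies above every good, so separation is perfect.
print(univariate_gini([1, 2, 3, 4], [0, 0, 1, 1]))  # 1.0
```

A variable with no separating power gives a Gini close to 0, which is roughly where nr_of_dependants (0.0058) sits on its own.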

Variable                               DF   Estimate       SE   Wald χ²        p
Intercept                               1    -0.6409   0.7997    0.6423   0.4229
Interest                                1     0.4643   0.0614   57.1851   0.0000
PreviousRepayments                      1    -0.9138   0.1854   24.3019   0.0000
Age                                     1    -0.3274   0.0678   23.3031   0.0000
AmountOfPreviousLoans                   1     0.6262   0.1323   22.3896   0.0000
NewLoanMonthlyPayment                   1    -0.4786   0.1213   15.5639   0.0001
AppliedAmount                           1     0.2147   0.0893    5.7889   0.0161
nr_of_dependants                        1    -0.1837   0.0621    8.7477   0.0031
NewPaymentToIncome                      1     0.3794   0.1123   11.4037   0.0007
Credit score = 1000                     1    -0.4408   0.2070    4.5362   0.0332
Credit score = 800                      1    -0.1349   0.2813    0.2299   0.6316
Credit score = 700                      1     0.6146   0.2740    5.0317   0.0249
Credit score = 600                      1    -0.0502   0.2588    0.0376   0.8462
Credit score = 500                      1     0.9316   0.2273   16.7974   0.0000
Credit score = empty                    1     0.1902   0.7848    0.0588   0.8085
Marital status = Married                1    -1.4145   0.3538   15.9883   0.0001
Marital status = Cohabitant             1    -1.8084   0.3672   24.2555   0.0000
Marital status = Single                 1    -1.5172   0.3744   16.4220   0.0001
Marital status = Divorced               1    -1.4623   0.3785   14.9280   0.0001
Occupation = empty                      1    -1.0717   0.4980    4.6318   0.0314
Occupation = Other                      1     0.0767   0.2483    0.0954   0.7574
Occupation = Telecom                    1    -0.1106   0.2926    0.1430   0.7053
Occupation = Finance                    1     0.3766   0.3166    1.4142   0.2344
Occupation = Real-estate                1     0.6374   0.6939    0.8437   0.3583
Occupation = Research                   1    -0.7243   0.6831    1.1243   0.2890
Occupation = Administrative             1     0.3894   0.5114    0.5798   0.4464
Occupation = Civil service & military   1     0.7060   0.2994    5.5589   0.0184
Occupation = Education                  1    -0.2217   0.3096    0.5127   0.4740
Occupation = Healthcare                 1     0.5409   0.3292    2.6990   0.1004
Occupation = Art/entertainment          1     0.3496   0.4096    0.7285   0.3934
Occupation = Agriculture                1     0.5361   0.3406    2.4771   0.1155
Occupation = Mining                     1     2.3646   1.2185    3.7660   0.0523
Occupation = Processing                 1     0.0232   0.2770    0.0070   0.9331
Occupation = Energy                     1    -0.3527   0.4661    0.5726   0.4492
Occupation = Utilities                  1    -0.3805   0.8587    0.1964   0.6577
Occupation = Construction               1     0.1950   0.2997    0.4234   0.5153
Occupation = Retail/wholesale           1    -0.0209   0.2822    0.0055   0.9410
Occupation = Transport                  1     0.1994   0.3157    0.3990   0.5276
VerificationType = Phone                1     1.0354   0.2337   19.6264   0.0000
VerificationType = Income verified      1     0.8578   0.2434   12.4161   0.0004
ApplicationType = Timed funding         1     0.2747   0.1331    4.2583   0.0391
UseOfLoan = Loan consolidation          1     0.3533   0.2468    2.0488   0.1523
UseOfLoan = Real estate                 1     0.0374   0.3792    0.0097   0.9215
UseOfLoan = Home improvement            1     0.6109   0.2430    6.3216   0.0119
UseOfLoan = Business                    1     0.5974   0.3791    2.4837   0.1150
UseOfLoan = Education                   1     0.9645   0.3086    9.7692   0.0018
UseOfLoan = Travel                      1     0.1772   0.3745    0.2240   0.6360
UseOfLoan = Vehicle                     1     0.6574   0.2500    6.9123   0.0086
UseOfLoan = Other                       1     0.2459   0.2291    1.1520   0.2831
Employment = empty                      1    -0.4164   0.6641    0.3932   0.5306
Employment = Partially employed         1    -0.4944   0.6663    0.5506   0.4581
Employment = Fully employed             1    -0.5997   0.6310    0.9034   0.3419
Employment = Self-employed              1    -0.8576   0.7325    1.3708   0.2417
Employment = Entrepreneur               1    -1.7734   0.7127    6.1908   0.0128

Table 10. Maximum likelihood estimates for the model from Table 9. The levels of the categorical variables that are not listed in the table are: Credit score = 900, Marital status = Widowed, Occupation = Hospitality and catering, VerificationType = Income and expenses verified, ApplicationType = Quick funding, UseOfLoan = Health, Employment = Retiree. The estimates for these levels can be derived from the estimates stated in the table.
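For every row of Table 10, which all have one degree of freedom, the Wald χ² column equals (Estimate / SE)², and p is the upper-tail probability of a χ² distribution with DF degrees of freedom. The same tail probability turns the multi-DF total-effect statistics of Table 9 into its p column; the multi-DF Wald statistic itself requires the full covariance matrix of the estimates, which the appendix does not report. A minimal stdlib-only sketch (the function names are illustrative, not taken from any library):

```python
import math


def wald_chi2(estimate, se):
    """Single-coefficient Wald statistic: (estimate / SE) squared."""
    return (estimate / se) ** 2


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the regularized
    upper incomplete gamma Q with y = x / 2, starting from Q(1, y) = exp(-y)
    (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        # Each term is computed in log space to avoid overflow for large y or df.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Interest row of Table 10: estimate 0.4643, SE 0.0614.
w = wald_chi2(0.4643, 0.0614)
print(round(w, 2))
# UseOfLoan total effect in Table 9: Wald chi-square 21.1369 on 8 DF.
print(round(chi2_sf(21.1369, 8), 4))  # 0.0068
```

The recomputed Wald value for Interest (about 57.18) differs from the tabulated 57.1851 only in the third decimal, because the printed estimate and SE are themselves rounded.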

Logistic regression: WoE-transformed categorical variables

Variable                 DF    Wald χ²        p     Gini
Credit score              1   100.9309   0.0000   0.3431
Interest                  1    49.7779   0.0000   0.4137
Home ownership            1    31.2436   0.0000   0.2412
ApplicationSignedHour     1    27.1686   0.0000   0.0830
PreviousRepayments        1    23.4446   0.0000   0.1604
Language code             1    22.5242   0.0000   0.0942
AmountOfPreviousLoans     1    21.7680   0.0000   0.0207
Marital status            1    16.6359   0.0000   0.0941
Age                       1    12.2566   0.0005   0.1534
Employment status         1    10.8514   0.0010   0.1214
Occupation                1    10.3915   0.0013   0.2022
NewLoanMonthlyPayment     1     8.0873   0.0045   0.2772
NewPaymentToIncome        1     7.1446   0.0075   0.2615
AppliedAmount             1     4.2468   0.0393   0.1952
ApplicationSignedWeekday  1     4.0817   0.0434   0.0732

Table 11. The final logistic regression model built from 15 explanatory variables. Categorical variables were transformed into real-valued variables using the Weight of Evidence (WoE) transformation. The columns are laid out as in Table 9.
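Weight of Evidence replaces each category with a single number derived from the good/bad split within that category, so every categorical variable costs one degree of freedom instead of one per level. The appendix does not state the exact formula or sign convention; the sketch below assumes the common definition WoE(c) = ln((goods in c / all goods) / (bads in c / all bads)) with target 1 = bad. Some practitioners use the reciprocal ratio, which only flips the sign. The helper name and the toy data are hypothetical.

```python
import math
from collections import Counter


def woe_table(categories, targets):
    """Map each category to ln(share of goods / share of bads); targets: 1 = bad, 0 = good (assumed)."""
    goods, bads = Counter(), Counter()
    for cat, y in zip(categories, targets):
        if y == 1:
            bads[cat] += 1
        else:
            goods[cat] += 1
    total_goods, total_bads = sum(goods.values()), sum(bads.values())
    table = {}
    for cat in set(goods) | set(bads):
        if goods[cat] == 0 or bads[cat] == 0:
            # The log ratio is undefined; such categories need merging or smoothing first.
            raise ValueError(f"category {cat!r} has no goods or no bads")
        table[cat] = math.log((goods[cat] / total_goods) / (bads[cat] / total_bads))
    return table


# Hypothetical toy data: "A" has 80 goods / 20 bads, "B" has 20 goods / 30 bads.
cats = ["A"] * 100 + ["B"] * 50
ys = [0] * 80 + [1] * 20 + [0] * 20 + [1] * 30
woe = woe_table(cats, ys)
encoded = [woe[c] for c in ["A", "B", "A"]]  # real-valued column fed to the regression
print({k: round(v, 4) for k, v in sorted(woe.items())})  # {'A': 0.6931, 'B': -1.0986}
```

Under this convention a positive WoE marks a category that holds a larger share of the goods than of the bads.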

Variable                 DF   Estimate       SE
Intercept                 1    -1.6810   0.0628
Interest                  1     0.3915   0.0555
PreviousRepayments        1    -0.8528   0.1761
AmountOfPreviousLoans     1     0.5789   0.1241
Age                       1    -0.1808   0.0516
NewPaymentToIncome        1     0.2598   0.0972
NewLoanMonthlyPayment     1    -0.3149   0.1107
AppliedAmount             1     0.1689   0.0820
Credit score              1    -0.4709   0.0469
ApplicationSignedHour     1    -0.5922   0.1136
Home ownership            1    -0.5077   0.0908
Language code             1    -0.7404   0.1560
Marital status            1    -0.2080   0.0510
Occupation                1    -0.1656   0.0514
Employment                1    -0.1916   0.0582
ApplicationSignedWeekday  1    -0.1017   0.0504

Table 12. Maximum likelihood estimates for the model from Table 11.
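With the estimates in Table 12, scoring an application reduces to a linear predictor passed through the logistic function: p = 1 / (1 + exp(−η)) with η = intercept + Σ βᵢxᵢ. The sketch below copies the coefficients from Table 12; the input values are hypothetical, and it assumes each input is already on the scale the model was fitted on (WoE values for the categorical variables). The appendix does not say how the continuous variables were scaled or which outcome is coded as 1, so the printed probability is purely illustrative.

```python
import math

# Coefficients copied from Table 12 (logistic regression on WoE-transformed inputs).
INTERCEPT = -1.6810
COEFFICIENTS = {
    "Interest": 0.3915,
    "PreviousRepayments": -0.8528,
    "AmountOfPreviousLoans": 0.5789,
    "Age": -0.1808,
    "NewPaymentToIncome": 0.2598,
    "NewLoanMonthlyPayment": -0.3149,
    "AppliedAmount": 0.1689,
    "Credit score": -0.4709,
    "ApplicationSignedHour": -0.5922,
    "Home ownership": -0.5077,
    "Language code": -0.7404,
    "Marital status": -0.2080,
    "Occupation": -0.1656,
    "Employment": -0.1916,
    "ApplicationSignedWeekday": -0.1017,
}


def linear_predictor(inputs):
    """Intercept plus the sum of coefficient * input; raises KeyError if a variable is missing."""
    return INTERCEPT + sum(coef * inputs[name] for name, coef in COEFFICIENTS.items())


def predicted_probability(inputs):
    """Logistic link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(inputs)))


# Hypothetical input: every variable at 0, so only the intercept contributes.
baseline = {name: 0.0 for name in COEFFICIENTS}
p0 = predicted_probability(baseline)
print(round(p0, 3))  # 0.157
```

Because the model is linear on the log-odds scale, raising one input by a unit while holding the rest fixed multiplies the odds by exp(β); for Credit score that factor is exp(−0.4709) ≈ 0.62.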