Logistic regression: dummy variables
Variable | DF | Wald | p | Gini |
Credit score | 6 | 108.8197 | 0.0000 | 0.3205 |
Interest | 1 | 57.1851 | 0.0000 | 0.4137 |
PreviousRepayments | 1 | 24.3019 | 0.0000 | 0.1604 |
Age | 1 | 23.3031 | 0.0000 | 0.1534 |
AmountOfPreviousLoans | 1 | 22.3896 | 0.0000 | 0.0207 |
Marital status | 4 | 27.6078 | 0.0000 | 0.0941 |
VerificationType | 2 | 20.4638 | 0.0000 | 0.2248 |
NewLoanMonthlyPayment | 1 | 15.5639 | 0.0001 | 0.2772 |
NewPaymentToIncome | 1 | 11.4037 | 0.0007 | 0.2615 |
nr_of_dependants | 1 | 8.7477 | 0.0031 | 0.0058 |
UseOfLoan | 8 | 21.1369 | 0.0068 | 0.0658 |
Occupation | 19 | 36.4364 | 0.0093 | 0.1249 |
Employment | 5 | 14.5231 | 0.0126 | 0.1214 |
AppliedAmount | 1 | 5.7889 | 0.0161 | 0.1952 |
ApplicationType | 1 | 4.2583 | 0.0391 | 0.0773 |
Table 9. The final logistic regression model built from 15 explanatory variables. Categorical variables were encoded as dummy variables. The third and fourth columns give the Wald statistic and the p-value for the total effect of each variable. The last column gives the Gini coefficient of each variable from the univariate analysis.
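The Gini column can be illustrated with a short sketch. It assumes the common credit-scoring definition Gini = 2 * AUC - 1, which is not stated explicitly here. The function name `gini_coefficient` and the toy data are hypothetical and are not taken from the study.

```python
def gini_coefficient(scores, labels):
    """Univariate Gini, assuming the definition Gini = 2 * AUC - 1.

    scores: numeric values of one explanatory variable.
    labels: 1 for the modelled event, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    # AUC as the share of (event, non-event) pairs in which the event
    # has the higher score; ties count as half a pair.
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    auc = concordant / (len(positives) * len(negatives))
    return 2.0 * auc - 1.0


# Hypothetical toy data (not from the study).
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_gini = gini_coefficient(toy_scores, toy_labels)
```

In the toy data, 8 of the 9 event/non-event pairs are ranked correctly, so the AUC is 8/9 and the Gini is 7/9. The pairwise loop is quadratic in the sample size; it is meant to show the definition, not to be efficient.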
Variable | DF | Estimate | SE | Wald | p |
Intercept | 1 | -0.6409 | 0.7997 | 0.6423 | 0.4229 |
Interest | 1 | 0.4643 | 0.0614 | 57.1851 | 0.0000 |
PreviousRepayments | 1 | -0.9138 | 0.1854 | 24.3019 | 0.0000 |
Age | 1 | -0.3274 | 0.0678 | 23.3031 | 0.0000 |
AmountOfPreviousLoans | 1 | 0.6262 | 0.1323 | 22.3896 | 0.0000 |
NewLoanMonthlyPayment | 1 | -0.4786 | 0.1213 | 15.5639 | 0.0001 |
AppliedAmount | 1 | 0.2147 | 0.0893 | 5.7889 | 0.0161 |
nr_of_dependants | 1 | -0.1837 | 0.0621 | 8.7477 | 0.0031 |
NewPaymentToIncome | 1 | 0.3794 | 0.1123 | 11.4037 | 0.0007 |
Credit score = 1000 | 1 | -0.4408 | 0.2070 | 4.5362 | 0.0332 |
Credit score = 800 | 1 | -0.1349 | 0.2813 | 0.2299 | 0.6316 |
Credit score = 700 | 1 | 0.6146 | 0.2740 | 5.0317 | 0.0249 |
Credit score = 600 | 1 | -0.0502 | 0.2588 | 0.0376 | 0.8462 |
Credit score = 500 | 1 | 0.9316 | 0.2273 | 16.7974 | 0.0000 |
Credit score = empty | 1 | 0.1902 | 0.7848 | 0.0588 | 0.8085 |
Marital status = Married | 1 | -1.4145 | 0.3538 | 15.9883 | 0.0001 |
Marital status = Cohabitant | 1 | -1.8084 | 0.3672 | 24.2555 | 0.0000 |
Marital status = Single | 1 | -1.5172 | 0.3744 | 16.4220 | 0.0001 |
Marital status = Divorced | 1 | -1.4623 | 0.3785 | 14.9280 | 0.0001 |
Occupation = empty | 1 | -1.0717 | 0.4980 | 4.6318 | 0.0314 |
Occupation = Other | 1 | 0.0767 | 0.2483 | 0.0954 | 0.7574 |
Occupation = Telecom | 1 | -0.1106 | 0.2926 | 0.1430 | 0.7053 |
Occupation = Finance | 1 | 0.3766 | 0.3166 | 1.4142 | 0.2344 |
Occupation = Real-estate | 1 | 0.6374 | 0.6939 | 0.8437 | 0.3583 |
Occupation = Research | 1 | -0.7243 | 0.6831 | 1.1243 | 0.2890 |
Occupation = Administrative | 1 | 0.3894 | 0.5114 | 0.5798 | 0.4464 |
Occupation = Civil service & military | 1 | 0.7060 | 0.2994 | 5.5589 | 0.0184 |
Occupation = Education | 1 | -0.2217 | 0.3096 | 0.5127 | 0.4740 |
Occupation = Healthcare | 1 | 0.5409 | 0.3292 | 2.6990 | 0.1004 |
Occupation = Art/entertainment | 1 | 0.3496 | 0.4096 | 0.7285 | 0.3934 |
Occupation = Agriculture | 1 | 0.5361 | 0.3406 | 2.4771 | 0.1155 |
Occupation = Mining | 1 | 2.3646 | 1.2185 | 3.7660 | 0.0523 |
Occupation = Processing | 1 | 0.0232 | 0.2770 | 0.0070 | 0.9331 |
Occupation = Energy | 1 | -0.3527 | 0.4661 | 0.5726 | 0.4492 |
Occupation = Utilities | 1 | -0.3805 | 0.8587 | 0.1964 | 0.6577 |
Occupation = Construction | 1 | 0.1950 | 0.2997 | 0.4234 | 0.5153 |
Occupation = Retail/wholesale | 1 | -0.0209 | 0.2822 | 0.0055 | 0.9410 |
Occupation = Transport | 1 | 0.1994 | 0.3157 | 0.3990 | 0.5276 |
VerificationType = Phone | 1 | 1.0354 | 0.2337 | 19.6264 | 0.0000 |
VerificationType = Income verified | 1 | 0.8578 | 0.2434 | 12.4161 | 0.0004 |
ApplicationType = Timed funding | 1 | 0.2747 | 0.1331 | 4.2583 | 0.0391 |
UseOfLoan = Loan consolidation | 1 | 0.3533 | 0.2468 | 2.0488 | 0.1523 |
UseOfLoan = Real estate | 1 | 0.0374 | 0.3792 | 0.0097 | 0.9215 |
UseOfLoan = Home improvement | 1 | 0.6109 | 0.2430 | 6.3216 | 0.0119 |
UseOfLoan = Business | 1 | 0.5974 | 0.3791 | 2.4837 | 0.1150 |
UseOfLoan = Education | 1 | 0.9645 | 0.3086 | 9.7692 | 0.0018 |
UseOfLoan = Travel | 1 | 0.1772 | 0.3745 | 0.2240 | 0.6360 |
UseOfLoan = Vehicle | 1 | 0.6574 | 0.2500 | 6.9123 | 0.0086 |
UseOfLoan = Other | 1 | 0.2459 | 0.2291 | 1.1520 | 0.2831 |
Employment = empty | 1 | -0.4164 | 0.6641 | 0.3932 | 0.5306 |
Employment = Partially employed | 1 | -0.4944 | 0.6663 | 0.5506 | 0.4581 |
Employment = Fully employed | 1 | -0.5997 | 0.6310 | 0.9034 | 0.3419 |
Employment = Self-employed | 1 | -0.8576 | 0.7325 | 1.3708 | 0.2417 |
Employment = Entrepreneur | 1 | -1.7734 | 0.7127 | 6.1908 | 0.0128 |
Table 10. Maximum likelihood estimates for the model from Table 9. The following categories of the categorical variables are omitted from the table: Credit score=900, Marital status=Widowed, Occupation=Hospitality and catering, VerificationType=Income and expenses verified, ApplicationType=Quick funding, UseOfLoan=Health, Employment=Retiree. The estimates for these categories can be derived from the estimates stated in the table.
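The Wald and p columns of Table 10 can be reproduced from the Estimate and SE columns. The sketch below assumes the usual single-parameter Wald test, W = (estimate / SE)^2, compared against a chi-square distribution with one degree of freedom. The function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, standard_error):
    """Return (Wald statistic, p-value) for a single coefficient.

    Assumes W = (estimate / SE)^2 follows a chi-square distribution
    with 1 degree of freedom under the null hypothesis.
    """
    wald = (estimate / standard_error) ** 2
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Estimate and SE copied from the Interest and Credit score = 1000 rows.
wald_interest, p_interest = wald_test(0.4643, 0.0614)
wald_cs1000, p_cs1000 = wald_test(-0.4408, 0.2070)
```

The results agree with the tabulated values (57.1851 for Interest; 4.5362 and 0.0332 for Credit score = 1000) up to the rounding of the published estimates and standard errors. The multi-degree-of-freedom Wald tests in Table 9 cannot be reproduced this way, because they also need the covariances between the dummy-variable estimates.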
Logistic regression: WoE-transformed categorical variables
Variable | DF | Wald | p | Gini |
Credit score | 1 | 100.9309 | 0.0000 | 0.3431 |
Interest | 1 | 49.7779 | 0.0000 | 0.4137 |
Home ownership | 1 | 31.2436 | 0.0000 | 0.2412 |
ApplicationSignedHour | 1 | 27.1686 | 0.0000 | 0.0830 |
PreviousRepayments | 1 | 23.4446 | 0.0000 | 0.1604 |
Language code | 1 | 22.5242 | 0.0000 | 0.0942 |
AmountOfPreviousLoans | 1 | 21.7680 | 0.0000 | 0.0207 |
Marital status | 1 | 16.6359 | 0.0000 | 0.0941 |
Age | 1 | 12.2566 | 0.0005 | 0.1534 |
Employment | 1 | 10.8514 | 0.0010 | 0.1214 |
Occupation | 1 | 10.3915 | 0.0013 | 0.2022 |
NewLoanMonthlyPayment | 1 | 8.0873 | 0.0045 | 0.2772 |
NewPaymentToIncome | 1 | 7.1446 | 0.0075 | 0.2615 |
AppliedAmount | 1 | 4.2468 | 0.0393 | 0.1952 |
ApplicationSignedWeekday | 1 | 4.0817 | 0.0434 | 0.0732 |
Table 11. The final logistic regression model built from 15 explanatory variables. Categorical variables were transformed into real-valued variables using the Weight of Evidence calculation.
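The Weight of Evidence transformation mentioned in the caption can be sketched as follows. The sketch assumes the common definition WoE = ln(share of non-events in the category / share of events in the category). The sign convention, the function name `weight_of_evidence` and the toy data are assumptions and are not taken from the study.

```python
import math
from collections import Counter


def weight_of_evidence(categories, labels):
    """Map each category to ln(share of non-events / share of events).

    labels: 1 for the modelled event, 0 otherwise.
    The sign convention (non-events over events) is an assumption.
    """
    events = Counter(c for c, y in zip(categories, labels) if y == 1)
    non_events = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    if total_events == 0 or total_non_events == 0:
        raise ValueError("both classes must be present")

    woe = {}
    for category in set(categories):
        if events[category] == 0 or non_events[category] == 0:
            # ln(0) is undefined; real data would need smoothing
            # or merging of sparse categories.
            raise ValueError(f"category {category!r} has an empty class")
        share_non_events = non_events[category] / total_non_events
        share_events = events[category] / total_events
        woe[category] = math.log(share_non_events / share_events)
    return woe


# Hypothetical toy data (not from the study).
toy_categories = ["A", "A", "A", "A", "B", "B", "B", "B"]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_woe = weight_of_evidence(toy_categories, toy_labels)
```

Each category is replaced by its single WoE value, which is why every categorical variable in Table 11 has one degree of freedom instead of one per dummy variable.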
Variable | DF | Estimate | SE |
Intercept | 1 | -1.6810 | 0.0628 |
Interest | 1 | 0.3915 | 0.0555 |
PreviousRepayments | 1 | -0.8528 | 0.1761 |
AmountOfPreviousLoans | 1 | 0.5789 | 0.1241 |
Age | 1 | -0.1808 | 0.0516 |
NewPaymentToIncome | 1 | 0.2598 | 0.0972 |
NewLoanMonthlyPayment | 1 | -0.3149 | 0.1107 |
AppliedAmount | 1 | 0.1689 | 0.0820 |
Credit score | 1 | -0.4709 | 0.0469 |
ApplicationSignedHour | 1 | -0.5922 | 0.1136 |
Home ownership | 1 | -0.5077 | 0.0908 |
Language code | 1 | -0.7404 | 0.1560 |
Marital status | 1 | -0.2080 | 0.0510 |
Occupation | 1 | -0.1656 | 0.0514 |
Employment | 1 | -0.1916 | 0.0582 |
ApplicationSignedWeekday | 1 | -0.1017 | 0.0504 |
Table 12. Maximum likelihood estimates of the model from Table 11.
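As an illustration of how the estimates in Table 12 combine into a prediction, the sketch below evaluates the logistic function of the linear predictor. It assumes that the inputs have already been transformed the way the fitted model expects (WoE values for the categorical variables, and whatever scaling was applied to the continuous ones). The input values are hypothetical, and treating unspecified variables as 0 is a simplification made for the example.

```python
import math

# Estimates copied from Table 12.
INTERCEPT = -1.6810
COEFFICIENTS = {
    "Interest": 0.3915,
    "PreviousRepayments": -0.8528,
    "AmountOfPreviousLoans": 0.5789,
    "Age": -0.1808,
    "NewPaymentToIncome": 0.2598,
    "NewLoanMonthlyPayment": -0.3149,
    "AppliedAmount": 0.1689,
    "Credit score": -0.4709,
    "ApplicationSignedHour": -0.5922,
    "Home ownership": -0.5077,
    "Language code": -0.7404,
    "Marital status": -0.2080,
    "Occupation": -0.1656,
    "Employment": -0.1916,
    "ApplicationSignedWeekday": -0.1017,
}


def predicted_probability(inputs):
    """Probability of the modelled event for already-transformed inputs.

    inputs: dict mapping variable name to its transformed value.
    Variables left out are treated as 0 (a simplification).
    """
    unknown = set(inputs) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    linear = INTERCEPT + sum(COEFFICIENTS[name] * value
                             for name, value in inputs.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Intercept-only prediction and a hypothetical applicant.
baseline = predicted_probability({})
example = predicted_probability({"Interest": 1.0, "Credit score": 0.5})
```

With all inputs at 0, the prediction depends only on the intercept and is about 0.157. The hypothetical applicant has a linear predictor of -1.6810 + 0.3915 - 0.2355, which gives a probability of about 0.179.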