In this project we use decision tree and random forest classifiers and compare their performance.
Data: Lending Club lending data from 2007-2010 (https://www.lendingclub.com/info/download-data.action), used to classify and predict whether or not the borrower paid back their loan in full.
**Data head**
| | credit.policy | purpose | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | debt_consolidation | 0.1189 | 829.10 | 11.350407 | 19.48 | 737 | 5639.958333 | 28854 | 52.1 | 0 | 0 | 0 | 0 |
1 | 1 | credit_card | 0.1071 | 228.22 | 11.082143 | 14.29 | 707 | 2760.000000 | 33623 | 76.7 | 0 | 0 | 0 | 0 |
2 | 1 | debt_consolidation | 0.1357 | 366.86 | 10.373491 | 11.63 | 682 | 4710.000000 | 3511 | 25.6 | 1 | 0 | 0 | 0 |
3 | 1 | debt_consolidation | 0.1008 | 162.34 | 11.350407 | 8.10 | 712 | 2699.958333 | 33667 | 73.2 | 1 | 0 | 0 | 0 |
4 | 1 | credit_card | 0.1426 | 102.92 | 11.299732 | 14.97 | 667 | 4066.000000 | 4740 | 39.5 | 0 | 1 | 0 | 0 |
5 | 1 | credit_card | 0.0788 | 125.13 | 11.904968 | 16.98 | 727 | 6120.041667 | 50807 | 51.0 | 0 | 0 | 0 | 0 |
6 | 1 | debt_consolidation | 0.1496 | 194.02 | 10.714418 | 4.00 | 667 | 3180.041667 | 3839 | 76.8 | 0 | 0 | 1 | 1 |
7 | 1 | all_other | 0.1114 | 131.22 | 11.002100 | 11.08 | 722 | 5116.000000 | 24220 | 68.6 | 0 | 0 | 0 | 1 |
8 | 1 | home_improvement | 0.1134 | 87.19 | 11.407565 | 17.25 | 682 | 3989.000000 | 69909 | 51.1 | 1 | 0 | 0 | 0 |
9 | 1 | debt_consolidation | 0.1221 | 84.12 | 10.203592 | 10.00 | 707 | 2730.041667 | 5630 | 23.0 | 1 | 0 | 0 | 0 |
**Data overview and summary statistics**
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 14 columns):
credit.policy 9578 non-null int64
purpose 9578 non-null object
int.rate 9578 non-null float64
installment 9578 non-null float64
log.annual.inc 9578 non-null float64
dti 9578 non-null float64
fico 9578 non-null int64
days.with.cr.line 9578 non-null float64
revol.bal 9578 non-null int64
revol.util 9578 non-null float64
inq.last.6mths 9578 non-null int64
delinq.2yrs 9578 non-null int64
pub.rec 9578 non-null int64
not.fully.paid 9578 non-null int64
dtypes: float64(6), int64(7), object(1)
memory usage: 1.0+ MB
| | credit.policy | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 | 9.578000e+03 | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 | 9578.000000 |
mean | 0.804970 | 0.122640 | 319.089413 | 10.932117 | 12.606679 | 710.846314 | 4560.767197 | 1.691396e+04 | 46.799236 | 1.577469 | 0.163708 | 0.062122 | 0.160054 |
std | 0.396245 | 0.026847 | 207.071301 | 0.614813 | 6.883970 | 37.970537 | 2496.930377 | 3.375619e+04 | 29.014417 | 2.200245 | 0.546215 | 0.262126 | 0.366676 |
min | 0.000000 | 0.060000 | 15.670000 | 7.547502 | 0.000000 | 612.000000 | 178.958333 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 1.000000 | 0.103900 | 163.770000 | 10.558414 | 7.212500 | 682.000000 | 2820.000000 | 3.187000e+03 | 22.600000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 1.000000 | 0.122100 | 268.950000 | 10.928884 | 12.665000 | 707.000000 | 4139.958333 | 8.596000e+03 | 46.300000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
75% | 1.000000 | 0.140700 | 432.762500 | 11.291293 | 17.950000 | 737.000000 | 5730.000000 | 1.824950e+04 | 70.900000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 |
max | 1.000000 | 0.216400 | 940.140000 | 14.528354 | 29.960000 | 827.000000 | 17639.958330 | 1.207359e+06 | 119.000000 | 33.000000 | 13.000000 | 5.000000 | 1.000000 |
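The head, `info()` and `describe()` outputs above are what pandas produces for a loaded DataFrame. A minimal sketch of those calls, assuming the data comes from a CSV file: since the file name is not given here, a few rows and columns copied from the data head are parsed from an in-memory string instead, and the variable names are illustrative.

```python
import io

import pandas as pd

# A few rows copied from the data head above, restricted to five columns.
# With the real file this would be pd.read_csv(<path to the CSV>).
csv_text = """credit.policy,purpose,int.rate,fico,not.fully.paid
1,debt_consolidation,0.1189,737,0
1,credit_card,0.1071,707,0
1,debt_consolidation,0.1496,667,1
1,all_other,0.1114,722,1
"""
loans = pd.read_csv(io.StringIO(csv_text))

print(loans.head())
loans.info()                # column dtypes and non-null counts
summary = loans.describe()  # count, mean, std, min, quartiles, max
print(summary)

# The mean of a 0/1 column is the fraction of ones in it.
unpaid_fraction = loans["not.fully.paid"].mean()
print(unpaid_fraction)
```

Read the same way, the mean of not.fully.paid in the full summary above (0.160054) says that roughly 16% of the 9578 loans were not fully paid, so the two classes are imbalanced.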
Data Exploration
**Histogram of the two FICO distributions overlaid, one for each credit.policy outcome.**
**Similar figure, this time split by the not.fully.paid column.**
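Both overlaid histograms can be drawn with one helper. The following is a sketch only: the data is a synthetic stand-in, and the helper name `plot_fico_by`, the bin count and the figure size are assumptions rather than the settings used for the original figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption): random FICO scores and 0/1 flags.
rng = np.random.default_rng(0)
loans = pd.DataFrame({
    "fico": rng.integers(612, 828, size=500),
    "credit.policy": rng.integers(0, 2, size=500),
    "not.fully.paid": rng.integers(0, 2, size=500),
})


def plot_fico_by(df, column):
    """Overlay one FICO histogram per value (1 and 0) of `column`."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for value in (1, 0):
        df.loc[df[column] == value, "fico"].hist(
            ax=ax, bins=30, alpha=0.5, label=f"{column}={value}"
        )
    ax.set_xlabel("fico")
    ax.legend()
    return ax


ax_policy = plot_fico_by(loans, "credit.policy")
ax_paid = plot_fico_by(loans, "not.fully.paid")
```

The `alpha=0.5` transparency keeps both distributions visible where they overlap.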
**Countplot using seaborn showing the counts of loans by purpose, with the color hue defined by not.fully.paid.**
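The counts that the countplot displays can also be tabulated directly. The sketch below swaps in pandas `crosstab` as a numeric counterpart to the plot (it is not the plotting call itself) and uses only the ten rows shown in the data head, so the numbers are not the full-data counts.

```python
import pandas as pd

# purpose and not.fully.paid for the ten rows shown in the data head.
loans = pd.DataFrame({
    "purpose": ["debt_consolidation", "credit_card", "debt_consolidation",
                "debt_consolidation", "credit_card", "credit_card",
                "debt_consolidation", "all_other", "home_improvement",
                "debt_consolidation"],
    "not.fully.paid": [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
})

# Rows are purposes, columns are the two not.fully.paid outcomes;
# each cell is the number of loans in that group.
counts = pd.crosstab(loans["purpose"], loans["not.fully.paid"])
print(counts)
```

Each cell of this table corresponds to the height of one bar in the plot produced by `sns.countplot(x="purpose", hue="not.fully.paid", data=loans)`.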
**Trend between FICO score and interest rate using jointplot.**
**Lmplots to see if the trend differed between not.fully.paid and credit.policy.**
Categorical Features
The purpose column is categorical. We need to transform it into dummy variables so that sklearn can work with it.
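The dummy-variable transformation can be sketched with `pd.get_dummies`. The final table has no `purpose_all_other` column, which is consistent with `drop_first=True`, but that setting is an inference rather than something stated here; the small DataFrame reuses purpose values from the data head, and `dtype=int` is added so the output shows 0/1 on recent pandas versions.

```python
import pandas as pd

# purpose values taken from the data head; int.rate is kept as an example
# of a numeric column that passes through unchanged.
loans = pd.DataFrame({
    "int.rate": [0.1189, 0.1071, 0.1114, 0.1134],
    "purpose": ["debt_consolidation", "credit_card",
                "all_other", "home_improvement"],
})

# drop_first=True drops the first category in sorted order (all_other here),
# which would otherwise be redundant given the remaining dummy columns.
final_data = pd.get_dummies(loans, columns=["purpose"],
                            drop_first=True, dtype=int)
print(final_data.columns.tolist())
```

A row whose purpose is the dropped category is encoded as zeros in all remaining purpose columns.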
**Final data head**
| | credit.policy | int.rate | installment | log.annual.inc | dti | fico | days.with.cr.line | revol.bal | revol.util | inq.last.6mths | delinq.2yrs | pub.rec | not.fully.paid | purpose_credit_card | purpose_debt_consolidation | purpose_educational | purpose_home_improvement | purpose_major_purchase | purpose_small_business |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0.1189 | 829.10 | 11.350407 | 19.48 | 737 | 5639.958333 | 28854 | 52.1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1 | 1 | 0.1071 | 228.22 | 11.082143 | 14.29 | 707 | 2760.000000 | 33623 | 76.7 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0.1357 | 366.86 | 10.373491 | 11.63 | 682 | 4710.000000 | 3511 | 25.6 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 1 | 0.1008 | 162.34 | 11.350407 | 8.10 | 712 | 2699.958333 | 33667 | 73.2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
4 | 1 | 0.1426 | 102.92 | 11.299732 | 14.97 | 667 | 4066.000000 | 4740 | 39.5 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Model Evaluation of Decision Tree
Classification report and a confusion matrix.
| | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.85 | 0.82 | 0.84 | 2431 |
1 | 0.19 | 0.23 | 0.21 | 443 |
avg / total | 0.75 | 0.73 | 0.74 | 2874 |
[[1990 441]
[ 341 102]]
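A report and confusion matrix like the ones above come from a split, fit, predict and score sequence. The sketch below runs that sequence on synthetic stand-in data, so its numbers will differ from those above. The 30% test fraction is an inference from the support of 2874 out of 9578 rows; the random seed and the default tree settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (an assumption): about 16% positives,
# and 18 features like the final table without its target column.
X, y = make_classification(n_samples=2000, n_features=18, n_informative=6,
                           weights=[0.84, 0.16], random_state=101)

# Hold out 30% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

tree = DecisionTreeClassifier(random_state=101)
tree.fit(X_train, y_train)
tree_pred = tree.predict(X_test)

print(classification_report(y_test, tree_pred))
# Rows are true classes, columns are predicted classes.
tree_cm = confusion_matrix(y_test, tree_pred, labels=[0, 1])
print(tree_cm)
```

In the matrix reported above, the second row shows that 102 of the 443 loans that were not fully paid were identified, which is the class 1 recall of 0.23.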
Model Evaluation of Random Forest
**Classification report and confusion matrix from predictions:**
| | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.85 | 1.00 | 0.92 | 2431 |
1 | 0.57 | 0.03 | 0.05 | 443 |
avg / total | 0.81 | 0.85 | 0.78 | 2874 |
[[2422 9]
[ 431 12]]
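The per-class figures in the report follow directly from the confusion matrix. The sketch below recomputes them from the matrix reported above with a small helper (the name `per_class_precision_recall` is illustrative), then fits a random forest on synthetic stand-in data to show the corresponding sklearn calls; `n_estimators=100` and the seed are assumptions, not the settings used for the results above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def per_class_precision_recall(cm):
    """Per-class precision and recall from a confusion matrix whose rows
    are true classes and whose columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    precision = true_pos / cm.sum(axis=0)  # divide by predicted totals
    recall = true_pos / cm.sum(axis=1)     # divide by true totals
    return precision, recall


# The random forest confusion matrix reported above.
reported_cm = [[2422, 9], [431, 12]]
precision, recall = per_class_precision_recall(reported_cm)
print(np.round(precision, 2), np.round(recall, 2))

# The same fit/predict/score sequence on synthetic stand-in data.
X, y = make_classification(n_samples=2000, n_features=18, n_informative=6,
                           weights=[0.84, 0.16], random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

forest = RandomForestClassifier(n_estimators=100, random_state=101)
forest.fit(X_train, y_train)
forest_cm = confusion_matrix(y_test, forest.predict(X_test), labels=[0, 1])
print(forest_cm)
```

The recomputed values round to the precision and recall columns of the report. Only 12 of the 443 unpaid loans are caught by the forest (recall 0.03), against 102 of 443 for the single tree, so the forest's higher overall scores come almost entirely from the majority class.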