The dataset contains records of past credit applicants, each rated as a good or bad credit risk. Models fitted to this data can be used to assess whether new applicants present a good or bad credit risk.
data("GermanCredit")
A data frame containing 1,000 observations on 21 variables.
factor variable indicating the status of the existing checking account, with levels ... < 0 DM, 0 <= ... < 200 DM, ... >= 200 DM/salary for at least 1 year and no checking account.
duration in months.
factor variable indicating credit history, with levels no credits taken/all credits paid back duly, all credits at this bank paid back duly, existing credits paid back duly till now, delay in paying off in the past and critical account/other credits existing.
factor variable indicating the credit's purpose, with levels car (new), car (used), furniture/equipment, radio/television, domestic appliances, repairs, education, retraining, business and others.
credit amount.
factor variable indicating savings account/bonds, with levels ... < 100 DM, 100 <= ... < 500 DM, 500 <= ... < 1000 DM, ... >= 1000 DM and unknown/no savings account.
ordered factor indicating the duration of the current employment, with levels unemployed, ... < 1 year, 1 <= ... < 4 years, 4 <= ... < 7 years and ... >= 7 years.
installment rate in percentage of disposable income.
factor variable indicating personal status and sex, with levels male:divorced/separated, female:divorced/separated/married, male:single, male:married/widowed and female:single.
factor variable indicating other debtors, with levels none, co-applicant and guarantor.
present residence since?
factor variable indicating the client's highest valued property, with levels real estate, building society savings agreement/life insurance, car or other and unknown/no property.
client's age.
factor variable indicating other installment plans, with levels bank, stores and none.
factor variable indicating housing, with levels rent, own and for free.
number of existing credits at this bank.
factor indicating employment status, with levels unemployed/unskilled - non-resident, unskilled - resident, skilled employee/official and management/self-employed/highly qualified employee/officer.
number of people for whom the applicant is liable to provide maintenance.
binary variable indicating if the customer has a registered telephone number.
binary variable indicating if the customer is a foreign worker.
binary variable indicating credit risk, with levels good and bad.
The use of a cost matrix is suggested for this dataset. Classifying a customer as good when they are bad (cost = 5) is worse than classifying a customer as bad when they are good (cost = 1).
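The cost matrix described above can be written down and applied to a set of predictions as in the following minimal sketch. The helper name misclassification_cost and the example labels are hypothetical and used only for illustration; they are not part of the dataset or of any package.

```r
# Cost matrix: rows are the true class, columns are the predicted class.
# A bad customer classed as good costs 5; a good customer classed as bad costs 1.
cost <- matrix(c(0, 1,
                 5, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(truth = c("good", "bad"),
                               predicted = c("good", "bad")))

# Hypothetical helper: total misclassification cost of a set of predictions.
misclassification_cost <- function(truth, predicted, cost) {
  lev <- rownames(cost)
  # Fix the level order so the confusion table lines up with the cost matrix
  confusion <- table(truth = factor(truth, levels = lev),
                     predicted = factor(predicted, levels = lev))
  sum(confusion * cost)
}

# Made-up labels for illustration only (not taken from GermanCredit)
truth     <- c("good", "good", "bad", "bad", "good")
predicted <- c("good", "bad", "good", "bad", "good")
misclassification_cost(truth, predicted, cost)
```

In this toy example one good customer is classed as bad and one bad customer is classed as good, so the two errors contribute 1 and 5 respectively.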
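library("evtree")

# Load the data and summarise the variables
data("GermanCredit")
summary(GermanCredit)

# Case weights reflecting the cost matrix: bad risks are weighted 5, good risks 1
gcw <- rep(1, nrow(GermanCredit))
gcw[GermanCredit$credit_risk == "bad"] <- 5

# Use the R 3.5.0 random number generator so that the seed is reproducible
suppressWarnings(RNGversion("3.5.0"))
set.seed(1090)

# Fit an evolutionary tree using the cost-based case weights
gct <- evtree(credit_risk ~ . , data = GermanCredit, weights = gcw)
gct

# Cross-tabulate fitted against observed credit risk, then plot the tree
table(predict(gct), GermanCredit$credit_risk)
plot(gct)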