What supervised machine learning model can be used to generate a scorecard-like result?
11-12-2020
Question
A scorecard is typically used in credit applications. One very common model for developing a credit scorecard is logistic regression, since it outputs well-defined probabilities.
Apart from logistic regression, is there any other model that can be used to build a scorecard?
For example, I don't know whether Support Vector Machine can be used since it only outputs a decision boundary.
More on the scorecard:
- Each feature is assigned a weight
- All features are categorical
- The total score is the sum of the weights of all features whose value is True (like a checklist)
- A cutoff point on the total score classifies an applicant as good or bad (label +1 or -1)
- The distance from the cutoff point represents the probability.
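The checklist described above can be sketched in a few lines. This is only an illustration: the feature names, the weights, and the cutoff are made up, and it assumes the categorical features have already been turned into True/False flags.

```python
# Hypothetical scorecard: names, weights and cutoff are invented for illustration.
WEIGHTS = {
    "stable_income": 30,
    "owns_home": 20,
    "prior_default": -40,
}
CUTOFF = 25


def total_score(applicant):
    """Sum the weights of every feature whose value is True."""
    return sum(w for name, w in WEIGHTS.items() if applicant.get(name, False))


def classify(applicant):
    """Return +1 (good) if the total score reaches the cutoff, else -1 (bad)."""
    return 1 if total_score(applicant) >= CUTOFF else -1


applicant = {"stable_income": True, "owns_home": True, "prior_default": False}
print(total_score(applicant), classify(applicant))
```

The question is then which models can learn such weights and a cutoff from data.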
Solution
It depends on what you mean by "can be used": any regression algorithm can be used; the question is how reliably it would perform. You can compare different algorithms experimentally (if you have a dataset).
[Updated after the question was edited]
In general, the way to use ML in this kind of setting is to train a classification model on the categorical features only. Depending on the type of algorithm, the features might not be combined as a weighted sum, and the predicted label may or may not be based on a cutoff point. To have a cutoff point (and thus a numerical prediction), the method must be a soft classification method, i.e. one that outputs a score or probability rather than only a label. Alternatively, a regression model could be trained to predict the numerical value directly.
So that leaves you with many options:
- soft classification: linear/logistic regression, Naive Bayes, ...
- regression: linear/logistic regression, SVM, decision trees, ...
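As one concrete instance of the soft classification route, here is a minimal sketch that fits a logistic regression on True/False features and reads the learned coefficients as scorecard weights. The data and feature names are invented, and the plain gradient-descent fit stands in for whatever library you would normally use; it is not meant as a production implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss; labels in y are 0 or 1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up toy data: each column is a hypothetical True/False (1/0) feature.
FEATURES = ["stable_income", "owns_home", "prior_default"]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = good, 0 = bad

weights, intercept = fit_logistic(X, y)
cutoff = -intercept  # p >= 0.5 exactly when the weighted sum reaches -intercept


def score(x):
    """Scorecard total: sum of the weights of the features that are True."""
    return sum(w for w, xi in zip(weights, x) if xi)


def classify(x):
    return 1 if score(x) >= cutoff else -1


for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")
print(f"cutoff: {cutoff:.2f}")
```

Other models in the lists above would need a different way to extract weights, or may not yield a weighted sum at all, which is why comparing them experimentally is worthwhile.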
Note: technically, the probability doesn't represent "how far from the cutoff point" an instance is; it is the probability that the instance is positive (label +1).
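For logistic regression specifically, the two notions are linked: the signed distance from the cutoff is the log-odds, and the probability is a monotonic function of it. A small sketch, assuming the scores are in log-odds units and the cutoff is the negated intercept (the function name is hypothetical):

```python
import math


def probability_from_score(score, cutoff):
    """P(positive) under a logistic model where score - cutoff is the log-odds.

    Assumption: scores are unscaled log-odds and cutoff = -intercept.
    """
    return 1.0 / (1.0 + math.exp(-(score - cutoff)))


for s in (-2.0, 0.0, 2.0):
    print(s, round(probability_from_score(s, cutoff=0.0), 3))
```

So a score exactly at the cutoff corresponds to a probability of 0.5, and the probability moves toward 1 or 0 as the score moves above or below it. For models that are not probabilistic, no such mapping comes for free.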