Question

I trained a multiclass SVC with an RBF kernel on a down-sampled (and therefore balanced) dataset. Now I want to perform a grid search to find the best cost and gamma.

What performance metric should I optimize for?

My test set is highly imbalanced: the number of instances can differ between classes by a factor of over 100. I am classifying 3D points (car, facade, human), so I think one could assign equal weight to all classes.

Solution

Using resampling methods to fix the imbalanced dataset is a good approach. This can be done by oversampling, downsampling, or creating synthetic instances; a down-sampling sketch is shown below.
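
For the down-sampling route, here is a minimal sketch using scikit-learn's resample utility. It assumes X and y are NumPy arrays holding your point features and class labels; the function name is only illustrative.

```python
import numpy as np
from sklearn.utils import resample

def downsample_to_minority(X, y, random_state=0):
    """Down-sample every class to the size of the rarest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    X_parts, y_parts = [], []
    for c in classes:
        # Draw n_min instances of class c without replacement.
        X_c, y_c = resample(X[y == c], y[y == c],
                            replace=False, n_samples=n_min,
                            random_state=random_state)
        X_parts.append(X_c)
        y_parts.append(y_c)
    return np.vstack(X_parts), np.concatenate(y_parts)
```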

In this kind of problem, precision on its own is not necessarily a good evaluation metric. Instead, evaluating the true positives, true negatives, false positives and false negatives (i.e., the confusion matrix) is a better way to assess the model.
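
Those counts can be inspected with scikit-learn's confusion_matrix. The toy labels below are only placeholders for the real test labels and SVC predictions:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for the real test set and SVC predictions.
y_true = ["car", "facade", "facade", "human", "car", "facade"]
y_pred = ["car", "facade", "car", "human", "car", "facade"]

# Rows are true classes, columns are predicted classes; off-diagonal
# cells count the misclassifications for each class pair.
cm = confusion_matrix(y_true, y_pred, labels=["car", "facade", "human"])
print(cm)
```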

For this evaluation, the following metrics from the scikit-learn library can be used (see the sketch after the list):

  • Recall score
  • Accuracy score
  • F1 score
  • AUC Score
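
A rough sketch of computing these with scikit-learn, again using toy labels in place of real predictions. Macro averaging is used so that every class counts equally, which matches the idea of giving car, facade and human the same weight:

```python
from sklearn.metrics import recall_score, accuracy_score, f1_score

# Toy labels; in practice use the imbalanced test set and SVC predictions.
y_true = ["car", "facade", "facade", "human", "car", "facade"]
y_pred = ["car", "facade", "car", "human", "car", "facade"]

# average="macro" gives each class equal weight regardless of its frequency.
print("recall  :", recall_score(y_true, y_pred, average="macro"))
print("accuracy:", accuracy_score(y_true, y_pred))
print("F1      :", f1_score(y_true, y_pred, average="macro"))

# The AUC needs class-membership scores rather than hard labels, e.g. from
# decision_function() or predict_proba() of an SVC trained with
# probability=True; with such scores you would call
# sklearn.metrics.roc_auc_score(y_true, y_score, multi_class="ovr", average="macro").
```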

In scikit-learn, the function called "classification_report" gives a summary of those metrics (except the AUC score from the list). That can be a good way to check the performance of the model.
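
Tying this back to the original question, one hedged sketch is to run the grid search over cost and gamma with macro-averaged F1 as the scoring function and then print the classification report on the held-out test set. The synthetic data and the grid values below are placeholder assumptions, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Synthetic stand-in for the point-cloud features; replace X, y with your
# own data (and down-sample the training part as described above).
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.12, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# Grid over cost (C) and gamma, optimized on macro-averaged F1 so each
# class contributes equally; the grid values are only placeholders.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring="f1_macro", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```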

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange