Question

The feature set for my multi-class, multi-label classification task, which uses the MLPClassifier from scikit-learn, consists mostly of features whose values already lie in the range [0,1]. For 3 of the 45 features this isn't the case, so feature scaling is required. So far I've tried min-max normalization, mean normalization and z-score standardization on these features. All of these scaling methods result in slightly different train and test performance; z-score standardization converges fastest but gives the worst scores overall. To measure performance, I used Precision, Recall, F1 and MCC.

What is a decent strategy when choosing a type of feature scaling?

Solution

You've listed a few different evaluation metrics. I would pick the one metric that is best suited to your problem; trying to maximize several metrics at once makes it difficult to tell whether your model is getting better. In any case, if one of the normalization methods leads to the best score on that metric, then that is your answer. Since all the other variables are already in the range [0,1], min-max normalization is probably what I would have started with, just to keep things consistent.
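As a rough sketch of that comparison, the snippet below scales only the off-range columns with a `ColumnTransformer` and scores each scaler on a single metric (micro-averaged F1 here, chosen purely for illustration). The data is synthetic, and the column positions in `unscaled_cols`, the network size and the train/test split are assumptions, not details from the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 45 features, all drawn in [0, 1].
X = rng.random((400, 45))

# Synthetic multi-label targets: 4 binary labels derived from the features.
W = rng.normal(size=(45, 4))
raw = X @ W
Y = (raw > np.median(raw, axis=0)).astype(int)

# Hypothetical positions of the 3 features that are NOT in [0, 1];
# stretch them so they actually need scaling.
unscaled_cols = [0, 1, 2]
X[:, unscaled_cols] *= np.array([50.0, 500.0, 5000.0])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

def evaluate(scaler):
    """Fit an MLP with `scaler` applied to the off-range columns only."""
    preprocess = ColumnTransformer(
        [("scale", scaler, unscaled_cols)],
        remainder="passthrough",  # leave the other 42 features untouched
    )
    model = make_pipeline(
        preprocess,
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
    )
    model.fit(X_train, Y_train)
    # One metric only, so the scalers can be ranked unambiguously.
    return f1_score(Y_test, model.predict(X_test), average="micro", zero_division=0)

scores = {
    "min-max": evaluate(MinMaxScaler()),
    "z-score": evaluate(StandardScaler()),
}
for name, score in scores.items():
    print(f"{name}: micro-F1 = {score:.3f}")
```

Because the scaler sits inside the pipeline, it is fitted on the training split only, which keeps the test scores comparable between scalers. On real data it is probably worth repeating this over several random seeds or cross-validation folds before concluding that one scaler is better.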

I am curious how much worse the z-score standardization is performing compared to normalization. From what you've written about the data - I'd be surprised if standardization vs. normalization produced significantly different results when only 3 out of 45 features are impacted.

This blog post explains the topic better than I can: "About Feature Scaling". This article has several examples of training an MLP with scaled and unscaled variables: "How to use Data Scaling Improve Deep Learning Model Stability and Performance".

I'd also consider some simpler algorithms. It might be good to train a logistic regression or similar model as a comparison; with fewer parameters, it will be easier to configure and less prone to error. A neural net will give you a boost if you have a lot of training data or a very non-linear problem.
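A minimal sketch of such a baseline comparison, assuming a multi-label setup handled with one logistic regression per label via `OneVsRestClassifier`. The dataset comes from `make_multilabel_classification` and is purely illustrative; the metric and hyperparameters are assumptions as well.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Illustrative multi-label data: 45 features, 4 labels.
X, Y = make_multilabel_classification(
    n_samples=400, n_features=45, n_classes=4, n_labels=2, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Same preprocessing for both models so only the classifier differs.
models = {
    "logistic regression": make_pipeline(
        MinMaxScaler(),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    ),
    "MLP": make_pipeline(
        MinMaxScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
    ),
}

baseline_scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    baseline_scores[name] = f1_score(
        Y_test, model.predict(X_test), average="micro", zero_division=0
    )
    print(f"{name}: micro-F1 = {baseline_scores[name]:.3f}")
```

If the MLP does not clearly beat the linear baseline on the chosen metric, that would suggest the extra tuning effort for the network, including the choice of scaler, may not be paying off.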

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange