Question

My dataset has 180 observations and 13 attributes: 10 numerical features, 3 categorical features, and the target.

NumFeature1, NumFeature2, ..., NumFeature10, CatFeature1, CatFeature2, CatFeature3, Target

All categorical features are non-ordinal, with the following categories:

CatFeature1: 0/1

CatFeature2: 0/1/2

CatFeature3: 0/1/2/3

It is a binary classification problem where I have to predict the probability of each class of the target.

I have three questions about this dataset:

Q1- For the categorical features, should I use LabelEncoder(), OneHotEncoder(), or pd.get_dummies(), or should I combine a custom label encoder with a one-hot encoder? (A sketch of the one-hot option is below, after Q3.)

Q2- Should scaling be applied to the numerical features only, or to all features, including the categorical ones after encoding?

Q3- Which model would be best for predicting the class probabilities? So far I have tried kNN, LogisticRegression, and RandomForestClassifier with predict_proba, but the best log_loss I have achieved is 0.301.
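To make the questions concrete, here is a rough sketch of the kind of setup I have in mind: one-hot encoding for the categorical columns, scaling for the numerical columns only, and log_loss computed from predicted probabilities. The data below is random placeholder data with the same schema as my dataset (180 rows, 10 numerical + 3 categorical features, binary target), since I cannot share the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data with the same shape/schema as my dataset.
rng = np.random.default_rng(0)
num_cols = [f"NumFeature{i}" for i in range(1, 11)]
cat_cols = ["CatFeature1", "CatFeature2", "CatFeature3"]
df = pd.DataFrame(rng.normal(size=(180, 10)), columns=num_cols)
df["CatFeature1"] = rng.integers(0, 2, 180)
df["CatFeature2"] = rng.integers(0, 3, 180)
df["CatFeature3"] = rng.integers(0, 4, 180)
y = rng.integers(0, 2, 180)

# Scale only the numerical columns, one-hot encode the categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

clf = Pipeline([
    ("prep", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# neg_log_loss uses predict_proba internally, so this scores the
# predicted probabilities directly.
scores = cross_val_score(clf, df, y, cv=5, scoring="neg_log_loss")
print("mean log_loss:", -scores.mean())
```

This is just one possible arrangement; my questions above are about whether these choices (encoder, what to scale, which model) are the right ones.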
