Question

EDIT: Added code and updated the metric values after my code changed.

Given the income statements of all the companies currently trading in the US, I would like to predict gross profit. I have been experimenting with the different scalers in scikit-learn and comparing the mean squared error and R2 metrics. I also have a separate dataframe with two companies (e.g. Google and IBM) to sanity-check my predictions: I take the model's predictions, apply the inverse transform to get back real values, and then check whether the predictions are close to the actual values.

import pandas as pd
from sklearn.preprocessing import RobustScaler

dataset = ..       # mostly non-Gaussian, unscaled, raw, non-zero, non-null dataframe
test_dataset = ..  # unscaled, raw, non-zero, non-null dataframe with Google and IBM income statement data, used for the sanity check
# Label Column
label_column = 'grossProfit'

# Scaler Functions
def scaler_transform(scaler, x):
  # Return a copy of the dataframe with every column scaled
  df = x.copy()
  df[df.columns] = scaler.transform(df[df.columns])
  return df

def scaler_fit(scaler, x):
  # Fit the scaler on every column of the dataframe
  df = x.copy()
  scaler.fit(df[df.columns])
  return scaler

def scaler_transform_inverse(scaler, x):
  # Return a copy of the dataframe with every column mapped back to original units
  df = x.copy()
  df[df.columns] = scaler.inverse_transform(df[df.columns])
  return df


# Actual Scaling
scaler = scaler_fit(RobustScaler(), dataset)
normed_training_dataframe = scaler_transform(scaler, dataset)
normed_test_dataframe = scaler_transform(scaler, test_dataset)
training_dataframe = normed_training_dataframe
test_dataframe = normed_test_dataframe


# Ludwig
from ludwig.api import LudwigModel
final_training_dataframe = training_dataframe

# Neural Net: 4 hidden layers
model_definition = {'input_features': [{'name': 'incomeBeforeTax', 'type': 'numerical'},
  {'name': 'netIncome', 'type': 'numerical'},
  {'name': 'sellingGeneralAdministrative', 'type': 'numerical'},
  ..
  {'name': 'netIncomeApplicableToCommonShares', 'type': 'numerical'},
  {'name': 'totalRevenue', 'type': 'numerical'}],
 'output_features': [{'fc_size': 168,
   'name': 'grossProfit',
   'num_fc_layers': 4,
   'optimizer': {'type': 'mean_squared_error'},
   'type': 'numerical'}],
 'training': {'batch_size': 32,
  'epochs': 10000,
  'learning_rate': 0.004516233033842227,
  'optimizer': {'type': 'rmsprop'}}}

model = LudwigModel(model_definition)
final_train_stats = model.train(final_training_dataframe)

# Metric Calculation
mse = final_train_stats['validation'][label_column]['mean_squared_error'][-1]
print('MSE:' + str(mse))
r2 = final_train_stats['validation'][label_column]['r2'][-1]
print('R2:' + str(r2))


# Sanity Check
predict_df = model.predict(data_df=test_dataframe)

prediction_full_standardized_df = test_dataframe.copy().drop(columns=label_column)
predict_df_copy = predict_df.copy()
predict_df_copy.columns = [label_column]

prediction_full_standardized_df = prediction_full_standardized_df.reset_index(drop=True)
prediction_full_standardized_df = pd.merge(prediction_full_standardized_df, predict_df_copy, left_index=True, right_index=True)
prediction_full_standardized_df

actual_full_df = test_dataset.copy()
actual_full_df = actual_full_df.reset_index(drop=True)
actual_full_df

# Inverse Transform
prediction_full_df = scaler_transform_inverse(scaler, prediction_full_standardized_df)

# Mean absolute percentage difference between predicted and actual gross profit
diff = (prediction_full_df[label_column] / actual_full_df[label_column] - 1) * 100
diff.abs().mean(axis=0)

My concern is that I expected a lower MSE to give the closest predictions, but that does not seem to be the case when I do the sanity check. Note that although I report R2 values, training is optimized for MSE, as you can see in the model_definition.

---
MinMaxScaler:
MSE:0.0003734849866593949
R2:-38848.26144771535
diff.abs().mean(axis=0):10952.754278227647

RobustScaler:
MSE:17474834283304.852
R2:-7199011407.460549
diff.abs().mean(axis=0): 91.40927816440639


StandardScaler:
MSE:0.021922195014768487
R2:0.0030305917889202547
diff.abs().mean(axis=0): 74.77342625156986


Solution

You don't show how you compute R2, but something is very wrong here. R2 can only be negative if the model is very poor, and here it is extremely negative.
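For reference, R2 (as computed by scikit-learn's r2_score, for instance) is 1 - SS_res/SS_tot, so it drops below zero as soon as the model does worse than simply predicting the mean of the targets, and it can fall arbitrarily far. A minimal sketch with made-up numbers:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predicting the mean of the targets gives R2 = 0
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))

# Predictions on the wrong scale (e.g. in the wrong units) drive R2 far below zero
print(r2_score(y_true, y_true * 1000))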

Likewise, the MSE values are all over the place; it looks like they are not computed on unscaled values, which makes them incomparable across scalers.
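If you want MSE figures that are comparable across scalers, compute them on predictions mapped back to the original units, roughly along these lines (the names here are hypothetical, assuming y_scaler was fitted on the grossProfit column alone):

from sklearn.metrics import mean_squared_error

# y_pred_scaled: predictions in scaled space; y_true: actual values in original units
y_pred = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
print(mean_squared_error(y_true, y_pred))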

The diff is at least missing an absolute value.

I think you have further errors in the code you used to compute these. Use the standard scikit-learn classes: don't invert the transforms yourself. Build a pipeline of the transformations and the model, and use it to fit and predict.
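A Ludwig model won't drop straight into a scikit-learn Pipeline, but the idea looks like this with a scikit-learn regressor standing in for it. TransformedTargetRegressor scales and inverse-transforms the target for you, so predict() already returns values in the original units. This reuses dataset and test_dataset from the question and is only a sketch, not a tuned model:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.neural_network import MLPRegressor
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_squared_error, r2_score

X_train = dataset.drop(columns=label_column)
y_train = dataset[label_column]
X_test = test_dataset.drop(columns=label_column)
y_test = test_dataset[label_column]

# Features are scaled inside the pipeline and the target inside the wrapper,
# so there is no manual inverse_transform step to get wrong.
model = TransformedTargetRegressor(
    regressor=make_pipeline(RobustScaler(), MLPRegressor(hidden_layer_sizes=(168,) * 4, max_iter=1000)),
    transformer=RobustScaler(),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('MSE:', mean_squared_error(y_test, y_pred))
print('R2: ', r2_score(y_test, y_pred))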

Licensed under: CC-BY-SA with attribution