Question

This question is not about the utility of Transfer Learning compared with regular supervised learning.

1. Context

I'm studying health-monitoring techniques, and I practice on the C-MAPSS dataset. The goal is to predict the Remaining Useful Life (RUL) of an engine from series of sensor measurements. A major issue in health monitoring is the scarcity of failure examples (one cannot afford to perform thousands of run-to-failure tests on aircraft engines). Transfer Learning has been studied as a way to address this, in Transfer Learning with Deep Recurrent Neural Networks for Remaining Useful Life Estimation, Zhang et al., 2018. My question is about the results presented in that article.

2. Question

The C-MAPSS dataset is composed of 4 subdatasets, each with its own operational modes and failure modes. The article cited above performs transfer learning between these subdatasets. In particular, when training a model on a target subdataset B starting from the weights of a model trained on a source subdataset A, they do not train on all of dataset B. They run an experiment testing various sizes for the target dataset B: 5%, 10%, ..., 50% of the full dataset B.

The results are presented on page 11. With a few exceptions, they get better results on smaller target datasets. This seems counterintuitive to me: how could the model learn better from fewer examples?

Why does Transfer Learning work better on smaller datasets than on larger ones?


Solution

From the page 11 results of the article you cite, I don't think one can conclude that transfer learning works better on smaller datasets than on larger ones.

If you look at the transfer-learning score values (or RMSE) vs. the size of the training set, they also improve as the dataset size increases (for instance E2, E5, or E8). So transfer learning does not work better on small datasets.

However, you might be looking at the IMP index, which is based on the mean score (or RMSE) of learning with and without transfer learning:

IMP = (1 − WithTransfer / NoTransfer) × 100
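
For concreteness, here is this formula in Python with made-up score values (lower scores are better; the numbers are illustrative, not taken from the paper):

    def imp(with_transfer: float, no_transfer: float) -> float:
        """Percentage improvement brought by transfer learning (higher is better)."""
        return (1 - with_transfer / no_transfer) * 100

    # Early on, the transferred model is far ahead of the from-scratch one:
    print(imp(with_transfer=10.0, no_transfer=40.0))  # 75.0
    # With more data, the from-scratch model catches up and IMP shrinks:
    print(imp(with_transfer=15.0, no_transfer=20.0))  # 25.0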

The index is based on two curves:

  • WithTransfer, which performs well even at the beginning: when the transfer is relevant, the model can already extract pertinent information from a very small target training set.
  • NoTransfer, which starts with poor performance (the model struggles to generalize) and then improves as the target training set grows.

The IMP index then has exactly the shape you pointed out (for example for E2 and E5): it is large on small target datasets not because transfer learning degrades with more data, but because the NoTransfer baseline catches up.
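
To make this concrete, here is a toy simulation of the two curves; the curve shapes are assumptions chosen to mimic the behaviour described above, not values from the paper:

    import numpy as np

    # Hypothetical learning curves (RMSE, lower is better) vs. target-dataset size.
    sizes = np.arange(5, 55, 5)                   # 5%, 10%, ..., 50% of dataset B
    no_transfer = 60.0 / np.sqrt(sizes) + 20.0    # starts poorly, improves with data
    with_transfer = 25.0 / np.sqrt(sizes) + 18.0  # already decent on little data

    imp = (1 - with_transfer / no_transfer) * 100

    for s, nt, wt, i in zip(sizes, no_transfer, with_transfer, imp):
        print(f"{s:3d}%  NoTransfer={nt:5.1f}  WithTransfer={wt:5.1f}  IMP={i:5.1f}%")

Both RMSE columns decrease as the dataset grows, yet IMP falls from roughly 38% to 24%: the gap between the two models narrows even though both keep improving.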

OTHER TIPS

Transfer learning is, in principle, designed to use knowledge acquired by training on a larger, generic dataset (e.g. classifying animal pictures) to train a model that focuses on a more specific task using a smaller dataset (e.g. classifying cat breed pictures).

Transfer learning is sometimes also called domain adaptation and, in essence, refers to improving generalisation in one setting by exploiting what has already been learned in another setting. It boils down to reusing patterns learned while solving a more generic task to train for, and solve, a more specific task that lacks data.
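
As a rough illustration of this recipe on RUL-style sequence data, one could fine-tune only the head of a pretrained recurrent model; everything below (architecture, shapes, file name) is a hypothetical sketch, not the setup from the paper:

    import torch
    import torch.nn as nn

    class RULRegressor(nn.Module):
        """Toy LSTM regressor: a window of sensor measurements -> one RUL value."""
        def __init__(self, n_sensors: int = 14, hidden: int = 32):
            super().__init__()
            self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                  # x: (batch, time, n_sensors)
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])    # predict RUL from the last time step

    model = RULRegressor()
    # Transfer step: load weights trained on the larger source subdataset A
    # ("source.pt" is a hypothetical file name).
    # model.load_state_dict(torch.load("source.pt"))

    for p in model.lstm.parameters():          # freeze the transferred feature extractor
        p.requires_grad = False

    # Fine-tune only the head on the small target subdataset B (dummy batch here).
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    x, y = torch.randn(8, 30, 14), torch.randn(8, 1)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()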

Licensed under: CC-BY-SA with attribution