Question

I am designing a convolutional neural network that I believe requires transfer learning to function in practice. The network will be a character level CNN for text classification, more specifically, authorship identification of an author given unknown texts. The initial model will be trained on millions of texts from thousands of authors. In practice, if I want to be able to determine the authorship of a new given author/class not trained upon originally, I need to use transfer learning.

The structure of the network involves 6 convolutional layers and 3 fully connected layers. Given that the amount of data of the new author/class will be minimal in most cases, which layers should I replace and retrain for the new class for it to be the most effective? Or are there other methods I could consider to solve this problem?

Solution

To build on the previous answer:

In transfer learning, the goal is to take a pre-trained model and adapt it to a new task. So, as SrJ has alluded to, we keep the main model's architecture intact. Here that would be the 6 convolutional layers (and possibly the three linear layers, if they were also involved in pre-training).

Once the model has been pre-trained, we add extra layers on top so that it suits the new task. At a minimum, that means a final softmax output layer, which produces a probability distribution over the authors. Between that output layer and the original architecture you can add further layers if appropriate.
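As a minimal sketch of adding a new output head on top of a pre-trained base (the layer names and sizes below are illustrative assumptions, not the asker's exact architecture):

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained character-level CNN base (sizes are assumptions).
base = nn.Sequential(
    nn.Conv1d(70, 128, kernel_size=7), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(128, 256), nn.ReLU(),
)

n_new_authors = 5                      # classes in the new, small dataset
head = nn.Linear(256, n_new_authors)   # new output layer added on top
model = nn.Sequential(base, head)

x = torch.randn(4, 70, 100)            # batch of 4 encoded char sequences
probs = torch.softmax(model(x), dim=1) # probability distribution over authors
print(probs.shape)                     # torch.Size([4, 5])
```

During training you would normally feed the raw logits to `nn.CrossEntropyLoss`, which applies the softmax internally.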

When training this model with your task-specific data (a stage called fine-tuning), we freeze the original model's layers. This means the parameters within the original layers do not change, which helps prevent a loss of generalisation performance. Only the newly added layers' parameters are updated during fine-tuning.
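In PyTorch, freezing amounts to disabling gradients on the base and giving the optimiser only the new layers' parameters. A self-contained sketch, with assumed layer sizes:

```python
import torch
import torch.nn as nn

# Illustrative pre-trained base and new head (names/sizes are assumptions).
base = nn.Sequential(nn.Conv1d(70, 128, 7), nn.ReLU(),
                     nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                     nn.Linear(128, 256), nn.ReLU())
head = nn.Linear(256, 5)

# Freeze the pre-trained layers: they receive no gradient updates.
for p in base.parameters():
    p.requires_grad = False

# Optimise only the new head's parameters during fine-tuning.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in base.parameters())
print(trainable, frozen)
```

With very little data for the new author, keeping the trainable parameter count this small is exactly what guards against overfitting.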

The overall message: do not replace layers; add onto the existing model to tailor it to your classification task.

OTHER TIPS

You should retrain the last linear layers and keep the CNN layers unchanged.
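This alternative can be sketched as replacing the final linear layer while leaving the convolutional layers frozen (again, the architecture below is an illustrative assumption):

```python
import torch
import torch.nn as nn

# Illustrative model: conv feature extractor + linear classifier layers.
model = nn.Sequential(
    nn.Conv1d(70, 128, 7), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),  # conv layers: keep frozen
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 1000),                   # old output over original authors
)

for p in model.parameters():
    p.requires_grad = False   # freeze everything first

model[-1] = nn.Linear(256, 5) # replace the final layer for 5 new authors;
                              # freshly created layers are trainable by default

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)              # only the new final layer's parameters
```

To retrain several of the last linear layers rather than just one, set `requires_grad = True` on those layers as well before fine-tuning.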

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange