Question

I am learning how to implement CNNs. Searching online, I found that one trick for designing a good network is to first build it so that it overfits, and then use regularization to eliminate the overfitting and end up with a well-performing network.

But how do I do this? I don't understand how to build a network that overfits on purpose. And how do I apply regularization afterwards?

Can someone help me?

Thanks in advance.

Solution

Overfitting is generally caused by a model that is more complex than the problem requires. If you train a model that is too complex (too large), you will get high training accuracy but low test accuracy. You can then fight the overfitting by getting more data, adding dropout or other regularization, using early stopping or global average pooling, applying feature scale clipping, or removing some layers from the network.
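
To make the early stopping idea concrete, here is a minimal, framework-free sketch. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are assumptions made for illustration, not part of any particular library. Keras, for example, ships an `EarlyStopping` callback built on the same idea.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch whose weights would be kept.

    Training is treated as stopped once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    Returns -1 if `val_losses` is empty.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Validation loss has stalled or risen for too long: stop.
                break

    return best_epoch


# Hypothetical validation losses from an overfitting run: the loss falls,
# bottoms out at epoch 3, then rises while training loss keeps dropping.
history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(history, patience=3))  # prints 3
```

In a real training loop you would save a checkpoint whenever the validation loss improves and restore the checkpoint from the returned epoch once training stops.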

OTHER TIPS

It sounds like you are talking about orthogonalization. In this approach you break model fitting into four stages, each with its own focus:

You build a model that:

Step 1: Fits the training data well. This is the primary focus at this point, and overfitting is likely to occur in this step.

Step 2: Fits the dev data well. In this step you are likely to add regularization to address overfitting.

Step 3: Fits the test data well. This step is needed because the dev data has been heavily experimented against, so the model could have overfit to it.

Step 4: Fits real-world data well.
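
The stages above depend on keeping the training, dev, and test data disjoint. A rough sketch of such a split using only the standard library follows. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions rather than recommended values.

```python
import random


def split_dataset(samples, dev_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle `samples` and split them into disjoint train, dev, and test lists."""
    if dev_fraction < 0 or test_fraction < 0 or dev_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_dev = round(len(shuffled) * dev_fraction)

    test = shuffled[:n_test]                 # touched only in step 3
    dev = shuffled[n_test:n_test + n_dev]    # used to tune regularization (step 2)
    train = shuffled[n_test + n_dev:]        # used to fit the model (step 1)
    return train, dev, test


train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # prints 70 15 15
```

Because the test set is set aside before any tuning, its score in step 3 is not biased by the repeated experiments run against the dev set in step 2.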

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange