Question

Bagging is the generation of multiple predictors that work together as an ensemble, acting as a single predictor. Dropout is a technique that teaches a neural network to average over all possible subnetworks. Looking at the most important Kaggle competitions, it seems that these two techniques are used together very often. I can't see any theoretical difference besides the actual implementation. Can anyone explain why we should use both of them in any real application, and why performance improves when we use both?


Solution

Bagging and dropout do not achieve quite the same thing, though both are types of model averaging.

Bagging is an operation across your entire dataset: each model in the ensemble is trained on a subset (typically a bootstrap sample) of the training examples. Thus some training examples are not shown to a given model.
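For a concrete picture, here is a minimal sketch of bagging with scikit-learn (the dataset and hyperparameters are arbitrary choices for illustration): each member model sees its own bootstrap sample of the training examples, and the ensemble combines their predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset; in practice this would be your real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagged = BaggingClassifier(
    DecisionTreeClassifier(),  # base model, passed positionally for compatibility
    n_estimators=25,           # number of independently trained models
    max_samples=0.8,           # each model sees 80% of the examples...
    bootstrap=True,            # ...drawn with replacement (a bootstrap sample)
    random_state=0,
)
bagged.fit(X, y)
print(bagged.score(X, y))
```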

Dropout, by contrast, is applied to features within each training example. It is true that the result is functionally equivalent to training exponentially many networks (with shared weights!) and then equally weighting their outputs. But dropout works on the feature space, causing certain features to be unavailable to the network, not full examples. Because each neuron cannot completely rely on one input, representations in these networks tend to be more distributed and the network is less likely to overfit.
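To make the contrast concrete, here is a minimal NumPy sketch of (inverted) dropout applied to a layer's activations; the drop rate and array shapes are assumptions for illustration. Note that whole rows (examples) are never discarded, only individual units/features within each example.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Zero each unit independently with probability p during training.

    Scaling by 1/(1-p) ("inverted dropout") keeps the expected activation
    unchanged, so no rescaling is needed at test time.
    """
    if not training or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p  # keep a unit with probability 1 - p
    return activations * mask / (1.0 - p)

# Each row is one training example; dropout hides units within each example,
# not whole examples.
h = np.ones((4, 6))
print(dropout(h, p=0.5))
```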

OTHER TIPS

I found a comparison of the two techniques in the Maxout Networks paper (Goodfellow et al., 2013), which says:

Dropout training is similar to bagging (Breiman, 1994), where many different models are trained on different subsets of the data. Dropout training differs from bagging in that each model is trained for only one step and all of the models share parameters. For this training procedure (dropout) to behave as if it is training an ensemble rather than a single model, each update must have a large effect, so that it makes the sub-model induced by that µ fit the current input v well.

Hope it will be useful.

Dropout is a regularization technique used to avoid overfitting in large neural networks, specifically by randomly leaving out some of the neurons in the hidden layers during training (hence the name "dropout" for the left-out neurons). The idea is that if the network has really learned something robust during training, then dropping out some of its neurons shouldn't affect the precision of its predictions negatively.
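As a sketch of how this looks in practice (using PyTorch as an assumed framework, with arbitrary layer sizes and drop rate): the dropout layer randomly zeroes hidden units while training and is switched off at prediction time.

```python
import torch
import torch.nn as nn

# Small illustrative network with dropout between the hidden and output layers.
# nn.Dropout is active only in model.train() mode and is a no-op in model.eval() mode.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero 50% of hidden units during training
    nn.Linear(256, 10),
)

x = torch.randn(32, 784)

model.train()            # dropout active
logits_train = model(x)

model.eval()             # dropout disabled; the full network is used for prediction
logits_eval = model(x)
```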

Bagging is also an effective regularization technique, used to reduce the variance coming from the training data and to improve the accuracy of your model by training multiple copies of it on different subsets drawn from the initial/larger training dataset.
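Since the question asks about using the two together, here is a rough sketch of that common pattern (the architecture, data, and hyperparameters below are assumptions for illustration): several dropout-regularized networks are each trained on a bootstrap resample of the examples, and their predicted probabilities are averaged at test time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(500, 20)                     # toy data: 500 examples, 20 features
y = torch.randint(0, 2, (500,))              # toy binary labels

def make_net():
    # Each ensemble member is itself regularized with dropout.
    return nn.Sequential(
        nn.Linear(20, 64), nn.ReLU(),
        nn.Dropout(p=0.3),
        nn.Linear(64, 2),
    )

ensemble = []
for _ in range(5):                           # bagging: 5 independently trained nets
    idx = torch.randint(0, len(X), (len(X),))  # bootstrap sample of *examples*
    net = make_net()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    net.train()                              # dropout active during training
    for _ in range(200):
        opt.zero_grad()
        loss = loss_fn(net(X[idx]), y[idx])
        loss.backward()
        opt.step()
    ensemble.append(net)

# Model averaging at prediction time: mean of the members' class probabilities.
with torch.no_grad():
    probs = torch.stack(
        [net.eval()(X).softmax(dim=1) for net in ensemble]
    ).mean(dim=0)
print(probs.argmax(dim=1)[:10])
```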

see this question

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange