Each channel will be zeroed out independently on every forward call with
probability p
using samples from a Bernoulli distribution.
Usually the input comes from nn_conv3d modules.
As described in the paper
"Efficient Object Localization Using Convolutional Networks",
if adjacent pixels within feature maps are strongly correlated
(as is normally the case in early convolution layers), then i.i.d. dropout
will not regularize the activations and will instead only result
in a decrease of the effective learning rate.
In this case, nn_dropout3d will help promote independence between
feature maps and should be used instead.
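
The channel-wise behaviour described above can be checked with a small sketch. It assumes the torch package is installed and that the input has shape (N, C, D, H, W). The tensor sizes, the seed, and the variable names are illustrative choices, not part of the documented interface.

```r
library(torch)

torch_manual_seed(1)

# Illustrative input: 1 sample, 4 channels, each channel a 2 x 3 x 3 volume of ones
x <- torch_ones(1, 4, 2, 3, 3)

drop <- nn_dropout3d(p = 0.5)

# Training mode: whole channels are either zeroed or kept and scaled by 1 / (1 - p)
drop$train()
y <- drop(x)

# Sum over the D, H, W dimensions of each channel.
# A dropped channel sums to 0; a kept one sums to 18 * 2 = 36.
channel_sums <- as_array(y$sum(dim = c(3, 4, 5)))
print(channel_sums)

# Evaluation mode: dropout is disabled and the input passes through unchanged
drop$eval()
unchanged <- torch_equal(drop(x), x)
print(unchanged)
```

Because the mask is drawn per channel rather than per element, every value inside a given channel shares the same fate, which is what distinguishes this module from element-wise dropout.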