$$
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
$$
The mean and standard-deviation are calculated per-dimension over
the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors
of size \(C\) (where \(C\) is the input size). By default, the elements of \(\gamma\)
are set to 1 and the elements of \(\beta\) are set to 0.
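
The formula can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper name `batch_norm`, the list-of-rows input layout, the value `eps=1e-5`, and the use of the biased (divide-by-N) variance are assumptions made for the example.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize a mini-batch (a list of rows) one feature at a time.

    Hypothetical helper for illustration only.
    """
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature mean over the batch dimension.
    mean = [sum(row[j] for row in batch) / n for j in range(num_features)]
    # Per-feature biased variance (divide by n), an assumption of this sketch.
    var = [
        sum((row[j] - mean[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]
    # y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta, applied per feature.
    return [
        [
            (row[j] - mean[j]) / math.sqrt(var[j] + eps) * gamma[j] + beta[j]
            for j in range(num_features)
        ]
        for row in batch
    ]


# Two samples with C = 2 features; gamma = 1 and beta = 0 match the defaults.
batch = [[1.0, 10.0], [3.0, 30.0]]
out = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With the default \(\gamma\) and \(\beta\), each feature column of `out` has approximately zero mean and unit variance, regardless of the original scale of that feature.
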
Also by default, during training this layer keeps running estimates of its
computed mean and variance, which are then used for normalization during
evaluation. The running estimates are kept with a default :attr:`momentum`
of 0.1.
If :attr:`track_running_stats` is set to ``False``, this layer then does not
keep running estimates, and batch statistics are instead used during
evaluation time as well.
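
The interaction between the running estimates and the training/evaluation modes can be sketched as follows. The function names `update_running` and `stats_for_normalization` are hypothetical, and the sketch assumes the running estimate is an exponential moving average of the form `(1 - momentum) * running + momentum * observed`, starting from a running mean of 0; treat it as an illustration of the behavior described above rather than the layer's actual code.

```python
def update_running(running, batch_stat, momentum=0.1):
    """One moving-average step for a running estimate (assumed update rule)."""
    return (1 - momentum) * running + momentum * batch_stat


def stats_for_normalization(training, track_running_stats, batch_stat, running_stat):
    """Pick which statistic is used to normalize the current input."""
    # Batch statistics are used during training, and also during evaluation
    # when no running estimates are being tracked.
    if training or not track_running_stats:
        return batch_stat
    # Otherwise (evaluation with tracking enabled) use the running estimate.
    return running_stat


# Three training steps whose batch mean is always 2.0.
running_mean = 0.0  # assumed initial value
for batch_mean in [2.0, 2.0, 2.0]:
    running_mean = update_running(running_mean, batch_mean)
```

With a momentum of 0.1 the estimate moves slowly: after three steps `running_mean` has covered only part of the distance from 0 to 2.0. A larger momentum gives more weight to the most recent batch.
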