$$
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
$$
The mean and standard deviation are calculated per-dimension over
the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors
of size C (where C is the input size). By default, the elements of \(\gamma\) are set
to 1 and the elements of \(\beta\) are set to 0. The standard deviation is calculated
via the biased estimator, equivalent to torch_var(input, unbiased = FALSE).
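
As an illustration only, the sketch below assumes the torch R package and uses nn_batch_norm1d as a representative batch-normalization layer; the input shape and the names x, bn, mu and sigma2 are arbitrary choices, not part of this documentation. It reproduces the formula by hand with the biased variance and compares the result against the layer's training-mode output.

```r
library(torch)

# Minimal sketch (illustrative, not from the docs): normalize an 8 x 4 input
# by hand and compare with the layer in training mode.
x  <- torch_randn(8, 4)
bn <- nn_batch_norm1d(num_features = 4)   # gamma = 1, beta = 0 at initialization

# Per-feature batch statistics; unbiased = FALSE gives the biased estimator
# described above.
mu     <- torch_mean(x, dim = 1)
sigma2 <- torch_var(x, dim = 1, unbiased = FALSE)
eps    <- 1e-5                            # the layer's default eps

# y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta
y_manual <- (x - mu) / torch_sqrt(sigma2 + eps)
y_layer  <- bn(x)                         # training mode: uses batch statistics

torch_allclose(y_manual, y_layer$detach(), atol = 1e-6)  # expected: TRUE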
Also by default, during training this layer keeps running estimates of its
computed mean and variance, which are then used for normalization during
evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to FALSE, this layer does not keep running
estimates, and batch statistics are instead used during evaluation as well.
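
The short sketch below (again assuming the torch R package and nn_batch_norm1d, with illustrative shapes and names) contrasts the default running-statistics behaviour with track_running_stats = FALSE.

```r
library(torch)

x <- torch_randn(32, 4)

# Default: a training-mode forward pass updates the running estimates
# (momentum 0.1); in eval mode those estimates are used for normalization.
bn <- nn_batch_norm1d(num_features = 4, momentum = 0.1)
invisible(bn(x))      # updates bn$running_mean and bn$running_var
bn$eval()
y_eval <- bn(x)       # normalized with the running estimates

# With track_running_stats = FALSE no running estimates are kept, so the
# statistics of the current batch are used even in eval mode.
bn_batch <- nn_batch_norm1d(num_features = 4, track_running_stats = FALSE)
bn_batch$eval()
y_batch <- bn_batch(x)
```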