$$
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
$$
The mean and standard deviation are calculated per-dimension over the
mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter
vectors of size \(C\) (where \(C\) is the input size). By default, the
elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set
to 0. The standard deviation is calculated via the biased estimator,
equivalent to `torch_var(input, unbiased = FALSE)`.
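
For example, the following minimal sketch (assuming this page describes the
`nn_batch_norm1d` module of the torch package) checks the formula above
against the module's output in training mode, using the biased variance
estimator:

```r
library(torch)

m <- nn_batch_norm1d(num_features = 4)  # gamma initialised to 1, beta to 0
x <- torch_randn(8, 4)                  # mini-batch of 8 samples, 4 features

# Per-feature batch statistics; dim 1 is the batch dimension (R torch is 1-based)
mu  <- torch_mean(x, dim = 1)
var <- torch_var(x, dim = 1, unbiased = FALSE)  # biased estimator, as above
eps <- 1e-5                                     # the layer's default eps

y_manual <- (x - mu) / torch_sqrt(var + eps)    # gamma = 1, beta = 0 defaults
y_module <- m(x)                                # module starts in training mode

max(abs(as_array(y_manual - y_module)))         # ~0 up to floating point error
```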
Also by default, during training this layer keeps running estimates of its
computed mean and variance, which are then used for normalization during
evaluation. The running estimates are kept with a default momentum of 0.1.
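
Note that this momentum is the running-average coefficient of the batch-norm
layer, not the optimizer notion of momentum. Assuming the same convention as
the underlying libtorch implementation, a newly observed batch statistic
\(x_t\) updates the running estimate \(\hat{x}\) as

$$
\hat{x}_{\text{new}} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t .
$$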
If `track_running_stats` is set to `FALSE`, this layer then does not keep
running estimates, and batch statistics are instead used during evaluation
time as well.
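
As a sketch of this behaviour (again assuming `nn_batch_norm1d`, whose
constructor exposes `track_running_stats`), compare a layer with and without
running statistics after switching both to evaluation mode:

```r
library(torch)

m_running <- nn_batch_norm1d(4)                              # keeps running estimates
m_batch   <- nn_batch_norm1d(4, track_running_stats = FALSE) # no running estimates

x <- torch_randn(8, 4)

m_running$eval()
m_batch$eval()

y_running <- m_running(x)  # uses the stored running mean/variance
                           # (still the initial mean 0 / var 1 here, so y_running ~ x)
y_batch   <- m_batch(x)    # recomputes mean/variance from this mini-batch
```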