The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$
\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = - w_n \left[ y_n \cdot \log \sigma(x_n)
+ (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],
$$
where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
$$
\ell(x, y) = \begin{cases}
\mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean';}\\
\mbox{sum}(L), & \mbox{if reduction} = \mbox{'sum'.}
\end{cases}
$$
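As a concrete check, here is a minimal sketch (assuming these formulas describe torch.nn.BCEWithLogitsLoss, which applies the sigmoid \(\sigma\) internally) comparing the three reduction modes against the elementwise formula:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4)   # raw scores x_n (logits)
y = torch.rand(4)    # targets y_n in [0, 1]

loss_none = nn.BCEWithLogitsLoss(reduction='none')(x, y)  # L = {l_1, ..., l_N}
loss_mean = nn.BCEWithLogitsLoss(reduction='mean')(x, y)  # mean(L), the default
loss_sum  = nn.BCEWithLogitsLoss(reduction='sum')(x, y)   # sum(L)

# l_n computed directly from the formula above (w_n = 1 by default):
manual = -(y * torch.log(torch.sigmoid(x)) + (1 - y) * torch.log(1 - torch.sigmoid(x)))
assert torch.allclose(loss_none, manual)
assert torch.allclose(loss_mean, manual.mean())
assert torch.allclose(loss_sum, manual.sum())
```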
This is used for measuring the error of a reconstruction in, for example,
an auto-encoder. Note that the targets t[i] should be numbers
between 0 and 1.
It's possible to trade off recall and precision by adding weights to positive examples.
In the case of multi-label classification, the loss can be described as:
$$
\ell_c(x, y) = L_c = \{l_{1,c},\dots,l_{N,c}\}^\top, \quad
l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c})
+ (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right],
$$
where \(c\) is the class number (\(c > 1\) for multi-label binary
classification,
\(c = 1\) for single-label binary classification),
\(n\) is the index of the sample in the batch, and
\(p_c\) is the weight of the positive answer for class \(c\).
Setting \(p_c > 1\) increases recall, while setting \(p_c < 1\) increases precision.
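A minimal multi-label sketch (again assuming torch.nn.BCEWithLogitsLoss; the per-class weights \(p_c\) below are hypothetical) verifying the formula for \(l_{n,c}\):

```python
import torch
import torch.nn as nn

N, C = 4, 3                                    # batch size and number of classes
logits  = torch.randn(N, C)                    # x_{n,c}
targets = torch.randint(0, 2, (N, C)).float()  # y_{n,c}
p = torch.tensor([1.0, 2.0, 0.5])              # hypothetical p_c, one weight per class

loss = nn.BCEWithLogitsLoss(pos_weight=p)(logits, targets)

# Elementwise l_{n,c} from the formula above (w_{n,c} = 1), averaged over all N*C terms:
sig = torch.sigmoid(logits)
manual = -(p * targets * torch.log(sig) + (1 - targets) * torch.log(1 - sig)).mean()
assert torch.allclose(loss, manual)
```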
For example, if a dataset contains 100 positive and 300 negative examples of a single class,
then pos_weight for the class should be equal to \(\frac{300}{100}=3\).
The loss would act as if the dataset contains \(3\times 100=300\) positive examples.
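A short sketch of deriving pos_weight from class counts in this way (the labels tensor below is a hypothetical 0/1 target matrix, not part of the original example):

```python
import torch
import torch.nn as nn

# Hypothetical 0/1 label matrix of shape [num_samples, num_classes].
labels = torch.randint(0, 2, (400, 3)).float()

num_pos = labels.sum(dim=0)            # positives per class
num_neg = labels.shape[0] - num_pos    # negatives per class
pos_weight = num_neg / num_pos         # e.g. 300 / 100 = 3 for the example above

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```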