In the simplest case, the output value of the layer with input size \((N, C, H, W)\),
output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
$$
\begin{array}{ll}
out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
& \mbox{input}(N_i, C_j, \mbox{stride[0]} \times h + m,
\mbox{stride[1]} \times w + n)
\end{array}
$$
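The formula above can be read as a pair of nested loops over each output position. The following is a minimal pure-Python sketch for a single \((H, W)\) plane (one fixed \(N_i\) and \(C_j\)), assuming no padding and no dilation. The function name `max_pool2d_plane` and the list-of-rows representation are illustrative assumptions, not part of the layer's API.

```python
def max_pool2d_plane(plane, kernel_size, stride):
    """Max-pool one (H, W) plane given as a list of rows.

    Illustrative reference only: no padding, no dilation.
    """
    kH, kW = kernel_size
    sH, sW = stride
    H, W = len(plane), len(plane[0])

    # Number of window positions that fit entirely inside the plane.
    H_out = (H - kH) // sH + 1
    W_out = (W - kW) // sW + 1

    # out(h, w) = max over m, n of input(stride[0] * h + m, stride[1] * w + n)
    return [
        [
            max(
                plane[sH * h + m][sW * w + n]
                for m in range(kH)
                for n in range(kW)
            )
            for w in range(W_out)
        ]
        for h in range(H_out)
    ]


plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool2d_plane(plane, kernel_size=(2, 2), stride=(2, 2))
print(pooled)  # [[6, 8], [14, 16]]
```

Each output value is the maximum of one 2x2 window; with a stride of 2 the windows do not overlap, so the 4x4 plane reduces to 2x2.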
If padding is non-zero, then the input is implicitly padded with negative infinity on both sides
for padding number of points. dilation controls the spacing between the kernel points.
It is harder to describe, but this link has a nice visualization of what dilation does.
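As a rough illustration of how padding and dilation change which input points a window touches, here is a pure-Python sketch for a single plane. It rests on two assumptions: padding cells are filled with negative infinity so that a padded position can never be selected as the maximum, and the output size follows the usual floor formula. The name `max_pool2d_plane_padded` is hypothetical.

```python
import math


def max_pool2d_plane_padded(plane, kernel_size, stride,
                            padding=(0, 0), dilation=(1, 1)):
    """Max-pool one (H, W) plane with padding and dilation (illustrative only)."""
    kH, kW = kernel_size
    sH, sW = stride
    pH, pW = padding
    dH, dW = dilation
    H, W = len(plane), len(plane[0])

    # Assumption: padded cells hold -inf, so they never win the max.
    padded = [[-math.inf] * (W + 2 * pW) for _ in range(H + 2 * pH)]
    for i in range(H):
        for j in range(W):
            padded[i + pH][j + pW] = plane[i][j]

    # A dilated kernel covers dilation * (k - 1) + 1 points along each axis.
    span_h = dH * (kH - 1) + 1
    span_w = dW * (kW - 1) + 1
    H_out = (H + 2 * pH - span_h) // sH + 1
    W_out = (W + 2 * pW - span_w) // sW + 1

    return [
        [
            max(
                padded[sH * h + dH * m][sW * w + dW * n]
                for m in range(kH)
                for n in range(kW)
            )
            for w in range(W_out)
        ]
        for h in range(H_out)
    ]


# Padding: an all-negative input shows that padded cells are never selected.
negative = [[-1, -2], [-3, -4]]
with_padding = max_pool2d_plane_padded(negative, (2, 2), (1, 1), padding=(1, 1))
print(with_padding)  # [[-1, -1, -2], [-1, -1, -2], [-3, -3, -4]]

# Dilation: a 2x2 kernel with dilation 2 samples every other point of a 3x3 area.
plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
with_dilation = max_pool2d_plane_padded(plane, (2, 2), (1, 1), dilation=(2, 2))
print(with_dilation)  # [[11, 12], [15, 16]]
```

In the dilated case the top-left window reads positions (0, 0), (0, 2), (2, 0) and (2, 2), that is the values 1, 3, 9 and 11, so its maximum is 11.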
The parameters kernel_size, stride, padding, dilation can either be:
- a single int -- in which case the same value is used for the height and width dimension
- a tuple of two ints -- in which case, the first int is used for the height dimension,
  and the second int for the width dimension
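The int-or-tuple convention described above can be sketched as a small normalization step. The helper `to_pair` below is hypothetical and only shows the idea; it is not the layer's actual implementation.

```python
def to_pair(value):
    """Hypothetical helper: expand an int to (value, value), pass a 2-tuple through."""
    if isinstance(value, int):
        # Same value for the height and width dimension.
        return (value, value)
    pair = tuple(value)
    if len(pair) != 2:
        raise ValueError("expected an int or a tuple of two ints")
    # First entry is the height dimension, second is the width dimension.
    return pair


square_kernel = to_pair(3)
rect_kernel = to_pair((3, 2))
print(square_kernel)  # (3, 3)
print(rect_kernel)    # (3, 2)
```

Under this reading, a kernel_size of 3 means a 3x3 window, while (3, 2) means a window 3 points high and 2 points wide.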