Dot-product attention layer, a.k.a. Luong-style attention
layer_attention(
  inputs,
  use_scale = FALSE,
  score_mode = "dot",
  ...,
  dropout = NULL
)
inputs: A list of the following tensors:
  query: Query tensor of shape [batch_size, Tq, dim].
  value: Value tensor of shape [batch_size, Tv, dim].
  key: Optional key tensor of shape [batch_size, Tv, dim]. If not given, value will be used for both key and value, which is the most common case.

use_scale: If TRUE, will create a scalar variable to scale the attention scores.

score_mode: Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.

...: Standard layer arguments (e.g., batch_size, dtype, name, trainable, weights).

dropout: Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
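
For example, a minimal usage sketch (assuming the keras package is attached; the sequence lengths and feature dimension below are illustrative choices, not values from this page):

library(keras)

# Illustrative shapes (assumed): Tq = 8, Tv = 12, dim = 16
query_input <- layer_input(shape = c(8, 16))
value_input <- layer_input(shape = c(12, 16))

# No key tensor is supplied, so value is used for both key and value
attended <- layer_attention(inputs = list(query_input, value_input))

model <- keras_model(
  inputs = list(query_input, value_input),
  outputs = attended
)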
The inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps:

1. Calculate attention scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf$matmul(query, key, transpose_b = TRUE).
2. Use the scores to calculate a softmax distribution with shape [batch_size, Tq, Tv]: distribution = tf$nn$softmax(scores).
3. Use the distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf$matmul(distribution, value).
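
To make these steps concrete, here is a minimal sketch of the same calculation written directly against TensorFlow ops from R (assuming the tensorflow package is attached; the batch size and tensor shapes are illustrative assumptions):

library(tensorflow)

# Illustrative shapes (assumed): batch_size = 2, Tq = 3, Tv = 4, dim = 5
query <- tf$random$normal(shape(2, 3, 5))
value <- tf$random$normal(shape(2, 4, 5))
key   <- value  # the common case: no separate key tensor

# Step 1: query-key dot product, shape [batch_size, Tq, Tv]
scores <- tf$matmul(query, key, transpose_b = TRUE)

# Step 2: softmax over the scores, shape [batch_size, Tq, Tv]
distribution <- tf$nn$softmax(scores)

# Step 3: linear combination of value, shape [batch_size, Tq, dim]
output <- tf$matmul(distribution, value)

This matches what layer_attention() computes with the defaults shown above (use_scale = FALSE, score_mode = "dot", no dropout).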
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()