Dot-product attention layer, a.k.a. Luong-style attention.
layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)
inputs: A list of inputs. The first element should be the query tensor and the second the value tensor. (See the sketch following this argument list.)
use_scale: If TRUE, creates a scalar variable to scale the attention scores.
causal: Boolean. Set to TRUE for decoder self-attention; this adds a mask so that position i cannot attend to positions j > i, preventing the flow of information from the future to the past. (See the causal sketch at the end of this page.)
batch_size: Fixed batch size for the layer.
dtype: The data type expected by the input, as a string (float32, float64, int32...).
name: An optional name string for the layer. It should be unique within a model (do not reuse the same name twice) and will be autogenerated if not provided.
trainable: Whether the layer weights will be updated during training.
weights: Initial weights for the layer.
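
A minimal sketch of the basic call, assuming the keras R package is attached; the shapes and variable names here are illustrative, not part of the API:

library(keras)

# Toy shapes (illustrative): 10 query timesteps, 6 value timesteps,
# 8 features per timestep.
query <- layer_input(shape = c(10, 8))
value <- layer_input(shape = c(6, 8))

# The layer takes a list: first the query tensor, then the value tensor.
# With use_scale = TRUE, a learned scalar multiplies the dot-product scores.
attended <- layer_attention(list(query, value), use_scale = TRUE)

# Output shape: (batch_size, 10, 8) -- one attended vector per query step.
model <- keras_model(inputs = list(query, value), outputs = attended)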
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()
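
A second sketch, this time of decoder-style self-attention: the same tensor (hypothetical names and shapes again) is passed as both query and value, and causal = TRUE masks future positions:

library(keras)

# Self-attention: the same tensor serves as both query and value.
# causal = TRUE adds a lower-triangular mask so position i only
# attends to positions j <= i.
x <- layer_input(shape = c(10, 8))
self_attended <- layer_attention(list(x, x), causal = TRUE)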