Dot-product attention layer, a.k.a. Luong-style attention.
layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)
inputs: A list of inputs. The first element should be the query tensor and the second the value tensor. (See the sketch following this argument list.)
use_scale: If TRUE, creates a scalar variable to scale the attention scores.
causal: Boolean. Set to TRUE for decoder self-attention; this adds a mask so that position i cannot attend to positions j > i, preventing the flow of information from the future to the past. (See the causal sketch at the end of this page.)
batch_size: Fixed batch size for the layer.
dtype: The data type expected by the input, as a string (float32, float64, int32...).
name: An optional name string for the layer. It should be unique within a model (do not reuse the same name twice) and will be autogenerated if not provided.
trainable: Whether the layer weights will be updated during training.
weights: Initial weights for the layer.
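
A minimal sketch of the basic call, assuming the keras R package is attached; the shapes and variable names here are illustrative, not part of the API:

library(keras)

# Toy shapes (illustrative): 10 query timesteps, 6 value timesteps,
# 8 features per timestep.
query <- layer_input(shape = c(10, 8))
value <- layer_input(shape = c(6, 8))

# The layer takes a list: first the query tensor, then the value tensor.
# With use_scale = TRUE, a learned scalar multiplies the dot-product scores.
attended <- layer_attention(list(query, value), use_scale = TRUE)

# Output shape: (batch_size, 10, 8) -- one attended vector per query step.
model <- keras_model(inputs = list(query, value), outputs = attended)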
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()
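
A second sketch, this time of decoder-style self-attention: the same tensor (hypothetical names and shapes again) is passed as both query and value, and causal = TRUE masks future positions:

library(keras)

# Self-attention: the same tensor serves as both query and value.
# causal = TRUE adds a lower-triangular mask so position i only
# attends to positions j <= i.
x <- layer_input(shape = c(10, 8))
self_attended <- layer_attention(list(x, x), causal = TRUE)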