Output shape:

Attention outputs of shape [batch_size, Tq, dim].

The meaning of query, value and key depends on the application. In the
case of text similarity, for example, query is the sequence embeddings of
the first piece of text and value is the sequence embeddings of the second
piece of text. key is usually the same tensor as value.
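
To make the shapes concrete, here is a minimal NumPy sketch of plain dot-product attention. It is an illustration under stated assumptions, not the layer's actual implementation: the helper name dot_product_attention is hypothetical, and masking, scaling and dropout are left out. It shows that the output follows the query's sequence length Tq, and that key falls back to value when it is not given.

```python
import numpy as np


def dot_product_attention(query, value, key=None):
    """Hypothetical helper: unscaled dot-product attention.

    query: [batch_size, Tq, dim]
    value: [batch_size, Tv, dim]
    key:   [batch_size, Tv, dim], defaults to value when omitted.
    Returns an array of shape [batch_size, Tq, dim].
    """
    if key is None:
        key = value  # the common case described above: key is value
    # Similarity scores between each query step and each key step: [batch_size, Tq, Tv]
    scores = np.einsum("bqd,bvd->bqv", query, key)
    # Numerically stable softmax over the Tv axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of value vectors: [batch_size, Tq, dim]
    return np.einsum("bqv,bvd->bqd", weights, value)


rng = np.random.default_rng(0)
query = rng.normal(size=(2, 3, 4))  # batch_size=2, Tq=3, dim=4
value = rng.normal(size=(2, 5, 4))  # batch_size=2, Tv=5, dim=4
output = dot_product_attention(query, value)
print(output.shape)  # (2, 3, 4)
```

Note that Tv, the length of the value sequence, does not appear in the output shape: it is summed out when the attention weights are applied.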

Here is a code example for using Attention in a CNN+Attention network: