Question

The Transformer decoder takes two inputs: the encoder's output and the target sequence. How the target is fed into the decoder during training has been explained in this answer.

I am confused about what the target sequence should be when the trained model is evaluated.

Is it that we start with an <SOS> tag for the first timestep and loop through the Transformer decoder at each timestep, as in RNNs?

It would be helpful if someone could clarify this for me.


Solution

At training time, the input to the decoder consists of the target sentence tokens, which are indeed unknown at test time. What you call the second input are the desired outputs; they are not usually referred to as an input to the decoder, first for clarity, and second because they are technically an input to the loss function.
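To illustrate, here is a minimal sketch of how the decoder input and the desired outputs are typically derived from the same target sentence during training (teacher forcing). The token ids and the shift-by-one convention are assumptions for illustration, not any specific library's API:

```python
# Teacher-forcing sketch (hypothetical token ids, for illustration only).
SOS, EOS = 1, 2                      # assumed special-token ids
target = [SOS, 37, 84, 15, EOS]      # target sentence: <SOS> w1 w2 w3 <EOS>

decoder_input = target[:-1]          # [SOS, 37, 84, 15]  -> fed to the decoder
desired_output = target[1:]          # [37, 84, 15, EOS]  -> fed to the loss

# At position t the decoder sees tokens 0..t of decoder_input and is trained
# to predict desired_output[t]; a causal mask prevents it from peeking ahead.
```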

At test time, we do not need the loss function, but we still need to pass some input to the decoder. Decoding proceeds autoregressively: at each decoding step, we run the decoder layers and obtain a probability distribution over the target tokens. We select one token (typically the best-scoring one, though it gets trickier with beam search) and append it to the decoder's input. This means the input to the decoder is built up one token at a time, gradually, as the sentence is decoded.
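To make this concrete, below is a minimal greedy-decoding loop in Python. The `encode`/`decode` methods, the `sos_id`/`eos_id` tokens, and `max_len` are hypothetical placeholders standing in for whatever your model and vocabulary provide, and `decode` is assumed to return a tensor or array of per-position logits. Beam search would keep several candidate sequences instead of a single one:

```python
def greedy_decode(model, src_tokens, sos_id, eos_id, max_len=50):
    """Autoregressive greedy decoding sketch (hypothetical model interface)."""
    memory = model.encode(src_tokens)          # encoder output, computed once
    generated = [sos_id]                       # start with the <SOS> token
    for _ in range(max_len):
        # Run the decoder on everything generated so far; take the
        # distribution over the vocabulary at the last position.
        logits = model.decode(generated, memory)   # shape: (len(generated), vocab)
        next_token = int(logits[-1].argmax())      # pick the best-scoring token
        generated.append(next_token)
        if next_token == eos_id:                   # stop once <EOS> is produced
            break
    return generated[1:]                           # drop the initial <SOS>
```

Note that the encoder runs only once; it is the decoder that is re-executed at every step on the growing sequence, which is exactly the looping behavior the question asks about.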
