In the Hugging Face Whisper implementation, the `super().generate()` call receives the initial decoder token ids and uses them to predict the next tokens. In the short-form path, I am unclear how `generate()` produces the whole sequence at once (shouldn't it be sequential?).
Say `initial_decoder_tokens = [50258, 50259, 50359, 50363]`. When passed to `generate()`, it returns the full token sequence `[50258, 50259, 50359, 50363, 1012, 309, 307, 281, 312, 2279, 257, 1355, 295, 1319, 13, 50257]` at once. I want to understand how this happens.
Secondly, I want to pass the initial decoder hidden states (instead of token ids) and obtain the next token's hidden state (rather than the integer token id), i.e. the model would never emit any discrete token and would only propagate hidden states at each prediction step. Is there a way to do that in this implementation?