How can I extract sentence embeddings, similar to the [CLS] token in BERT, from a decoder-only transformer model? I want to use these embeddings as input to another network. I'm specifically interested in decoder models because some of them support much longer context windows than existing encoder models. Is there a way to achieve this, or is it ruled out by the design of the decoder architecture?
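
For context, here is roughly what I have been trying (a minimal sketch using the Hugging Face `transformers` API, with `gpt2` only as a stand-in for a longer-context decoder checkpoint):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal decoder checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# GPT-2 has no pad token by default; reuse EOS for batch padding.
tokenizer.pad_token = tokenizer.eos_token

sentences = ["A first example sentence.", "Another, somewhat longer, example."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)
hidden = outputs.last_hidden_state  # shape: (batch, seq_len, hidden_dim)

# Under causal attention, only the last token has attended to the whole
# sequence, so I pool from the last non-padding position as a rough
# analogue of BERT's [CLS].
last_idx = batch["attention_mask"].sum(dim=1) - 1
embeddings = hidden[torch.arange(hidden.size(0)), last_idx]  # (batch, hidden_dim)
```

My reasoning for pooling the last token is that, with a causal attention mask, it is the only position that sees the full input; mean pooling over all non-padding tokens would be an alternative. I'm unsure whether either approach gives embeddings of comparable quality to a trained [CLS] token, or whether something else is needed.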