Extracting word embeddings from log probabilities
I am working on an image-captioning problem, and at some point I need to extract the embeddings of the generated tokens. One option is to look up the embeddings from the token IDs, but I cannot do that: the whole pipeline needs to be differentiable, and picking discrete tokens from the log probabilities (e.g. with argmax or sampling) is not differentiable.
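One common workaround, offered here as a sketch and not as the only option, is a "soft" embedding. Instead of looking up the row for a single chosen token, take the expected embedding under the predicted distribution, `softmax(log_probs) @ E`. This is a smooth function of the log probabilities, so gradients flow through it. The sketch assumes the decoder's embedding table is available as a `(vocab_size, embed_dim)` matrix. The function name `soft_embeddings` and the `temperature` parameter are hypothetical. NumPy is used only to show the arithmetic; in an autodiff framework such as PyTorch or JAX, the same softmax-then-matmul expression is differentiated automatically.

```python
import numpy as np

def soft_embeddings(log_probs: np.ndarray,
                    embedding_matrix: np.ndarray,
                    temperature: float = 1.0) -> np.ndarray:
    """Expected token embedding under the predicted distribution (hypothetical helper).

    log_probs:        (seq_len, vocab_size) log probabilities or logits from the decoder
    embedding_matrix: (vocab_size, embed_dim), assumed to be the decoder's embedding table
    returns:          (seq_len, embed_dim)
    """
    # Lower temperature sharpens the distribution toward a one-hot (hard) choice.
    scaled = log_probs / temperature
    # Numerically stable softmax; for normalized log-probs at T=1 this equals exp(log_probs).
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Probability-weighted average of embedding rows: smooth in log_probs,
    # unlike argmax followed by a table lookup.
    return probs @ embedding_matrix

# Small demo with random data (illustrative sizes only).
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))                     # vocab_size=5, embed_dim=3
logits = rng.normal(size=(2, 5))                # seq_len=2
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
out = soft_embeddings(log_probs, E)

# A tiny change in one log probability changes the soft embedding,
# whereas argmax + lookup would stay exactly the same.
bumped = log_probs.copy()
bumped[0, 0] += 1e-4
print(np.abs(soft_embeddings(bumped, E) - out).max() > 0)
```

If the downstream model needs a hard token in the forward pass, a different technique that is often used is the straight-through Gumbel-softmax estimator (for example, `torch.nn.functional.gumbel_softmax(logits, hard=True)` in PyTorch). It returns a one-hot vector in the forward pass but uses the soft gradient in the backward pass. That one-hot vector can then be multiplied by the embedding matrix in the same way.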