How to add self-attention to the decoder in an RNN-based seq2seq [PyTorch]

I'm doing some research and implementation work on seq2seq architectures for my problem.
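
To make the question concrete, here is a minimal sketch of one way this could look: a GRU decoder whose current state attends over its own previously produced states with `nn.MultiheadAttention`, before projecting to the vocabulary. All names and hyperparameters (`SelfAttnDecoder`, `hidden_size`, `num_heads`, etc.) are illustrative assumptions, not part of my actual model.

```python
import torch
import torch.nn as nn

class SelfAttnDecoder(nn.Module):
    """Illustrative GRU decoder with self-attention over its own past states."""

    def __init__(self, vocab_size, embed_size=256, hidden_size=512, num_heads=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        # Self-attention: the current decoder state queries all past decoder states.
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.out = nn.Linear(hidden_size * 2, vocab_size)

    def forward(self, input_token, hidden, past_states):
        # input_token: (batch, 1) token ids for the current step
        # hidden:      (1, batch, hidden_size) previous GRU hidden state
        # past_states: (batch, t, hidden_size) decoder states from earlier steps, or None
        emb = self.embedding(input_token)        # (batch, 1, embed_size)
        rnn_out, hidden = self.gru(emb, hidden)  # (batch, 1, hidden_size)

        # Grow the cache of decoder states seen so far.
        past_states = rnn_out if past_states is None else torch.cat([past_states, rnn_out], dim=1)

        # Query = current state; keys/values = all decoder states so far.
        context, _ = self.self_attn(rnn_out, past_states, past_states)

        # Combine the recurrent state with the attended context and predict the next token.
        logits = self.out(torch.cat([rnn_out, context], dim=-1))  # (batch, 1, vocab_size)
        return logits, hidden, past_states
```

At decoding time one would call this step by step, feeding back the predicted (or teacher-forced) token and carrying `hidden` and `past_states` forward, e.g. `logits, hidden, past = decoder(tok, hidden, past)` inside the generation loop.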