I’m doing some research and implementation work on a seq2seq architecture for my problem.
I think adding self-attention to the decoder block might help reduce the repetition in the output. But when I searched on Google, I couldn’t find any articles about using self-attention in the decoder of a seq2seq model.
I did find one post, "How to add self-attention to a seq2seq model in Keras", so it seems like the same idea should be possible to implement in PyTorch.
Does anyone know how to do this, or have any references? Below is a rough sketch of what I have in mind.
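This is just a minimal, untested sketch based on my understanding: a GRU decoder where each step's states first pass through self-attention over the decoder states produced so far (with a causal mask), and then cross-attention over the encoder outputs. The class name, dimensions, and hyperparameters are placeholders, not from any reference implementation.

```python
import torch
import torch.nn as nn

class DecoderWithSelfAttention(nn.Module):
    def __init__(self, vocab_size, hidden_size, num_heads=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        # Self-attention over the decoder states generated so far
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Cross-attention over the encoder outputs (the usual seq2seq attention)
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt_tokens, encoder_outputs, hidden=None):
        # tgt_tokens: (batch, tgt_len) teacher-forced target tokens
        # encoder_outputs: (batch, src_len, hidden_size)
        emb = self.embedding(tgt_tokens)            # (batch, tgt_len, hidden)
        dec_states, hidden = self.gru(emb, hidden)  # (batch, tgt_len, hidden)

        # Causal mask so a position cannot attend to future decoder states
        tgt_len = dec_states.size(1)
        causal_mask = torch.triu(
            torch.ones(tgt_len, tgt_len, dtype=torch.bool, device=dec_states.device),
            diagonal=1,
        )
        self_ctx, _ = self.self_attn(dec_states, dec_states, dec_states,
                                     attn_mask=causal_mask)

        # Attend to the encoder outputs using the self-attended decoder states
        cross_ctx, _ = self.cross_attn(self_ctx, encoder_outputs, encoder_outputs)

        logits = self.out(cross_ctx)                # (batch, tgt_len, vocab)
        return logits, hidden
```

I'm not sure whether the self-attention belongs before or after the cross-attention, or whether it should attend over the embeddings instead of the GRU states, so any pointers on that would also help.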