I’ve been trying to inspect the attention scores of a pretrained transformer when I pass specific data through it. The model is a PyTorch Transformer. I’ve tried using forward hooks, but they only give me the final output of the attention modules, when what I want is the NxN matrices of attention scores, i.e. softmax(QK^T). I would also strongly prefer to do this in PyTorch code rather than with outside tools such as BertViz.
Does anyone know if there’s a way to do this?
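For concreteness, here is a minimal sketch of one possible approach. It assumes the model is built from `torch.nn.MultiheadAttention` modules (as `nn.TransformerEncoderLayer` is) on a reasonably recent PyTorch version. The encoder layer calls its attention module with `need_weights=False`, which would explain why a plain forward hook sees no weights. The sketch wraps each attention module's `forward` to request per-head weights and store them. The helper names `capture_attention` and `_patch` are made up for illustration, and the small randomly initialised encoder only stands in for the pretrained model.

```python
import torch
import torch.nn as nn


def _patch(attn, name, store):
    """Wrap attn.forward so it always returns per-head attention weights."""
    original_forward = attn.forward

    def forward(*args, **kwargs):
        # Override whatever the enclosing layer asked for.
        kwargs["need_weights"] = True
        kwargs["average_attn_weights"] = False  # keep heads separate
        output, weights = original_forward(*args, **kwargs)
        store[name] = weights.detach()  # (batch, heads, N, N)
        return output, weights

    attn.forward = forward


def capture_attention(model):
    """Patch every nn.MultiheadAttention in model; return the dict that fills up."""
    store = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.MultiheadAttention):
            _patch(module, name, store)
    return store


# Stand-in for the pretrained model (assumption: default batch_first=False).
torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dim_feedforward=32)
model = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=False)
model.eval()  # disables attention dropout, so each row sums to 1

attention = capture_attention(model)
x = torch.randn(5, 2, 16)  # (seq_len, batch, d_model)
with torch.no_grad():
    model(x)

for name, weights in attention.items():
    print(name, tuple(weights.shape))
```

After the forward pass, `attention` maps each attention module's name to its softmax weights for the last input. One caveat, stated as an assumption rather than a guarantee: some PyTorch versions use a fused inference path (for example with `batch_first=True` in eval mode) that may bypass the Python-level attention call, in which case nothing would be captured and the wrapper would need to go around the layer itself.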