At 10:33 in the video, it is explained that Qn and Kn represent the nth columns of the matrices Q and K, respectively.
At 10:43, the video says that the dot product of Ki and Qj should be the entry in the ith row and jth column of the Attention matrix. However, the formula given for Attention is QK', and QK' cannot produce that result: when the vectors are stored as columns, entry (i, j) of QK' is the dot product of the ith row of Q and the jth row of K. It is K'Q whose entry (i, j) is the dot product of Ki and Qj, which is the Attention matrix shown in the video. So, is the formula written incorrectly in the video? Or is the Attention matrix wrong?
The following MATLAB code illustrates the problem. For the best display, please run it in a MATLAB Live Script.
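clear
% Build 3x3 symbolic matrices; column n of Q is Qn and column n of K is Kn.
Q = sym('q', [3 3]);
K = sym('k', [3 3]);
Q
K
% Use .' (plain transpose); ' would also conjugate the symbolic entries.
Q*K.'   % entry (i,j) is the dot product of row i of Q and row j of K
K.'*Q   % entry (i,j) is the dot product of Ki and Qj (columns of K and Q)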
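The same comparison can also be checked numerically. Below is a minimal sketch under stated assumptions: the values in Q and K are arbitrary numbers I made up for illustration (they are not from the video), and the columns of Q and K are treated as the vectors Qn and Kn, as described at 10:33. It builds the matrix A entry by entry from the dot products of Ki and Qj, then compares A with both products.

```matlab
% Illustrative values only (assumed, not taken from the video).
Q = [1 2 3; 4 5 6; 7 8 10];   % columns stand in for Q1, Q2, Q3
K = [2 0 1; 1 3 0; 0 1 4];    % columns stand in for K1, K2, K3

% Build A entry by entry: A(i,j) is the dot product of Ki and Qj.
A = zeros(3);
for i = 1:3
    for j = 1:3
        A(i,j) = dot(K(:,i), Q(:,j));
    end
end

% Compare A with the two candidate matrix products.
matchesKtQ = isequal(A, K.'*Q)   % is A equal to K'Q?
matchesQKt = isequal(A, Q*K.')   % is A equal to QK'?
```

With the vectors stored as columns, only the comparison against K'Q should report a match. The entries are small integers, so the exact comparison with isequal is safe here; for general floating-point data a tolerance-based comparison would be more appropriate.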
I’m sorry, I tried to use LaTeX, but it couldn’t display properly. $K_{n}$