Can’t reproduce the same output with einsum
I was conducting a few experiments to see if changing the order of matrix multiplications in linear attention would yield the same results.
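For reference, here is a minimal sketch of the kind of comparison described, not the original experiment. It assumes plain non-causal linear attention with no feature map and no normalization, uses NumPy rather than whichever framework was actually used, and the sizes `n` and `d` are hypothetical. The two orderings are mathematically equal because matrix multiplication is associative, but in floating point they add the terms up in a different order, so the results typically agree only to within rounding error rather than bit for bit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16  # hypothetical sequence length and head dimension

# float32 is assumed here because it makes rounding differences easier to see
q = rng.standard_normal((n, d)).astype(np.float32)
k = rng.standard_normal((n, d)).astype(np.float32)
v = rng.standard_normal((n, d)).astype(np.float32)

# Order 1: (Q K^T) V, which builds an n x n intermediate
scores = np.einsum("id,jd->ij", q, k)
out_quadratic = np.einsum("ij,je->ie", scores, v)

# Order 2: Q (K^T V), which builds a d x d intermediate
kv = np.einsum("jd,je->de", k, v)
out_linear = np.einsum("id,de->ie", q, kv)

# Bitwise equality is not guaranteed; compare with a tolerance instead
exactly_equal = np.array_equal(out_quadratic, out_linear)
close = np.allclose(out_quadratic, out_linear, rtol=1e-4, atol=1e-3)
max_abs_diff = float(np.max(np.abs(out_quadratic - out_linear)))

print("bitwise equal:", exactly_equal)
print("close within tolerance:", close)
print("max abs difference:", max_abs_diff)
```

If the outputs differ by far more than rounding error, the cause is more likely a mismatch in the einsum index strings (for example, summing over the wrong axis) than the reordering itself.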