Fast look-up from a precomputed matrix
I am designing a new attention mechanism. Take the sentence "play _ing a basketball is ultra cool!", which contains the token pair play-basketball. Each of these two tokens has an embedding vector, for example 1,000 numbers long. The two vectors are in a fixed relationship, defined by a 1,000-element binary vector that I already know from a previous training session.
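One possible way to make the relationship look-up fast is to store every known binary relation vector in a precomputed table indexed by token-ID pairs, packed 8 bits per byte, and fetch many pairs at once with NumPy fancy indexing. The sketch below assumes integer token IDs, a dense V×V table, and the hypothetical names `build_relation_table`, `lookup_relations`, `PLAY` and `BASKETBALL`; none of these come from the question, and how the attention mechanism then uses the returned vector is left open.

```python
import numpy as np

EMB_DIM = 1000                    # length of each relation vector (from the question)
PACKED_DIM = (EMB_DIM + 7) // 8   # 125 bytes hold 1,000 bits


def build_relation_table(relations: np.ndarray) -> np.ndarray:
    """Pack a (V, V, EMB_DIM) boolean array into a (V, V, PACKED_DIM) uint8 table.

    Assumed (hypothetical) layout: relations[i, j] is the known binary
    relation vector between token i and token j.
    """
    return np.packbits(relations.astype(np.uint8), axis=-1)


def lookup_relations(table: np.ndarray, i_ids: np.ndarray, j_ids: np.ndarray) -> np.ndarray:
    """Return unpacked boolean relation vectors for a batch of token pairs."""
    packed = table[i_ids, j_ids]  # fancy indexing: shape (n_pairs, PACKED_DIM)
    return np.unpackbits(packed, axis=-1, count=EMB_DIM).astype(bool)


# Toy demo with random relations standing in for the previously trained ones.
rng = np.random.default_rng(0)
V = 50                            # tiny toy vocabulary
relations = rng.random((V, V, EMB_DIM)) < 0.5
table = build_relation_table(relations)

PLAY, BASKETBALL = 3, 17          # hypothetical token IDs
vec = lookup_relations(table, np.array([PLAY]), np.array([BASKETBALL]))[0]
```

Packing cuts memory by a factor of 8 compared with one byte per bit, but a dense table still grows quadratically with vocabulary size: for 50,000 tokens it would need 50,000² × 125 bytes ≈ 312.5 GB. If only some pairs have a known relationship, a sparse store (for example a dict keyed by `(i, j)`) is likely the more practical choice.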