I am designing a new attention mechanism. Say that in the sentence “play _ing a basketball is ultra cool!” I have the token pair play-basketball. Each of these two tokens has an embedding vector, for example 1000 numbers long. The two vectors stand in a fixed relationship, defined by a 1000-element binary vector that I already know from a previous training session.
My question is: how can I quickly access this binary vector whenever I spot the play-basketball pair in a sentence? Obviously, I need to access other binary vectors as well, such as play-ultra, play-cool, ultra-cool, etc. For this sentence, I need to access 6^2 binary vectors.
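To make the access pattern concrete, here is a minimal sketch of the naive baseline I have in mind: a plain hash table keyed by the ordered token pair. Everything in it is an assumption for illustration: the six-token list, the names `relation_table`, `lookup` and `unpack`, and the random bits that stand in for the vectors from my previous training session. Each 1000-bit vector is packed into a single Python integer.

```python
import random
from itertools import product

DIM = 1000  # length of each binary relation vector

# Assumed token list (six tokens, so 6^2 ordered pairs); purely illustrative.
tokens = ["play", "a", "basketball", "is", "ultra", "cool"]

rng = random.Random(0)

# Stand-in for the trained vectors: random bits packed into one int per pair.
# In the real setting these would be loaded from the previous training session.
relation_table = {
    pair: rng.getrandbits(DIM) for pair in product(tokens, repeat=2)
}


def lookup(table, left, right):
    """Return the packed relation vector for (left, right), or None if unseen."""
    return table.get((left, right))


def unpack(bits, dim=DIM):
    """Expand a packed int into a list of 0/1 values, lowest bit first."""
    return [(bits >> i) & 1 for i in range(dim)]


vec = unpack(lookup(relation_table, "play", "basketball"))
print(len(relation_table), len(vec))
```

Each lookup is one average-case constant-time dictionary access, but this runs on the CPU and outside the network, so it is not obvious to me how it would fit into a batched attention layer.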
I do not know of any fast look-up mechanisms used in machine learning, although I have considered using sparse network connections for this.