I am in a situation where I have a query space Q and a key space K, both filled with N d-dimensional real vectors (N ≈ 10^6, d ≈ 50). For each query q, I want to find the k≈10 keys k_i that have the highest inner products with q. One may assume that both keys and queries are normally distributed. Ideally, I would have a solution that solves this problem for N=10^6 in +- 1 second on a powerful GPU. An approximate solution is acceptable.
Any reference to an efficient algorithm or implementation would be appreciated.
Especially if it is O(N log N) algorithm (total for all queries). It seems to me that this should be possible by creating some kind of binary search tree on the keys, but I have found no algorithm or implementation yet with that complexity.
I know of the following implementations:
-
FAISS
-> solves it in around 20s on an RTX 2060
-> seems subquadratic, but not O(N log N), see diagram -
Annoy
-> didn’t try but it seems from the README that every query needs to be processed sequentially … -
TorchPQ
-> I tried it and it seems incredibly slow … But that is probably because you need to add the keys one by one, which is very inefficient. -
The algorithm proposed in the paper: Maximum Inner-Product Search using Tree Data-structures
-> uses a branch-and-bound algorithm on ball trees
-> I created a Pytorch C++ extension that does this, but it is orders of magnitudes slower because it runs on a CPU. -
Asymmetric Locality Sensitive Hashing (ALSH), like in this paper. The only implementation that I found is this Github repo. However, it does not have GPU support, so I doubt that it will be faster than FAISS.