I am currently looking for a datastructure to store datapoints that allows for range/KNN queries based on a similarity measure definte by a kernal function.
Essentially I want to solve the following problem:
Given point x* return all datapoints for that some kernel function k(x,x*) is above a certain threshold.
Ideally, the datstructures should also allow for updates, adding and removing points, as I need to add and remove a couple of points every once in a while.
Naively I could compute the kernel values for all datapoints but that seems very inefficient. I thought about using ball-trees (f.ex implemented by scipy) using a cooresponding metric. This would for example be straight forward for the RBF-Kernel as I could then use the euclidean norm as a metric in the ball-tree. is there something that works for more general kernel functions (and maybe even for “non-metric” ones like periodic kernels).
Also, I would creatly appreciate any performant python library that could be used for implementation.
Many thanks in advance for any help.
Lukas Schroth is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.