I have a list in Python 2.7 of about 10000 point co-ordinates, like
[(168, 245), (59, 52), (61, 250), ... (205, 69), (185, 75)]
Is there a faster way to search for all the points inside a bounding box, instead of iterating over the entire list each time and checking whether each coordinate is inside the four sides?
I'm using this "algorithm" to see if any point, which is randomly moving, has come inside the stationary bounding box (if it helps).
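For reference, the straightforward O(n) scan described above looks something like this (the sample coordinates and box values are just placeholders):

    # Plain O(n) scan: check every point against the box on each pass.
    points = [(168, 245), (59, 52), (61, 250), (205, 69), (185, 75)]

    def points_in_box(points, xmin, ymin, xmax, ymax):
        """Return every point whose x and y fall inside the box (inclusive)."""
        return [(x, y) for (x, y) in points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    print(points_in_box(points, 50, 50, 70, 260))   # [(59, 52), (61, 250)]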
If the points are constantly moving, then it's impossible to do better than O(n), because you have to check each point every time it moves. You have some code somewhere that moves the points; just add the bounds check into that code. Keeping the list sorted, or maintaining a quadtree or similar structure, would only be more efficient if the point positions were not changing frequently and you were instead checking several different bounding boxes, such as when searching for points on a map.
If the points are moving according to a preset pattern, such as in a straight line at a constant velocity, then you can change your algorithm to only do the bounds check when the direction or speed changes. For example, if a point is moving one pixel per second and the bounding box is 10 pixels away, you know you don't have to check that point for another 10 seconds. 10,000 points is still a fairly small number, though, and at that scale the overhead from the extra complexity may not outweigh the extra iterations of the simpler algorithm.
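A rough sketch of that deferral idea, assuming you know each point's maximum speed per frame (the function name and numbers are made up for illustration):

    import math

    def frames_until_possible_hit(x, y, speed, xmin, ymin, xmax, ymax):
        """Lower bound on the number of frames before a point moving at most
        `speed` units per frame could enter the box; 0 if it is already
        inside or touching."""
        # Distance from the point to the box (0 if inside).
        dx = max(xmin - x, 0, x - xmax)
        dy = max(ymin - y, 0, y - ymax)
        dist = math.hypot(dx, dy)
        return int(dist // speed)

    # A point 10 units away moving 1 unit per frame cannot enter the box
    # for at least 10 frames, so the bounds check can be skipped until then.
    print(frames_until_possible_hit(0, 0, 1.0, 10, -5, 20, 5))  # 10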
Also, that small of a scale makes me wonder if you're sure the bounds checking is your bottleneck. A modern computer should be able to bounds check 10,000 points in a few milliseconds, even in pure Python. Have you actually measured it?
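If you haven't measured it, something like this timeit snippet gives a ballpark figure (random points stand in for your real data):

    import random
    import timeit

    points = [(random.randint(0, 300), random.randint(0, 300)) for _ in range(10000)]

    def scan():
        return [(x, y) for (x, y) in points if 50 <= x <= 150 and 50 <= y <= 150]

    # Time 100 full scans of all 10,000 points and report the per-scan cost.
    total = timeit.timeit(scan, number=100)
    print("%.3f ms per scan" % (total / 100 * 1000))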
If possible I'd use a different data structure, one that is designed for spatial indexing, e.g. a quadtree (there are many others though).
This should allow you (assuming non-degenerate data) to find the closest point in typically O(log n) comparisons. You could also use a spatial database to do this for you.
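For illustration, here is a minimal quadtree sketch supporting insertion and axis-aligned box queries; the class and method names are made up rather than taken from any particular library:

    class Quadtree(object):
        def __init__(self, xmin, ymin, xmax, ymax, capacity=8):
            self.bounds = (xmin, ymin, xmax, ymax)
            self.capacity = capacity
            self.points = []
            self.children = None  # four sub-quadrants once this node splits

        def insert(self, p):
            x, y = p
            xmin, ymin, xmax, ymax = self.bounds
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                return False  # outside this node's region
            if self.children is None:
                if len(self.points) < self.capacity:
                    self.points.append(p)
                    return True
                self._split()
            return any(child.insert(p) for child in self.children)

        def _split(self):
            xmin, ymin, xmax, ymax = self.bounds
            xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
            self.children = [
                Quadtree(xmin, ymin, xmid, ymid, self.capacity),
                Quadtree(xmid, ymin, xmax, ymid, self.capacity),
                Quadtree(xmin, ymid, xmid, ymax, self.capacity),
                Quadtree(xmid, ymid, xmax, ymax, self.capacity),
            ]
            for p in self.points:
                any(child.insert(p) for child in self.children)
            self.points = []

        def query_box(self, xmin, ymin, xmax, ymax, found=None):
            """Collect every stored point that lies inside the query box."""
            if found is None:
                found = []
            bxmin, bymin, bxmax, bymax = self.bounds
            # Skip this node entirely if its region misses the query box.
            if bxmax < xmin or bxmin > xmax or bymax < ymin or bymin > ymax:
                return found
            for (x, y) in self.points:
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    found.append((x, y))
            if self.children is not None:
                for child in self.children:
                    child.query_box(xmin, ymin, xmax, ymax, found)
            return found

    tree = Quadtree(0, 0, 300, 300)
    for p in [(168, 245), (59, 52), (61, 250), (205, 69), (185, 75)]:
        tree.insert(p)
    print(tree.query_box(50, 50, 70, 260))   # [(59, 52), (61, 250)]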
edit: quadtrees really help when your points are (relatively) static with respect to your search box; your edit suggests you have the opposite problem, in which case you may not be able to do better than O(n), i.e. you probably don't want to construct a quadtree every time the points move unless you are going to be checking many bounding boxes for each move.
As you stated, there is no way to be faster than O(n) in the worst case: every point in the list might be a hit, so you have to scan the entire list.
It would be possible to arrange the list such that you can terminate the search early if you can prove that none of the remaining points will be in the desired bounding box. For instance, by sorting the points by their X coordinates, you can stop searching as soon as the X value becomes higher than your upper boundary. Depending on what kind of point lists you get, other schemes may be more efficient.
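A sketch of that early-termination idea using the standard bisect module, assuming the list is kept sorted by X:

    import bisect

    # Points sorted by x coordinate; keep the list sorted as points are added.
    points = sorted([(168, 245), (59, 52), (61, 250), (205, 69), (185, 75)])

    def points_in_box_sorted(points, xmin, ymin, xmax, ymax):
        """Binary-search the X range, then filter by Y; only points whose
        x lies in [xmin, xmax] are ever visited."""
        lo = bisect.bisect_left(points, (xmin, float("-inf")))
        hi = bisect.bisect_right(points, (xmax, float("inf")))
        return [(x, y) for (x, y) in points[lo:hi] if ymin <= y <= ymax]

    print(points_in_box_sorted(points, 50, 50, 70, 260))  # [(59, 52), (61, 250)]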
What does “constantly moving” mean?
For example, if these points represent particles moving with constant velocity, affected only by hard-sphere collisions with other particles and with the walls, then it's possible to compute the time to intersection of each particle with the bounding box, and the time until its next collision. These times can be kept in a priority queue, and when an intersection or collision occurs, you only need to recompute, in O(n) time, the next events for the particle or particles involved.
The mechanics of this are outlined in http://introcs.cs.princeton.edu/java/assignments/collisions.html, and summarized in Karl Bielefeldt's earlier answer.
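As a rough illustration of the event-driven approach (the particle data and helper name are invented for the example), you can compute each particle's earliest entry time into the stationary box and keep those times in a heapq priority queue:

    import heapq

    def time_to_enter_box(x, y, vx, vy, xmin, ymin, xmax, ymax):
        """Earliest time t >= 0 at which a point moving in a straight line at
        constant velocity is inside the box, or None if it never enters."""
        t_enter, t_exit = 0.0, float("inf")
        for pos, vel, lo, hi in ((x, vx, xmin, xmax), (y, vy, ymin, ymax)):
            if vel == 0:
                if not (lo <= pos <= hi):
                    return None  # never crosses into this axis's slab
            else:
                t0, t1 = (lo - pos) / vel, (hi - pos) / vel
                if t0 > t1:
                    t0, t1 = t1, t0
                t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        return t_enter if t_enter <= t_exit else None

    # Build a priority queue of (entry_time, particle_index) events.
    particles = [(0.0, 0.0, 1.0, 1.0), (5.0, 50.0, 0.0, -1.0)]  # (x, y, vx, vy)
    box = (10, 10, 20, 20)
    events = []
    for i, (x, y, vx, vy) in enumerate(particles):
        t = time_to_enter_box(x, y, vx, vy, *box)
        if t is not None:
            heapq.heappush(events, (t, i))
    print(heapq.heappop(events))  # earliest particle to reach the box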
If the movement is random, but the changes are small and bounded, then there's an upper limit on how far things can move per step. Use that as a bound. For example, if the maximum speed is 0.1 units per evaluation and your bounding box is the cell (10,10)-(11,11), then you can easily determine the particles within, say, 10 units of that cell and know that no other particle can possibly reach that cell for another 100 evaluations.
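A sketch of that bound-based pruning, with made-up constants for the maximum speed and margin:

    MAX_SPEED = 0.1            # maximum movement per evaluation (assumed)
    BOX = (10.0, 10.0, 11.0, 11.0)
    MARGIN = 10.0              # how far beyond the box to look when rebuilding

    def rebuild_candidates(points):
        """Indices of points close enough to the box to possibly reach it
        within MARGIN / MAX_SPEED evaluations; only these need per-step checks."""
        xmin, ymin, xmax, ymax = BOX
        return [i for i, (x, y) in enumerate(points)
                if xmin - MARGIN <= x <= xmax + MARGIN
                and ymin - MARGIN <= y <= ymax + MARGIN]

    # Rebuild the candidate list only every MARGIN / MAX_SPEED steps (100 here);
    # in between, bounds-check just the candidate indices each evaluation.
    steps_between_rebuilds = int(round(MARGIN / MAX_SPEED))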
If the coordinates change drastically every time, then the O(n) linear search is the best solution, because any data structure (quadtree, hash grid, etc.) takes at least O(n) time to set up, if only because it needs to visit each point.