I have rectangular regions in a plane. I want to consistently order them in a human-friendly way that a typical y-x sort won’t work for. Basically I want (0,0), (1,0), and (0,1) to sort the same as (0,0), (1, -0.1), and (-0.1, 1).
Ideas?
From comments:
- The obvious answers, y-x and x-y sort, result in different orders for the short example I posted. The things I’m coming up with, now, are clustering approaches where I cluster the y values, sort by cluster y means, then by x.
Question: What are you sorting your rectangles for? Searching? Displaying?
- Numbering the regions, and I want two region sets a human would say are nearly identical to get numbered identically.
Question: Is the orientation of the rectangles really important (what is the difference between (0,1) and (-1,0) in the problem domain)? Would primarily sorting by area or diagonal be ok?
- I can’t tell the orientation of them beyond portrait or landscape, and size doesn’t work because a lot might be practically the same.
7
In principle, a Hilbert curve can do that. It’s a fractal that fills a two-dimensional space.
This paper defines a generalization of the Hilbert curve and uses a relationship with reflected binary Gray code to define efficient conversions, with the main application being multi-dimensional database indexes.
The key point is that a fractal can be evaluated to any level of detail. So in principle, if two different points hit the same point on a Hilbert curve at one resolution, you can just evaluate the Hilbert curve at higher and higher resolutions until you find one where the points on the Hilbert curve are different, ordering the two points based on that. Because you’re just evaluating the same curve to a higher level of detail, ordering relative to other points (assessed at lower resolution) is unchanged.
That said, this is just a the basic idea – I don’t know if it has been done (probably it has, but I don’t have a reference) or the detail of how to implement it well.
2
I realized there’s a better way to approach the problem. What I was really trying to do was match (0,0), (1,0), and (0,1) to (0,0), (1, -0.1), and (-0.1, 1), respectively. Ordering them, then matching up sequences like Python’s itertools.zip()
seemed like an elegant solution.
But what’s easier is to treat this as a clustering problem. Running K-means on the concatenated list of points where K = n/2 (3 in the case of my example above) would to the matching I needed.
1