I am looking for a nice way of flattening (splitting) a list of potentially-overlapping numeric ranges. The problem is very similar to that of this question: Fastest way to split overlapping date ranges, and many others.
However, the ranges are not only integers, and I am looking for a decent algorithm that can be easily implemented in Javascript or Python, etc.
Example Data:
Example Solution:
Apologies if this is a duplicate, but I am yet to find a solution.
4
Walk from left to right, using a stack to keep track of what color you’re on. Instead of a discrete map, use the 10 numbers in your dataset as the break-points.
Starting with an empty stack, and setting start
to 0, loop until we reach the end:
- If stack is empty:
- Look for the first color starting at or after
start
, and push it and all lower-ranked colors onto the stack. In your flattened list, mark the beginning of that color.
- Look for the first color starting at or after
- else (If not empty):
- Find next beginning point for any higher-ranked color at or after
start
, and find the end of the current color- If the next color starts first, push it and anything else on the way to it onto the stack. Update the end of the current color as the beginning of this one, and add the start of this color to the flattened list.
- If there are none and the current color ends first, set
start
to the end of this color, pop it off of the stack, and check the next-highest ranked color- If
start
is within the next color’s range, add this color to the flattened list, starting atstart
. - If the stack empties, just continue the loop (go back to the first bullet point).
- If
- Find next beginning point for any higher-ranked color at or after
This is a mental run-through given your example data:
# Initial data.
flattened = []
stack = []
start = 0
# Stack is empty. Look for the next starting point at 0 or later: "b", 0 - Push it and all lower levels onto stack
flattened = [ (b, 0, ?) ]
stack = [ r, b ]
start = 0
# End of "b" is 5.4, next higher-colored start is "g" at 2 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, ?) ]
stack = [ r, b, g ]
start = 2
# End of "g" is 12, next higher-colored start is "y" at 3.5 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, ?) ]
stack = [ r, b, g, y ]
start = 3.5
# End of "y" is 6.7, next higher-colored start is "o" at 6.7 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, ?) ]
stack = [ r, b, g, y, o ]
start = 6.7
# End of "o" is 10, and there is nothing starting at 12 or later in a higher color. Next off stack, "y", has already ended. Next off stack, "g", has not ended. Delimit and continue.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, ?) ]
stack = [ r, b, g ]
start = 10
# End of "g" is 12, there is nothing starting at 12 or later in a higher color. Next off stack, "b", is out of range (already ended). Next off stack, "r", is out of range (not started). Mark end of current color:
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12) ]
stack = []
start = 12
# Stack is empty. Look for the next starting point at 12 or later: "r", 12.5 - Push onto stack
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, ?) ]
stack = [ r ]
start = 12
# End of "r" is 13.8, and there is nothing starting at 12 or higher in a higher color. Mark end and pop off stack.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, 13.8) ]
stack = []
start = 13.8
# Stack is empty and nothing is past 13.8 - We're done.
4
This solution seems the simplest. (Or at least, the easiest to grasp)
All that is needed is a function to subtract two ranges. In other words, something that will give this:
A ------ A ------ A ----
B ------- and B ------ and B ---------
= ---- = ---- = --- --
Which is simple enough. Then you can simply iterate through each of the ranges, starting from the lowest, and for each one, subtract from it all the ranges above it, in turn. And there you have it.
Here is an implementation of the range subtractor in Python:
def subtractRanges((As, Ae), (Bs, Be)):
'''SUBTRACTS A FROM B'''
# e.g, A = ------
# B = -----------
# result = -- ---
# Returns list of new range(s)
if As > Be or Bs > Ae: # All of B visible
return [[Bs, Be]]
result = []
if As > Bs: # Beginning of B visible
result.append([Bs, As])
if Ae < Be: # End of B visible
result.append([Ae, Be])
return result
Using this function, the rest can be done like this:
(A ‘span’ means a range, as ‘range’ is a Python keyword)
spans = [["red", [12.5, 13.8]],
["blue", [0.0, 5.4]],
["green", [2.0, 12.0]],
["yellow", [3.5, 6.7]],
["orange", [6.7, 10.0]]]
i = 0 # Start at lowest span
while i < len(spans):
for superior in spans[i+1:]: # Iterate through all spans above
result = subtractRanges(superior[1], spans[i][1])
if not result: # If span is completely covered
del spans[i] # Remove it from list
i -= 1 # Compensate for list shifting
break # Skip to next span
else: # If there is at least one resulting span
spans[i][1] = result[0]
if len(result) > 1: # If there are two resulting spans
# Insert another span with the same name
spans.insert(i+1, [spans[i][0], result[1]])
i += 1
print spans
This gives [['red', [12.5, 13.8]], ['blue', [0.0, 2.0]], ['green', [2.0, 3.5]], ['green', [10.0, 12.0]], ['yellow', [3.5, 6.7]], ['orange', [6.7, 10.0]]]
, which is correct.
2
If the data really is similar in scope to your sample data, you could create a map like this:
map = [0 .. 150]
for each color:
for loc range start * 10 to range finish * 10:
map[loc] = color
Then just walk through this map to generate the ranges
curcolor = none
for loc in map:
if map[loc] != curcolor:
if curcolor:
rangeend = loc / 10
make new range
rangecolor = map[loc]
rangestart = loc / 10
To work, the values have to be in a relatively small range as in your sample data.
Edit: to work with true floats, use the map to generate a high level mapping and then refer to the original data to create the boundaries.
map = [0 .. 15]
for each color:
for loc round(range start) to round(range finish):
map[loc] = color
curcolor = none
for loc in map
if map[loc] != curcolor:
make new range
if loc = round(range[map[loc]].start)
rangestart = range[map[loc]].start
else
rangestart = previous rangeend
rangecolor = map[loc]
if curcolor:
if map[loc] == none:
last rangeend = range[map[loc]].end
else
last rangeend = rangestart
curcolor = rangecolor
2
Here’s a relatively straightforward solution in Scala. It shouldn’t be too difficult to port to another language.
case class Range(name: String, left: Double, right: Double) {
def overlapsLeft(other: Range) =
other.left < left && left < other.right
def overlapsRight(other: Range) =
other.left < right && right < other.right
def overlapsCompletely(other: Range) =
left <= other.left && right >= other.right
def splitLeft(other: Range) =
Range(other.name, other.left, left)
def splitRight(other: Range) =
Range(other.name, right, other.right)
}
def apply(ranges: Set[Range], newRange: Range) = {
val left = ranges.filter(newRange.overlapsLeft)
val right = ranges.filter(newRange.overlapsRight)
val overlaps = ranges.filter(newRange.overlapsCompletely)
val leftSplit = left.map(newRange.splitLeft)
val rightSplit = right.map(newRange.splitRight)
ranges -- left -- right -- overlaps ++ leftSplit ++ rightSplit + newRange
}
val ranges = Vector(
Range("red", 12.5, 13.8),
Range("blue", 0.0, 5.4),
Range("green", 2.0, 12.0),
Range("yellow", 3.5, 6.7),
Range("orange", 6.7, 10.0))
val flattened = ranges.foldLeft(Set.empty[Range])(apply)
val sorted = flattened.toSeq.sortBy(_.left)
sorted foreach println
apply
takes in a Set
of all the ranges that have already been applied, finds the overlaps, then returns a new set minus the overlaps and plus the new range and the newly split ranges. foldLeft
repeatedly calls apply
with each input range.
Just keep a set of ranges sorted by start. Add range that covers everything (-oo..+oo). To add a range r:
let pre = last range that starts before r starts
let post = earliest range that starts before r ends
now iterate from pre to post: split ranges that overlap, remove ranges that are covered, then add r