I have been struggling for the past month with a problem from our course that I am now upsolving. The task is to find the window of size K with the smallest median in an array of integers. K is always odd, so we need not worry about even-length windows.
An example would be:
[1,3,3,2,1], K = 3
The windows are [1,3,3], [3,3,2] and [3,2,1]. Sorting each gives [1,3,3], [2,3,3] and [1,2,3], with medians 3, 3 and 2, so the answer is 2.
My first solution slid a window over the array and sorted each window. It is correct but was the slowest, and it only received partial points. The second solution, attached below, keeps the window sorted with the bisect module, but I think inserting into and deleting from a Python list is O(k) per step, which increases the time complexity of my program. Is there a solution to this that runs in O(n log k) time?
    from bisect import insort, bisect_left
    from typing import Sequence

    def min_median(s: Sequence[int], m: int) -> int:
        n = len(s)
        window = sorted(s[:m])   # sorted copy of the first window
        mid = m // 2             # index of the median (m is odd)
        result = window[mid]
        for i in range(m, n):
            # add the incoming element, keeping the list sorted
            insort(window, s[i])
            # drop one copy of the element that left the window
            del window[bisect_left(window, s[i - m])]
            result = min(result, window[mid])
        return result
To solve this in O(n log k), you need to obtain each window's median in O(log k) time, since there are n – k + 1 windows to consider. This requires a data structure that supports insert, delete and find-median in O(log k) each as you slide the window through the array.
One way to do this is an AVL tree modified so that each node also stores the size of its subtree rather than just its height. The subtree sizes let you find the element of a given rank, here the median, by walking down from the root in O(log k).
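Python's standard library has no balanced tree, so as a rough stand-in for the size-augmented tree, here is a sketch that swaps in a Fenwick tree (binary indexed tree) over the coordinate-compressed values. It answers the same "element of rank r" query, but each operation costs O(log d) for d distinct values, so the total is O(n log n) rather than O(n log k). The name min_median_fenwick and the precondition 1 <= k <= len(s) are my assumptions, not something from the question.

```python
from typing import List, Sequence


def min_median_fenwick(s: Sequence[int], k: int) -> int:
    """Smallest median over all windows of odd size k.

    Assumes 1 <= k <= len(s); this is a sketch, not a tuned implementation.
    """
    values: List[int] = sorted(set(s))                # coordinate compression
    rank = {v: i + 1 for i, v in enumerate(values)}   # 1-based rank per value
    size = len(values)
    tree = [0] * (size + 1)                           # Fenwick tree of counts

    def update(pos: int, delta: int) -> None:
        # Add delta to the count stored for rank pos.
        while pos <= size:
            tree[pos] += delta
            pos += pos & -pos

    def kth(target: int) -> int:
        # Rank of the target-th smallest element (1-based) in the window,
        # found by binary descent over the implicit tree.
        pos = 0
        step = 1 << size.bit_length()
        while step:
            nxt = pos + step
            if nxt <= size and tree[nxt] < target:
                pos = nxt
                target -= tree[nxt]
            step >>= 1
        return pos + 1

    best = max(s)                                     # upper bound on any median
    for i, x in enumerate(s):
        update(rank[x], 1)                            # element entering the window
        if i >= k:
            update(rank[s[i - k]], -1)                # element leaving the window
        if i >= k - 1:
            median = values[kth(k // 2 + 1) - 1]
            best = min(best, median)
    return best
```

On the example from the question, min_median_fenwick([1, 3, 3, 2, 1], 3) gives 2.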
Alternatively, you could use two AVL trees: one for the lower half of the elements and one for the upper half. The idea is similar to the two-heaps approach (see "How to implement a Median-heap"), but it uses AVL trees so that both insertion and deletion stay balanced and ordered. The benefit of AVL trees over heaps is that deleting an arbitrary element takes only O(log k) in an AVL tree, but O(k) in a heap because the element has to be found first (unless you use an array to keep track of the pointers). That is what gives you the better overall time complexity.
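For comparison, here is a sketch of the two-heaps variant rather than the two-AVL-tree one, since heapq is what Python ships with. It sidesteps the O(k) heap deletion by deleting lazily: every entry carries its index, and an entry that has left the window is only popped once it reaches the top of its heap. The heaps can therefore hold stale entries, so the bound is O(n log n) in the worst case, not a strict O(n log k). The function name and the precondition 1 <= k <= len(s) are my assumptions.

```python
import heapq
from typing import List, Sequence, Tuple


def min_median_two_heaps(s: Sequence[int], k: int) -> int:
    """Smallest median over all windows of odd size k (assumes 1 <= k <= len(s))."""
    low: List[Tuple[int, int]] = []    # max-heap via negation: (-value, -index)
    high: List[Tuple[int, int]] = []   # min-heap: (value, index)
    low_size = high_size = 0           # live (non-stale) entries in each heap
    best = max(s)                      # upper bound on any median

    for i, x in enumerate(s):
        oldest = i - k + 1             # smallest index still inside the window

        # 1. Account for the element that just left the window.
        if i >= k:
            out = (s[i - k], i - k)
            if out <= (-low[0][0], -low[0][1]):
                low_size -= 1          # it lives in the lower half
            else:
                high_size -= 1         # it lives in the upper half
            # Physically drop stale entries that have reached a heap top.
            while low and -low[0][1] < oldest:
                heapq.heappop(low)
            while high and high[0][1] < oldest:
                heapq.heappop(high)

        # 2. Insert the new element on the correct side.
        if not low or (x, i) <= (-low[0][0], -low[0][1]):
            heapq.heappush(low, (-x, -i))
            low_size += 1
        else:
            heapq.heappush(high, (x, i))
            high_size += 1

        # 3. Rebalance so low has as many live entries as high, or one more.
        if low_size > high_size + 1:
            v, j = heapq.heappop(low)
            heapq.heappush(high, (-v, -j))
            low_size -= 1
            high_size += 1
            while low and -low[0][1] < oldest:
                heapq.heappop(low)
        elif low_size < high_size:
            v, j = heapq.heappop(high)
            heapq.heappush(low, (-v, -j))
            high_size -= 1
            low_size += 1
            while high and high[0][1] < oldest:
                heapq.heappop(high)

        # 4. With a full window, the median sits on top of the lower half.
        if i >= k - 1:
            best = min(best, -low[0][0])
    return best
```

Storing (value, index) pairs makes every entry unique, which keeps the "which heap holds the outgoing element" test unambiguous even when the window contains duplicates.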