I have an unsorted array. I have queries in which I give a range and then the maximum value from that range has to returned.
For example:
array[]={23,17,9,45,78,2,4,6,90,1};
query(both inclusive): 2 6
answer: 78
Which algorithm or data structure do I construct to quickly retrieve maximum value from any range. (There are a lot of queries)
EDIT:
This is indeed a simple version of the actual problem. I can have the array size as large as 100000 and the number of queries upto 100000. So I definitely require some preprocessing which’ll facilitate a fast query response.
9
I think you could construct some kind of binary tree where each node represents the maximum value its children:
78
45 78
23 45 78 6
23 17 9 45 78 2 4 6
Then you only need to find a way to determine which nodes you minimally need to check to find the maximum value in the range queried. In this example, to get the maximum value in the index range [2, 6]
(inclusive) you would have max(45, 78, 4)
instead of max(9, 45, 78, 2, 4)
. As the tree grows, the gain will be larger.
10
To complement ngoaho91’s answer.
The best way to solve this problem is using the Segment Tree data structure. This allows you to answer such queries in O(log(n)), that would mean the total complexity of your algorithm would be O(Qlogn) where Q is the number of queries. If you used the naive algorithm, the total complexity would be O(Qn) which is obviouslly slower.
There is, however, a drawback of the usage of Segment Trees. It takes up a lot of memory, but a lot of times you care less about memory than about speed.
I will briefly describe the algorithms used by this DS:
The segment tree is just an special case of a Binary Search Tree, where every node holds the value of the range it is assigned to. The root node, is assigned the range [0, n]. The left child is assigned the range [0, (0+n)/2] and the right child [(0+n)/2+1, n]. This way the tree will be built.
Create Tree:
/*
A[] -> array of original values
tree[] -> Segment Tree Data Structure.
node -> the node we are actually in: remember left child is 2*node, right child is 2*node+1
a, b -> The limits of the actual array. This is used because we are dealing
with a recursive function.
*/
int tree[SIZE];
void build_tree(vector<int> A, int node, int a, int b) {
if (a == b) { // We get to a simple element
tree[node] = A[a]; // This node stores the only value
}
else {
int leftChild, rightChild, middle;
leftChild = 2*node;
rightChild = 2*node+1; // Or leftChild+1
middle = (a+b) / 2;
build_tree(A, leftChild, a, middle); // Recursively build the tree in the left child
build_tree(A, rightChild, middle+1, b); // Recursively build the tree in the right child
tree[node] = max(tree[leftChild], tree[rightChild]); // The Value of the actual node,
//is the max of both of the children.
}
}
Query Tree
int query(int node, int a, int b, int p, int q) {
if (b < p || a > q) // The actual range is outside this range
return -INF; // Return a negative big number. Can you figure out why?
else if (p >= a && b >= q) // Query inside the range
return tree[node];
int l, r, m;
l = 2*node;
r = l+1;
m = (a+b) / 2;
return max(query(l, a, m, p, q), query(r, m+1, b, p, q)); // Return the max of querying both children.
}
If you need further explanation, just let me know.
BTW, Segment Tree also supports update of a single element or a range of elements in O(log n)
2
The best algorithm would be in O(n) time as below
let start, end be the index of the bounds of range
int findMax(int[] a, start, end) {
max = Integer.MIN; // initialize to minimum Integer
for(int i=start; i <= end; i++)
if ( a[i] > max )
max = a[i];
return max;
}
4
The binary tree/segment tree-based solutions are indeed pointing in the right direction. One might object that they require a lot of extra memory, however. There are two solutions to these problems:
- Use an implicit data structure instead of a binary tree
- Use an M-ary tree instead of a binary tree
The first point is that because the tree is highly structured, you can use a heap-like structure to implicitly define the tree rather than representing the tree with nodes, left and right pointers, intervals etc.. That saves a lot of memory with essentially no performance hit – you do need to perform a little more pointer arithmetic.
The second point is that, at the cost of a little more work during evaluation, you can use an M-ary tree rather than a binary tree. For instance if you use a 3-ary tree you will compute the max of 3 elements at a time, then 9 elements at a time, then 27, etc. The extra storage required is then N/(M-1) – you can prove using the geometric series formula. If you choose M = 11, for example, you will require 1/10th the storage of the binary tree method.
You can verify that these naive and optimized implementations in Python give the same results:
class RangeQuerier(object):
#The naive way
def __init__(self):
pass
def set_array(self,arr):
#Set, and preprocess
self.arr = arr
def query(self,l,r):
try:
return max(self.arr[l:r])
except ValueError:
return None
vs.
class RangeQuerierMultiLevel(object):
def __init__(self):
self.arrs = []
self.sub_factor = 3
self.len_ = 0
def set_array(self,arr):
#Set, and preprocess
tgt = arr
self.len_ = len(tgt)
self.arrs.append(arr)
while len(tgt) > 1:
tgt = self.maxify_one_array(tgt)
self.arrs.append(tgt)
def maxify_one_array(self,arr):
sub_arr = []
themax = float('-inf')
for i,el in enumerate(arr):
themax = max(el,themax)
if i % self.sub_factor == self.sub_factor - 1:
sub_arr.append(themax)
themax = float('-inf')
return sub_arr
def query(self,l,r,level=None):
if level is None:
level = len(self.arrs)-1
if r <= l:
return None
int_size = self.sub_factor ** level
lhs,mid,rhs = (float('-inf'),float('-inf'),float('-inf'))
#Check if there's an imperfect match on the left hand side
if l % int_size != 0:
lnew = int(ceil(l/float(int_size)))*int_size
lhs = self.query(l,min(lnew,r),level-1)
l = lnew
#Check if there's an imperfect match on the right hand side
if r % int_size != 0:
rnew = int(floor(r/float(int_size)))*int_size
rhs = self.query(max(rnew,l),r,level-1)
r = rnew
if r > l:
#Handle the middle elements
mid = max(self.arrs[level][l/int_size:r/int_size])
return max(max(lhs,mid),rhs)
try “segment tree” data structure
there are 2 step
build_tree() O(n)
query(int min, int max) O(nlogn)
http://en.wikipedia.org/wiki/Segment_tree
edit:
you guys just don’t read the wiki i sent!
this algorithm is:
– you traverse the array 1 time to build tree. O(n)
– next 100000000+ times you want to know max of any part of array, just call the query function. O(logn) for every query
– c++ implement here geeksforgeeks.org/segment-tree-set-1-range-minimum-query/
old algorithm is:
every query, just traverse the selected area and find.
so, if you gonna use this algorithm to process once, OK, it slower than old way.
but if you gonna process huge number of query(billion), it’s very efficient
you can generate text file like this, for test
line 1: 50000 random number from 0-1000000, split by ‘(space)'(it’s the array)
line 2: 2 random number from 1 to 50000, split by ‘(space)'(it’s the query)
…
line 200000: likes line 2, it’s random query too
this is the example problem, sorry but this is in vietnamese
http://vn.spoj.com/problems/NKLINEUP/
if you solve it by old way, you never pass.
8
You can achieve O(1) per query (with O(n log n) construction) using data structure called sparse table. For each power of 2 let’s save maximum for each segment of this length. Now given segment [l, r) you get maximum of maximums on [l+2^k) and [r-2^k,r) for appropriate k. They overlap but it’s OK