I’ve been going over previous tech interviews I’ve had (got another one coming up).
The problem
Anyway, one question I had was…
Given 2 unsorted arrays, how would you find all of the common objects between them?
Say I have array A and B. Worst case A and B are both of size n.
Initial thought
Initially my thought was to iterate A and do a linear search through B.
The complexity for this is O(n) * O(n) = O(n^2).
Sorting first
However, I was wondering if it would be better to sort B first.
Using a quick sort (or merge sort) on B is O(n log(n)). This is done once.
Now you can iterate A, which is O(n), and do a binary search on B, which is O(log(n)), for each element of A.
So the complexity is (sort) O(n log(n)) + (iterate A) O(n) * (search B) O(log(n)), which simplifies down to O(n log(n)).
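As a sketch of the sort-then-binary-search approach (the class and method names here are mine, not from the original question), this might look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommonElements {
    // Sort B once (O(n log n)), then binary-search it for each element of A
    // (n searches at O(log n) each), giving O(n log n) overall.
    static List<Integer> common(int[] a, int[] b) {
        int[] sortedB = b.clone();
        Arrays.sort(sortedB);
        List<Integer> result = new ArrayList<>();
        for (int x : a) {
            if (Arrays.binarySearch(sortedB, x) >= 0) {
                result.add(x);
            }
        }
        return result;
    }
}
```

Note that only B is copied and sorted; A is left untouched and merely iterated.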
My question is: am I right about this? I am very new to complexity analysis, so I wanted to check that I’m not doing anything stupid.
Best solution
Is the best solution then to sort one array first before iterating the other? You could sort both arrays and then iterate, but you’re not improving on O(n log(n)).
Is there another better way of approaching this?
I don’t think that there’s a better answer for worst case complexity in the general case. You could probably only improve specific cases in which the input is somehow limited (say numbers between 1 and N). In that case you could use something like a Radix sort.
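For instance (a sketch of mine of the counting idea, a close cousin of radix sort; the answer itself only names radix sort), if all values are known to lie between 1 and maxValue, a presence array gives linear time:

```java
import java.util.ArrayList;
import java.util.List;

class BoundedIntersection {
    // When values are limited to the range 1..maxValue, a presence array
    // gives O(n + maxValue) time instead of O(n log n).
    static List<Integer> common(int[] a, int[] b, int maxValue) {
        boolean[] inA = new boolean[maxValue + 1];
        for (int x : a) {
            inA[x] = true;
        }
        List<Integer> result = new ArrayList<>();
        for (int x : b) {
            if (inA[x]) {
                inA[x] = false;   // clear so each common value is reported once
                result.add(x);
            }
        }
        return result;
    }
}
```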
Karl Bielefeldt’s idea of converting the arrays to sets doesn’t solve the problem; it only hides it, since intersecting sets (in the general case) is done in O(n^2). For instance, here’s Java’s retainAll() implementation:
public boolean retainAll(Collection<?> paramCollection) {
    boolean modified = false;
    Iterator localIterator = iterator();
    while (localIterator.hasNext()) {
        if (!paramCollection.contains(localIterator.next())) {
            localIterator.remove();
            modified = true;
        }
    }
    return modified;
}
Note the iteration over the incoming collection. For each element, the method calls contains, which iterates over the elements of the current set:
public boolean contains(Object paramObject) {
    Iterator localIterator = iterator();
    if (paramObject == null) {
        while (localIterator.hasNext()) {
            if (localIterator.next() == null)
                return true;
        }
    } else {
        while (localIterator.hasNext()) {
            if (paramObject.equals(localIterator.next()))
                return true;
        }
    }
    return false;
}
npinti’s solution of using a HashSet is equally problematic. A HashSet doesn’t guarantee O(1) fetch time in the worst case. In fact, if the hashes of all N elements collide, you’re looking at O(n) fetches/inserts. This brings us back to O(n^2) worst case time.
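To illustrate the collision worst case, here is a contrived key type of my own (not from the answer) whose hash codes always collide, so every key lands in the same bucket:

```java
// Contrived key whose hashCode always collides: all keys land in one
// bucket, so a HashSet's contains()/add() degrade toward linear scans of
// that bucket (modern JDKs mitigate large bins by treeifying them).
class BadKey {
    final int id;

    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // every key collides

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == this.id;
    }
}
```

The set still behaves correctly with such keys; it just loses the O(1) expected lookup time that the average-case analysis relies on.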
Why not use a hash set? Most implementations of the add() method, which is used to insert items into the set, return a boolean indicating whether the item was inserted. Insertion returns false if the item was already there. So your code would look like:
HashSet<Integer> set = new HashSet<>();
int[] items1 = ...;
int[] items2 = ...;
for (int item : items1)
    set.add(item);
for (int item : items2)
    if (!set.add(item))
        System.out.println(item + " is a duplicate");
This would yield a time complexity of 2n, since hash sets have a constant O(1) fetch time. If I remember correctly, O(2n) (which reduces to O(n)) would be less than O(n log(n)). This assumes that you do not need to optimize in terms of space.
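A runnable variant of the same idea (my sketch, not from the answer; it probes with remove() so that repeated elements of the second array are not reported twice):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashIntersection {
    // Build a hash set from the first array (O(n) expected), then probe it
    // once per element of the second array (O(1) expected each), for O(n)
    // expected time overall.
    static List<Integer> intersect(int[] a, int[] b) {
        Set<Integer> inA = new HashSet<>();
        for (int x : a) {
            inA.add(x);
        }
        List<Integer> result = new ArrayList<>();
        for (int x : b) {
            if (inA.remove(x)) {   // true only the first time x is seen in B
                result.add(x);
            }
        }
        return result;
    }
}
```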
You can’t possibly do better than O(n), since you have to examine each element at least once to determine whether there is a match. My first choice would be to convert the arrays to sets, which is O(n), then take their intersection, which is O(n) on the smaller set. In Python:
set(array1) & set(array2)
If you have to do it in place, which is sometimes a restriction in these sorts of exercises, your solution is pretty good.