I am starting Python and I came across this practice test where I have to find the smallest positive integer that does not occur in a list.
Examples:
- arr = [8, 2, 1, 4, 3, 5] # output: 6
- arr = [-3, 1, 1, -7, 5, 3] # output: 2
- arr = [-11, -1, 3] # output: 1
Here’s my solution:
def solution(A):
val = 1
while True:
try:
idx = A.index(val)
val += 1
except ValueError:
break
return val
According to my results, I only got 1 out of the 4 performance tests. Could someone help me understand why?
Emmanuel John Gerasta is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
11
You need to analyze time complexity to understand the bottle neck.
Suppose K
is the smallest integer you are looking for, and N
is the length of your list.
Then your code has complexity O(K*N)
, because for each iteration over K
you are searching the list which takes O(N)
.
Now let’s think about the best conceivable runtime (BCR). To find the integer you are inevitable (with a random list) need to iterate through the whole list, which is O(N)
. All the other parts might be dropped probably depending on the implementation. So our BCR = O(N)
. Most probably you need to target this time complexity to have all performance tests to pass.
Now to the solution. I don’t think it’s optimal, but it will definitely improve your version. You can convert list to the hashset. The search operation in hashset is O(1)
, so your solution will become O(K+N)
. The downside is that your space complexity will increase from O(1)
to O(N)
def solution(A):
B = set(A)
val = 1
while True:
if val in B:
val += 1
else:
break
return val
1
The problem is that A.index(val)
sequentially searches the list until it finds val
(or not), implicitly creating another while
loop within your while
loop, giving O(n²) running time.
An alternative approach is to convert your list
of integers to a set
before further processing.
def solution(A):
S = set(A)
val = 1
while True:
if val not in S:
return val
val += 1
The set
class is implementing using a hash table, which allows membership tests (in
or not in
) to be performed very quickly, without having to loop through all of the values in the set.
The question is “why is this solution bad.”
The answer is A.index(val) has to iterate through the list in order to find the position of val. It has to do this for each and every int, until you get an answer. This of course makes this an O(n^2) worst case solution. The correct answer will be O(n).
This is actually a pretty common problem when working with lists. If you are iterating a loop, and within that loop you use a list member that iterates over the list, then you have a double loop, even though it doesn’t look like it.
7
@KlimentMerzlyakov’s solution improves the average time complexity from O(n ^ 2) to O(n) at the cost of space having to store all the numbers in a set.
You can improve the space overhead while keeping the time complexity at O(n) by storing each positive integer as a set bit in a binary number instead. Then invert the number so the lowest set bit would be at the the smallest positive integer. You can get the lowest set bit with the algorithm from this answer.
As @Ry- pointed out in the comment, however, since integers are immutable in Python and have to be copied for each operation, storing the bits as an integer would result in a time complexity of O(n ^ 2). Instead, you can store the bits in a bytearray to perform operations in-place, and then iteratively find the index of the lowest byte with an unset bit to calculate the position of the lowest unset bit, so that each additional number would only cost 1 bit of space overhead while the overall time complexity remains at O(n):
def solution(A):
size = len(A)
b = bytearray((size >> 3) + 1)
for i in A:
if 0 < i <= size:
i -= 1
b[i >> 3] |= 1 << (i & 7)
for i, byte in enumerate(b):
if byte != 255:
break
byte = ~byte
return (i << 3) + (byte & -byte).bit_length()
so that:
for arr in (
list(range(1000)) + list(range(1001, 2000)),
[8, 2, 1, 4, 3, 5],
[-3, 1, 1, -7, 5, 3],
[-11, -1, 3]
):
print(solution(arr))
outputs:
1000
6
2
1
Demo here
5
While not the fastest solution, sorting is an option that will run in O(1) memory, O(n log n) time, assuming you can do it in-place. This may make it more practically viable than using a set for very large inputs.
def min_pos_int(arr):
arr.sort()
if arr[0] > 1: return 1
for i in range(len(arr) - 1):
if arr[i] >= 1 and arr[i + 1] > arr[i] + 1:
return arr[i] + 1
elif arr[i] < 1 and arr[i + 1] > 1:
return 1
else:
return max(1, arr[-1] + 1)
If you implement the sort yourself, you can optimize quite a bit further. For example, in a qsort, you can avoid recursing into the negative partitions, and will know the beginning of the positive side immediately.
11
So you have a list of values A
which is finite. And you have an infinite list of all integers. Your algorithm is looping through the infinite list (or very large list in Python) of integers and checking each one for inclusion in the finite list. Worst case scenario, you will need to check every possible integer even if the list A
only contains a single integer.
Try switching the algorithm around to search through the finite list. At the worst case, you only need to check the total number of integers in the given list.
4