I have a column of 0s and 1s. I want to keep a 1 in the output column only if it is at least 4 rows after the previously kept 1.
Note that simply using diff() is not a solution, because this would eliminate too many 1s. Here’s an example:
import pandas as pd
df = pd.DataFrame.from_dict({'ix':list(range(12)), 'in':[1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]})
df['out'] = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0] # desired output
    ix  in  out
0    0   1    1
1    1   0    0
2    2   0    0
3    3   1    0   # this 1 needs to become 0
4    4   1    1   # we keep this 1, because the previously kept one is sufficiently far
5    5   0    0
6    6   1    0
7    7   0    0
8    8   1    1
9    9   0    0
10  10   0    0
11  11   1    0
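To make the problem with diff() concrete, here is a minimal sketch of the naive approach, which compares each 1 with the previous 1 in the input rather than with the previously kept 1. The names `pos`, `keep` and `naive_out` are only illustrative, and the sketch assumes the column contains at least one 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'in': [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]})

# Row positions of the 1s in the input: 0, 3, 4, 6, 8, 11
pos = np.flatnonzero(df['in'].to_numpy())

# Naive rule: always keep the first 1, then keep a 1 only if the
# previous 1 in the *input* is at least 4 rows back.
keep = pos[np.r_[True, np.diff(pos) >= 4]]

naive = np.zeros(len(df), dtype=int)
naive[keep] = 1
naive_out = naive.tolist()
print(naive_out)
```

The gaps between consecutive 1s here are 3, 1, 2, 2 and 3, so only the first 1 survives, whereas the desired output also keeps the 1s at rows 4 and 8.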
Intuitively, it seems like it should be solvable with some combination of grouping, diff() and cumsum(), but I haven’t been able to figure it out.
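For reference, the rule itself can be written as an explicit loop that remembers the position of the last kept 1. This is only a sketch of the intended behaviour, not a vectorized solution; the function name `keep_spaced_ones` and the parameter `min_gap` are made up for illustration.

```python
import pandas as pd

def keep_spaced_ones(values, min_gap=4):
    """Keep a 1 only if it is at least `min_gap` rows after the last kept 1."""
    out = []
    last_kept = None  # position of the most recently kept 1, if any
    for i, v in enumerate(values):
        if v == 1 and (last_kept is None or i - last_kept >= min_gap):
            out.append(1)
            last_kept = i
        else:
            out.append(0)
    return out

df = pd.DataFrame({'ix': list(range(12)),
                   'in': [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]})
df['out'] = keep_spaced_ones(df['in'].tolist())
print(df)
```

Because each decision depends on which earlier 1s were kept, the loop is sequential by nature, which may be why a plain diff() and cumsum() combination is hard to find.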