I have a DataFrame similar to:
ID | Time | Value |
---|---|---|
1 | 100 | 1 |
1 | 200 | 1 |
1 | 300 | 2 |
1 | 400 | 2 |
1 | 500 | 2 |
1 | 600 | 3 |
1 | 700 | 3 |
1 | 800 | 4 |
1 | 900 | 4 |
1 | 1000 | 5 |
1 | 1100 | 5 |
I would like to create a function that accepts a column name and a sequence, and returns the DataFrame with a 1 in a new flag column on every row where the sequence is found. E.g.:
detect_sequence(df, "Value", [2, 3, 4])
This should return:
ID | Time | Value | Flag |
---|---|---|---|
1 | 100 | 1 | 0 |
1 | 200 | 1 | 0 |
1 | 300 | 2 | 1 |
1 | 400 | 2 | 1 |
1 | 500 | 2 | 1 |
1 | 600 | 3 | 1 |
1 | 700 | 3 | 1 |
1 | 800 | 4 | 1 |
1 | 900 | 4 | 1 |
1 | 1000 | 5 | 0 |
1 | 1100 | 5 | 0 |
Do you think this is possible without any UDF?
I tried extracting state changes and using lag functions or multiple windows, but I could not get it to work. In particular, I cannot handle the fact that the sequence can have a flexible length: it should accept (2, 3, 4) but also (2, 3, 3, 3, 4, 4) and (2, 2, 2, 2, 3, 4, 4, 4)…
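To make the expected behaviour concrete, here is a minimal plain-Python sketch of the semantics (not Spark, and not the UDF-free solution being asked for). It assumes that consecutive duplicates are collapsed in both the column and the pattern, so (2, 3, 3, 3, 4, 4) is treated the same as (2, 3, 4). The name `detect_sequence_reference` is hypothetical and only used for illustration.

```python
from itertools import groupby


def detect_sequence_reference(values, pattern):
    """Return a 0/1 flag per element of `values`.

    Assumption: runs of equal consecutive values count as a single state,
    both in `values` and in `pattern`.
    """
    # Collapse consecutive duplicates in the pattern: (2, 2, 3, 4, 4) -> [2, 3, 4]
    target = [key for key, _ in groupby(pattern)]

    # Run-length encode the column as (value, run_length) pairs
    runs = [(key, len(list(group))) for key, group in groupby(values)]
    run_values = [key for key, _ in runs]

    # Index in `values` at which each run starts
    starts = []
    position = 0
    for _, length in runs:
        starts.append(position)
        position += length

    flags = [0] * len(values)
    if not target:
        return flags

    # Slide the collapsed pattern over the collapsed column
    for i in range(len(runs) - len(target) + 1):
        if run_values[i:i + len(target)] == target:
            last = i + len(target) - 1
            begin = starts[i]
            end = starts[last] + runs[last][1]
            # Flag every original row covered by the matching runs
            for j in range(begin, end):
                flags[j] = 1
    return flags


# The "Value" column from the example above
values = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
flags = detect_sequence_reference(values, [2, 3, 4])
```

With the example column, the rows holding 2, 3 and 4 are flagged and the rows holding 1 and 5 are not, which matches the expected table. What I am looking for is the equivalent of this using only built-in DataFrame and window functions.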
Thanks in advance