ID Time Condition Seq Want_temp Want_final_v1
A 1 yes 0 0 0
A 2 yes 0 . 1
A 3 no 1 . .
A 4 yes 0 . 2
A 5 no 1 . .
A 6 no 2 . .
B 1 yes 0 . 4
B 2 no 1 . .
B 3 no 2 . .
B 4 no 3 . .
B 5 no 4 . .
B 6 yes 0 0 0
B 7 yes 0 0 0
...
This is a simplified version of a data set I have. The variable I want to generate
is Want_final_v1
.
Seq
is the numerical sequence starting from 0 when the value of Condition
is "yes"
. Seq
increases by 1 when the next value of Condition
is "no"
.
What I want is Want_final_v1
which contains 0 for Condition == "yes"
but contains the last or the max value of Seq
for Condition == "yes"
only when following values for Condition == "no"
.
My initial plan is to generate Want_final_v1
after generating Want_temp
(which I’ve done).
1
This was a fun question that seemed like it should have a simple elegant solution, but I could not find one. Interested to see if someone has a better solution.
Stata assigns values row by row going top down. This is why I am reversing the sort order to be able to carry the last “no” value down to the corresponding “yes” value.
* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id byte time str3 condition byte(seq want_temp want_final_v1)
"A" 1 "yes" 0 0 0
"A" 2 "yes" 0 . 1
"A" 3 "no" 1 . .
"A" 4 "yes" 0 . 2
"A" 5 "no" 1 . .
"A" 6 "no" 2 . .
"B" 1 "yes" 0 . 4
"B" 2 "no" 1 . .
"B" 3 "no" 2 . .
"B" 4 "no" 3 . .
"B" 5 "no" 4 . .
"B" 6 "yes" 0 0 0
"B" 7 "yes" 0 0 0
end
* Reverse sort order
gsort -id -time
* Initiate want variable
gen want = .
* Get the last "no" value in each group
replace want = seq if (condition == "no" & condition[_n-1] == "yes")
* For all values that was not a last "no" value, get the value from preceding row
replace want = want[_n-1] if missing(want)
* Set "yes" obs in group with no "no" to 0
replace want = 0 if missing(want)
* Replace all "no" rows with missing
replace want = . if condition == "no"
* Restore sort order
sort id time