I'm struggling with code that worked fine until this morning. I would like to create a dummy variable as follows:
For each group of observations identified by the NCLP variable, the dummy should be 1 when Data_Fine_Effettiva is not missing and equals the maximum value across the observations in that group. If Data_Fine_Effettiva is missing, the dummy should be 1 when the max lagged value of Data_Inizio_Effettiva is not missing and is smaller than a threshold (my_date).
In all other cases the dummy should be 0.
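To make the rule concrete, here is a minimal base-R sketch on made-up data. The data frame `toy`, its dates, the value of `my_date`, and the helper `rule_for_group()` are all invented for illustration and are not my real data. It follows the same reading of the rule as the dplyr code further down, and it assumes the rows are already in the intended order within each NCLP group.

```r
# Made-up data: none of these values come from the real dataset.
toy <- data.frame(
  NCLP = c(1, 1, 2, 2, 3),
  Data_Inizio_Effettiva = as.Date(c("2020-01-01", "2020-06-01",
                                    "2021-01-01", "2021-03-01",
                                    "2023-01-01")),
  Data_Fine_Effettiva = as.Date(c("2020-05-01", "2020-12-01", NA, NA, NA))
)
my_date <- as.Date("2022-01-01")  # hypothetical threshold

# Hypothetical helper: returns the dummy for the rows of one NCLP group.
rule_for_group <- function(g) {
  fine   <- g$Data_Fine_Effettiva
  inizio <- g$Data_Inizio_Effettiva
  if (any(!is.na(fine))) {
    # At least one end date: flag the row(s) holding the latest end date.
    cond <- !is.na(fine) & fine == max(fine, na.rm = TRUE)
  } else {
    # No end date at all: previous start date must exist, and the current
    # start date must be the group's latest and fall before the threshold.
    prev_inizio <- c(as.Date(NA), head(inizio, -1))
    cond <- !is.na(prev_inizio) & inizio == max(inizio) & inizio < my_date
  }
  # Treat NA conditions as "no match", i.e. dummy = 0.
  as.numeric(cond %in% TRUE)
}

# Apply the rule group by group and put the results back in row order.
toy$dummy_data <- unsplit(lapply(split(toy, toy$NCLP), rule_for_group),
                          toy$NCLP)
print(toy)
```

With these invented rows, group 1 gets a 1 on the row with the latest end date, group 2 gets a 1 on its second row, and the single-row group 3 stays at 0 because it has no lagged start date.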
Here is an example of my data, where the last column is the dummy I would like to have:
Here is the code the community helped me write:
    df <- df %>%
      mutate(
        .by = NCLP,
        dummy_data = case_when(
          any(!is.na(Data_Fine_Effettiva)) & Data_Fine_Effettiva == max(Data_Fine_Effettiva, na.rm = TRUE) ~ 1,
          all(is.na(Data_Fine_Effettiva)) & !is.na(lag(Data_Inizio_Effettiva)) & Data_Inizio_Effettiva == max(Data_Inizio_Effettiva) & Data_Inizio_Effettiva < my_date ~ 1,
          .default = 0
        )
      )
Today it started throwing this error:
Error in `mutate_cols()`:
! Problem with `mutate()` column `dummy_data`.
i `dummy_data = case_when(...)`.
x Case 3 (`any(!is.na(Data_Fine_Effettiva)) & Data_Fine_Effettiva == max(Data_Fine_Effettiva, na...`) must be a two-sided formula, not a double vector.
Caused by error in `abort_case_when_formula()`:
! Case 3 (`any(!is.na(Data_Fine_Effettiva)) & Data_Fine_Effettiva == max(Data_Fine_Effettiva, na...`) must be a two-sided formula, not a double vector.
Does anyone know how to solve the issue?