I have a data frame with columns of dates. The columns date_ind
represent the column index for the date_col
s. I want to compare pairs of date columns for each row specified by date_ind
. Specifically, I want to find the earlier date between the pair and get the corresponding date_ind
value for the earlier date.
For example,
- 1st row:
date_ind1
= 1 anddate_ind3
= 4, so I comparedate_col1
anddate_col4
, and decide thatdate_col4
is the earlier date andearlier_date_ind
should = 4 - 2nd row:
date_ind1
= 1 anddate_ind2
= 2, so I comparedate_col1
anddate_col2
, and decide thatdate_col2
is the earlier date andearlier_date_ind
should = 2
… and so on.
For ties, I want to choose an index that’s smaller. e.g. row 4 has equal dates, and the earlier date index should be 1.
So far, I’ve tried getting the earlier dates based on this rule but wasn’t sure how to go about extracting the corresponding date_ind
values. Any help would be appreciated.
Data
set.seed(123)
df <- data.frame(
date_ind1 = c(1, 1, 3, 1, 1, 3, 2, 1),
date_ind2 = c(NA, 2, NA, NA, 2, NA, 3, 4),
date_ind3 = c(4, NA, 1, 2, NA, 1, NA, NA),
date_col1 = sample(seq(as.Date("2000-01-01"), as.Date("2000-1-31"), by="day"), 8),
date_col2 = sample(seq(as.Date("2000-01-01"), as.Date("2000-1-31"), by="day"), 8),
date_col3 = sample(seq(as.Date("2000-01-01"), as.Date("2000-1-31"), by="day"), 8),
date_col4 = sample(seq(as.Date("2000-01-01"), as.Date("2000-1-31"), by="day"), 8)
)
> df
date_ind1 date_ind2 date_ind3 date_col1 date_col2 date_col3 date_col4
1 1 NA 4 2000-01-31 2000-01-11 2000-01-27 2000-01-10
2 1 2 NA 2000-01-15 2000-01-05 2000-01-25 2000-01-09
3 3 NA 1 2000-01-19 2000-01-20 2000-01-28 2000-01-19
4 1 NA 2 2000-01-14 2000-01-14 2000-01-30 2000-01-04
5 1 2 NA 2000-01-03 2000-01-22 2000-01-09 2000-01-14
6 3 NA 1 2000-01-10 2000-01-25 2000-01-03 2000-01-17
7 2 3 NA 2000-01-18 2000-01-30 2000-01-08 2000-01-11
8 1 4 NA 2000-01-22 2000-01-19 2000-01-07 2000-01-07
tried:
library(dplyr)
col_names1 <- paste0("date_col", df$date_ind1)
col_names2 <- paste0("date_col", coalesce(df$date_ind2, df$date_ind3))
df$earlier_date <-
sapply(1:nrow(df),
function(i) min(df[ i, col_names1[i] ], df[ i, col_names2[i] ]) ) |>
as.Date()
want:
date_ind1 date_ind2 date_ind3 date_col1 date_col2 date_col3
1 1 NA 4 2000-01-31 2000-01-11 2000-01-27
2 1 2 NA 2000-01-15 2000-01-05 2000-01-25
3 3 NA 1 2000-01-19 2000-01-20 2000-01-28
4 1 NA 2 2000-01-14 2000-01-14 2000-01-30
5 1 2 NA 2000-01-03 2000-01-22 2000-01-09
6 3 NA 1 2000-01-10 2000-01-25 2000-01-03
7 2 3 NA 2000-01-18 2000-01-30 2000-01-08
8 1 4 NA 2000-01-22 2000-01-19 2000-01-07
date_col4 earlier_date earlier_date_ind
1 2000-01-10 2000-01-10 4
2 2000-01-09 2000-01-05 2
3 2000-01-19 2000-01-19 3
4 2000-01-04 2000-01-14 1
5 2000-01-14 2000-01-03 1
6 2000-01-17 2000-01-03 3
7 2000-01-11 2000-01-08 3
8 2000-01-07 2000-01-07 4