I’d like to combine a full join and a update join with a variable number of columns and havn’t found an answer covering this case on stackoverflow.
DT
is my initial table:
key_col value_col_1 value_col_2
<char> <int> <int>
1: a 1 11
2: b 2 12
3: c 3 13
4: d 4 14
5: e 5 15
which needs to be updated with the data in updateDT
:
key_col value_col_1 value_col_2 value_col_3
<char> <int> <int> <int>
1: d 11 21 31
2: e 12 22 32
3: f 13 23 33
4: g 14 24 34
updateDT
may add new rows and columns (variable in number) and always has priority over existing data in DT
.
The expected result (resultDT
) for the above example is:
key_col value_col_1 value_col_2 value_col_3
<char> <int> <int> <int>
1: a 1 11 NA
2: b 2 12 NA
3: c 3 13 NA
4: d 11 21 31
5: e 12 22 32
6: f 13 23 33
7: g 14 24 34
I can get there with hard coded column names, however, I’m struggling with generalizing this approach to handle e.g. a third or fourth value_col
in updateDT
dynamically:
library(data.table)
DT <- data.table(key_col = letters[1:5], value_col_1 = 1:5, value_col_2 = 11:15)
key_column <- "key_col"
value_columns <- setdiff(names(DT), key_column)
updateDT <- data.table(key_col = letters[4:7], value_col_1 = 11:14, value_col_2 = 21:24, value_col_3 = 31:34)
resultDT <- merge(DT, updateDT, by = "key_col", all = TRUE)
resultDT[, value_col_1 := fifelse(is.na(value_col_1.y), yes = value_col_1.x, no = value_col_1.y)]
resultDT[, value_col_2 := fifelse(is.na(value_col_2.y), yes = value_col_2.x, no = value_col_2.y)]
resultDT[, c("value_col_1.x", "value_col_1.y", "value_col_2.x", "value_col_2.y") := NULL]
print(resultDT)