I’m trying to column bind two dataframes: the first is a time-series with numerous columns and ~10,000 rows. The second is a derivative of the first with correlation coefficients among other relevant vectors.
The 2nd df is obviously much shorter than the first. Both dfs have grouping columns of species
and region
.
Goal: I would like to add the results of the shorter 2nd df to the first, matching by species
and region
columns.
I’ve tried merge
, full_join
, and other join
functions but these either result in NAs
or add rows to the first df. How would I go about matching and merging these dfs in a tidy
universe?
Below is a minimum reproducible example of the dfs and desired output:
set.seed(123)
df1 <- data.frame(
region = c("one", "one", "one", "one", "one",
"one", "one", "one", "one", "one",
"one", "one", "one", "one", "one",
"two", "two", "two", "two", "two",
"two", "two", "two", "two", "two",
"two", "two", "two", "two", "two",
"three", "three", "three", "three", "three",
"three", "three", "three", "three", "three",
"three", "three", "three", "three", "three"),
species = c("A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C",
"A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C",
"A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"),
value = sample(seq(from = 1, to = 45, by = 1), replace = TRUE), # abundance
date = seq(from = 1, to = 5, by = 1)
)
df2 <- data.frame(
region = c("one", "one", "one",
"two", "two", "two",
"three", "three", "three"),
species = c("A", "B", "C",
"A", "B", "C",
"A", "B", "C"),
rho = sample(seq(from = -1, to = 1, by = 0.1), size = 9, replace = TRUE),
date_max_value = sample(seq(from = 1, to = 5, by = 1), size = 9, replace = TRUE) # date of max abundance (value)
)
desired_output <- data.frame(
region = c("one", "one", "one", "one", "one",
"one", "one", "one", "one", "one",
"one", "one", "one", "one", "one",
"two", "two", "two", "two", "two",
"two", "two", "two", "two", "two",
"two", "two", "two", "two", "two",
"three", "three", "three", "three", "three",
"three", "three", "three", "three", "three",
"three", "three", "three", "three", "three"),
species = c("A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C",
"A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C",
"A", "A", "A", "A", "A",
"B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"),
value = sample(seq(from = 1, to = 45, by = 1), replace = TRUE), # abundance
date = seq(from = 1, to = 5, by = 1),
rho = c(rep(-0.6, 5), rep(-0.3, 5), rep(0.1, 5), rep(-0.2, 5), rep(0.7, 5),
rep(-1.0, 5), rep(-0.5, 5), rep(1.0, 5), rep(0.4, 5)),
date_max_value = c(rep(1, 5), rep(2, 5), rep(4, 5), rep(4, 5), rep(3, 5),
rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
)