Is there a method(s) to tag “many-to-many、one-to-many and many-to-one” relationship, like in the example below?
df1 <- data.frame(
id1 = c(1, 1, 3, 4),
value1 = c("A", "B", "C", "D")
)
df2 <- data.frame(
id2 = c(1, 1, 3, 3, 4),
value2 = c("X", "Y", "Z1", "Z2", "W")
)
# Performing a left join to tag the many-to-many relationship and adding a tag variable
result <- df1 %>%
left_join(df2, by = c("id1" = "id2")) %>%
group_by(id1) %>%
mutate(
tag = if_else(n() > 1, "many-to-many", "one-to-many")
) %>%
ungroup()
result
# A tibble: 7 × 5
id1 value1 id2 value2 tag
<dbl> <chr> <dbl> <chr> <chr>
1 1 A 1 X many-to-many
2 1 A 1 Y many-to-many
3 1 B 1 X many-to-many
4 1 B 1 Y many-to-many
5 3 C 3 Z1 many-to-many
6 3 C 3 Z2 many-to-many
7 4 D 4 W one-to-many
In the example above, I have tagged the “many-to-many、one-to-many” but not many-to-one. How can I do that?
1
count_df <- left_join(
count(df1, id1), count(df2, id2),
by = c("id1" = "id2"),
relationship = "many-to-many"
)
one_or_many <- function(x) if_else(x == 1L, "one", "many")
df1 |>
left_join(df2, by = c("id1" = "id2"), relationship = "many-to-many") |>
left_join(count_df, by = "id1") |>
mutate(
tag = sprintf("%s-to-%s", one_or_many(n.x), one_or_many(n.y)),
n.x = NULL, n.y = NULL
)
# id1 value1 value2 tag
# 1 1 A X many-to-many
# 2 1 A Y many-to-many
# 3 1 B X many-to-many
# 4 1 B Y many-to-many
# 5 3 C Z1 one-to-many
# 6 3 C Z2 one-to-many
# 7 4 D W one-to-one
# adding rows to both of the dataframes, to demonstrate the # many-to-one join
df1 <- df1 |> add_row(id1 = c(5, 5), value1 = c("E", "F"))
df2 <- df2 |> add_row(id2 = 5, value2 = "Z3")
# check if there is more than one unique value
# in each group.
# note: this approach doesn't work if you have identical rows, I think!
f <- (x) unique(x) %>% {ifelse(length(.) > 1, "many", "one")}
full_join(df1, df2, by = c("id1" = "id2")) |>
mutate(tag = paste0(f(value1), "-to-", f(value2)), .by = id1)
Output:
id1 value1 value2 tag
1 1 A X many-to-many
2 1 A Y many-to-many
3 1 B X many-to-many
4 1 B Y many-to-many
5 3 C Z1 one-to-many
6 3 C Z2 one-to-many
7 4 D W one-to-one
8 5 E Z3 many-to-one
9 5 F Z3 many-to-one
When you say “many-to-one”, you mean that one row in df2
matches several rows in df1
, correct? You can identify these cases in df2
before joining and then take it into account when you generate the tag
variable:
# this counts how often each value of id2 appears in id1.
df2$m2o <- sapply(df2$id2, function(x) sum(df1$id1 == 1))
# and the same the other way.
df1$o2m <- sapply(df1$id1, function(x) sum(df2$id2 == 1))
result <- df1 %>%
left_join(df2, by = c(id1 = "id2")) %>%
group_by(id1) %>%
mutate(
tag = case_when(m2o > 1 & o2m > 1 ~ "many-to-many",
m2o > 1 ~ "many-to-one",
o2m > 1 ~ "one-to-many",
m2o == 1 & o2m == 1 ~ "one-to-one",
TRUE ~ "no-match")
) %>%
ungroup()