I need to merge two data frames fuzzily on a Starttime_ms column. The data to be merged, however, have internal structure, i.e. groupings: they should only be merged within the same value of the grouping variable File.
How can this be done? With fuzzyjoin's function difference_left_join this does not seem possible. In row #13, the Quote “this is how” should not be assigned, because the File values in df1 and df2 are different (F01 vs F07):
library(dplyr)      # for %>% and group_by()
library(fuzzyjoin)
df1 %>%
  group_by(File) %>%
  # join rows whose start times are at most 1500 ms apart:
  difference_left_join(x = .,
                       y = df2,
                       by = "Starttime_ms",
                       max_dist = 1500)
# A tibble: 20 × 6
# Groups: File.x [5]
File.x Utterance Starttime_ms.x File.y Quote Starttime_ms.y
<chr> <chr> <int> <chr> <chr> <int>
1 F01 "((34_m: x))" 1126730 NA NA NA
2 F01 "((68_m: in, points to r w: index @lorperi))" 2177901 NA NA NA
3 F12 "((130_m:@rct))" 2341143 NA NA NA
4 F12 "((175_m: r h closed palm down))" 2457686 NA NA NA
5 F05 "((19_m: from ctct to rct)) " 905860 F05 is (name ID08.A) 905899
6 F12 "((166_m: b h half closed palms down))" 2414528 NA NA NA
7 F12 "((106_m:@lperi))" 2296469 NA NA NA
8 F06 "((100_m: r h open palm up @uprperi))" 2511953 NA NA NA
9 F12 "((116_m:@rct))" 2309468 NA NA NA
10 F12 "((21_m: l h closed palm up))" 795914 NA NA NA
11 F01 "((148_m:r hand moves sideways with 2 beats @rperi))" 2163502 NA NA NA
12 F12 "((10_m: lf, points l then r w))" 779882 NA NA NA
13 F01 "((144_m:r rotate with palms))" 2159122 F07 this is how 2157727
14 F06 "((115_m:@uprperi))" 2642130 NA NA NA
15 F01 "((185_m: l h close palm down))" 2251525 NA NA NA
16 F09 "((11_m: flaps h outwards @rct))" 1436092 NA NA NA
17 F01 "((81_m: r h ))" 220038 F01 the EXAM'S OPEN for… 220055
18 F01 "((32_m: projects hands forward @ct))" 143758 NA NA NA
19 F05 "((61_m: wipe from ctct to l + r periphery))" 2284146 NA NA NA
20 F01 "((114_m:ard with palm upward))" 808493 NA NA NA
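Judging by the output above, difference_left_join() matches on Starttime_ms alone and disregards the grouping set by group_by(), so a quote from F07 can end up on an F01 row. One possible workaround, sketched here under the assumption that the intended rule is "same File, and start times at most 1500 ms apart", is fuzzyjoin's more general fuzzy_left_join(), which accepts one match function per by column. df1_demo and df2_demo are hypothetical three-row excerpts of the data (output rows 13, 17 and 5).

```r
library(dplyr)      # for tibble() (re-exported from tibble)
library(fuzzyjoin)  # for fuzzy_left_join()

# Hypothetical excerpts of df1 and df2 used only for illustration
df1_demo <- tibble(
  File         = c("F01", "F01", "F05"),
  Utterance    = c("((144_m:r rotate with palms))", "((81_m: r h ))",
                   "((19_m: from ctct to rct)) "),
  Starttime_ms = c(2159122L, 220038L, 905860L)
)
df2_demo <- tibble(
  File         = c("F07", "F01", "F05"),
  Quote        = c("this is how", "the EXAM'S OPEN for a:ll but",
                   "is (name ID08.A)"),
  Starttime_ms = c(2157727L, 220055L, 905899L)
)

# One match function per `by` column, applied in the same order:
# File must be identical, Starttime_ms may differ by at most 1500 ms.
res <- fuzzy_left_join(
  df1_demo, df2_demo,
  by = c("File", "Starttime_ms"),
  match_fun = list(`==`, function(x, y) abs(x - y) <= 1500)
)
res
```

In this sketch the F01 row at 2159122 should get NA for Quote, since the only quote within 1500 ms belongs to F07. With the full data, the same call would presumably replace the difference_left_join() step, making the group_by(File) line unnecessary.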
Any help with this is appreciated!
Data:
df2 <- structure(list(File = c("F01", "F01", "F05", "F09", "F17", "F07",
"F17", "F01", "F01", "F05", "F16", "F05", "F12", "F07", "F01",
"F06", "F17", "F16", "F16", "F07", "F05", "F05", "F12", "F01",
"F07", "F17", "F08", "F01", "F17", "F07", "F08", "F17", "F16",
"F17", "F01", "F12", "F01", "F16", "F08", "F09"),
Quote = c("the EXAM'S OPEN for a:ll but",
"yeah I watered them with beer",
"is (name ID08.A)",
">do we wanna do this< or !not!",
"°this thing that I do really°",
"well we don't wanna do this anymore",
"!no! I came from a very homogenous place",
"I [need all of my !money!] to live",
"[okay ((name Mr)) look at this",
"°I'm not motivated that [enough?°]",
"we 're not going to (.)",
"↑ye:s of course you can go shopping",
"go in",
"this is how",
"NO:: LEAVE HER AL- ((v: grunts))",
"↓WHAT¿",
"↑you're not↑ from there°",
"I 'm I 'm just here to get my paycheck",
"Oh and I don't l(h)ike [i(h)t]",
"o:h !no!",
"let's !go!", "[just !think! about it]",
"°maybe you should just get on a plane and come home",
"ZEpark",
"!Ice!land",
"[let me in] the !fucking! [toilet] ",
"°↑oh↑ it's just another day in Philly.°",
"↑↑oh I [wanna] go play",
"°↑yeah my French isn't good↑° enough", "°f- o:h !shit!° ",
"<professionalization,> [that]",
"yo man",
"no I do need, <I will need to:> cut your ↑pay↑",
"like ↑how could it",
"bitch",
"you know your rank",
"a pack of cigarettes",
"shout out",
"rightio", "so WHAT?"), Starttime_ms = c(220055L,
2100242L, 905899L, 1402070L, 1994545L, 2429302L, 1764229L, 847000L,
1601962L, 903107L, 1453737L, 1619669L, 2666162L, 2157727L, 257988L,
461057L, 2229889L, 2636779L, 2305919L, 83699L, 908779L, 133598L,
2038891L, 28300L, 783511L, 1287759L, 2268022L, 251412L, 1231539L,
1451699L, 1297216L, 1189569L, 1308988L, 1134906L, 825516L, 2233996L,
250552L, 2356355L, 1864314L, 222508L)), class = "data.frame", row.names = c(NA, -40L))
df1 <- structure(list(File = c("F01", "F01", "F12", "F12", "F05", "F12",
"F12", "F06", "F12", "F12", "F01", "F12", "F01", "F06", "F01",
"F09", "F01", "F01", "F05", "F01"), Utterance = c("((34_m: x))",
"((68_m: in, points to r w: index @lorperi))",
"((130_m:@rct))", "((175_m: r h closed palm down))",
"((19_m: from ctct to rct)) ", "((166_m: b h half closed palms down))",
"((106_m:@lperi))", "((100_m: r h open palm up @uprperi))",
"((116_m:@rct))", "((21_m: l h closed palm up))",
"((148_m:r hand moves sideways with 2 beats @rperi))",
"((10_m: lf, points l then r w))",
"((144_m:r rotate with palms))",
"((115_m:@uprperi))", "((185_m: l h close palm down))",
"((11_m: flaps h outwards @rct))", "((81_m: r h ))",
"((32_m: projects hands forward @ct))",
"((61_m: wipe from ctct to l + r periphery))",
"((114_m:ard with palm upward))"
), Starttime_ms = c(1126730L, 2177901L, 2341143L, 2457686L, 905860L,
2414528L, 2296469L, 2511953L, 2309468L, 795914L, 2163502L, 779882L,
2159122L, 2642130L, 2251525L, 1436092L, 220038L, 143758L, 2284146L,
808493L)), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
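If dplyr 1.1.0 or later is available, the same rule (identical File, start times within 1500 ms) can also be sketched without fuzzyjoin, using join_by() with its between() helper. This is a swapped-in technique rather than a fuzzyjoin feature; the demo tables, tolerance_ms and the renamed column Quote_start_ms are hypothetical names chosen for illustration.

```r
library(dplyr)  # join_by() and its between() helper need dplyr >= 1.1.0

# Hypothetical excerpts of df1 and df2 used only for illustration
df1_demo <- tibble(
  File         = c("F01", "F01", "F05"),
  Utterance    = c("((144_m:r rotate with palms))", "((81_m: r h ))",
                   "((19_m: from ctct to rct)) "),
  Starttime_ms = c(2159122L, 220038L, 905860L)
)
df2_demo <- tibble(
  File         = c("F07", "F01", "F05"),
  Quote        = c("this is how", "the EXAM'S OPEN for a:ll but",
                   "is (name ID08.A)"),
  Starttime_ms = c(2157727L, 220055L, 905899L)
)

tolerance_ms <- 1500L

# Give each quote an inclusive acceptance window [lo, hi]; renaming the
# start column avoids .x/.y suffixes in the joined result.
df2_windows <- df2_demo %>%
  rename(Quote_start_ms = Starttime_ms) %>%
  mutate(lo = Quote_start_ms - tolerance_ms,
         hi = Quote_start_ms + tolerance_ms)

# Exact match on File, then keep quotes whose window contains Starttime_ms
res <- df1_demo %>%
  left_join(df2_windows, by = join_by(File, between(Starttime_ms, lo, hi))) %>%
  select(-lo, -hi)
res
```

Because File is an equality condition, only df1's File column is kept, and unmatched rows (such as the F01 row at 2159122) should come back with NA in Quote and Quote_start_ms.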