I have a dataframe in R called df
:
A = c("ok","WA","WA","ok","WA")
B = c("WB","ok","ok","ok","WB")
C = c("WC","ok","WC","ok","WC")
df = tibble(A,B,C)
df
A B C
<chr> <chr> <chr>
1 ok WB WC
2 WA ok ok
3 WA ok WC
4 ok ok ok
5 WA WB WC
I want to create (mutate) a new column that will concatenate all the values that are not OK like this:
A B C D
<chr> <chr> <chr> <chr>
1 ok WB WC WB,WC
2 WA ok ok WA
3 WA ok WC WA,WC
4 ok ok ok NO W
5 WA WB WC WA,WB,WC
You could use a paste
approach with the help of gsub
:
A <- c("ok","WA","WA","ok","WA")
B <- c("WB","ok","ok","ok","WB")
C <- c("WC","ok","WC","ok","WC")
df <- data.frame(A=A, B=B, C=C, stringsAsFactors=FALSE)
df$D <- paste(df$A, df$B, df$C, sep=",")
df$D <- gsub("^,|,$", "", gsub(",{2,}", ",", gsub("\bok\b", ",", df$D)))
df
A B C D
1 ok WB WC WB,WC
2 WA ok ok WA
3 WA ok WC WA,WC
4 ok ok ok
5 WA WB WC WA,WB,WC
The basic strategy here is to strip out ok
entries, then cleanup the possible dangling commas left from the paste
call.
3
Using apply
and paste
df['D'] <- apply(df, 1, (x) paste(x[which(x!="ok")], collapse=","))
df
# A tibble: 5 × 4
A B C D
<chr> <chr> <chr> <chr>
1 ok WB WC "WB,WC"
2 WA ok ok "WA"
3 WA ok WC "WA,WC"
4 ok ok ok ""
5 WA WB WC "WA,WB,WC"
A non-vectorised alternative using apply
row-wisely and toString
with nchar
. This might be fast enough on moderate sized data.
df0 = data.frame(A = c("ok","WA","WA","ok","WA"),
B = c("WB","ok","ok","ok","WB"),
C = c("WC","ok","WC","ok","WC"))
df0$D = apply(df0, 1, (i) {x = sub("ok", "", i); toString(x[nchar(x)>1L]) })
df0$D[df0$D==""] = "NO W"
df0
# A B C D
# 1 ok WB WC WB, WC
# 2 WA ok ok WA
# 3 WA ok WC WA, WC
# 4 ok ok ok NO W
# 5 WA WB WC WA, WB, WC
Using dplyr/tidyr, replace “ok” with NA, then unite excluding NA, then column bind to original data.frame:
library(dplyr)
library(tidyr)
df %>%
mutate(across(A:C, ~if_else(.x == "ok", NA, .x))) %>%
unite(D, A:C, sep = ",", na.rm = TRUE) %>%
bind_cols(df, .)
# A B C D
# 1 ok WB WC WB,WC
# 2 WA ok ok WA
# 3 WA ok WC WA,WC
# 4 ok ok ok
# 5 WA WB WC WA,WB,WC