I have a data frame, potmergednew, with two columns, cleaned_terms and microscopic_descriptionClean2, structured as follows:
The data
structure(list(cleaned_terms = c("nPOT A body nPOT B Antrum incisura body",
"nPOT A Stomach Oesophagus ", "nPOT A Antrum cnPOT B incisura cnPOT C body ",
"nPOT A nPOT B cardia cardia cardia nPOT E nPOT nPOT G",
"nPOT A Antrum Antrum nPOT B"), microscopic_descriptionClean2 = c("POT A:nPOT B:",
"POT A:", "POT A: POT B and POT C IM", "POT A: IM", "POT A: IM"
)), row.names = c(NA, -5L), class = "data.frame")
The problem
I would like to merge the two columns into a new column so that, within each row, the POT A line from cleaned_terms is combined with the POT A line from microscopic_descriptionClean2, the POT B line with the POT B line, and so on.
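To make the target concrete, here is a minimal sketch of the pairing described above, not a definitive solution. It assumes that the stray "n" directly before each "POT" is a line break that lost its backslash, that every line starts with a "POT <letter>" label, and that the cleaned_terms text should come before the description text. The helper names restore_breaks, lines_by_pot and merge_by_pot are hypothetical. Rows where several POT labels share one line (such as "POT A: POT B and POT C IM") are not split apart by this sketch.

```r
pot_re <- "^POT [A-Z]\\b"

# Assumption: "n" immediately before "POT" stands for a lost "\n"
restore_breaks <- function(x) gsub("n(?=POT)", "\n", x, perl = TRUE)

# Split one string into lines and name each line's content by its POT label
lines_by_pot <- function(x) {
  lines <- trimws(strsplit(restore_breaks(x), "\n", fixed = TRUE)[[1]])
  # Keep only lines that begin with a "POT <letter>" label
  lines <- lines[grepl(pot_re, lines, perl = TRUE)]
  labels <- regmatches(lines, regexpr(pot_re, lines, perl = TRUE))
  # Content is whatever follows the label and an optional colon
  content <- trimws(sub(paste0(pot_re, ":?"), "", lines, perl = TRUE))
  setNames(content, labels)
}

# Pair the lines of both columns by label for a single row
merge_by_pot <- function(terms, description) {
  t <- lines_by_pot(terms)
  d <- lines_by_pot(description)
  labels <- union(names(t), names(d))
  merged <- vapply(labels, function(l) {
    # If a label repeats within a row, only its first line is used
    parts <- c(t[l], d[l])
    parts <- parts[!is.na(parts) & nzchar(parts)]
    trimws(paste0(l, ": ", paste(parts, collapse = " ")))
  }, character(1))
  paste(merged, collapse = "\n")
}

merge_by_pot("nPOT A body nPOT B Antrum incisura body", "POT A:nPOT B:")
# "POT A: body\nPOT B: Antrum incisura body"
```

Applied to the whole data frame, this would be something like mapply(merge_by_pot, potmergednew$cleaned_terms, potmergednew$microscopic_descriptionClean2, USE.NAMES = FALSE).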
My attempt
merge_text <- function(cleaned_terms, microscopic_descriptionClean2) {
  # Split cleaned_terms into POT and letter
  cleaned_pots <- gsub("[[:space:]]+", "", cleaned_terms) # Remove extra spaces
  cleaned_pots <- grep("^POT [A-Z]$", cleaned_pots, value = TRUE, invert = TRUE) # Remove empty entries
  # Split microscopic_descriptionClean2 into POT and content
  description_pots <- strsplit(microscopic_descriptionClean2, ":")[[1]]
  # Initialize result
  result <- character(max(length(cleaned_pots), length(description_pots)))
  # Merge text line by line where POT and letter match
  for (i in 1:length(cleaned_pots)) {
    if (cleaned_pots[i] %in% description_pots) {
      result[i] <- paste(cleaned_pots[i], description_pots[description_pots %in% cleaned_pots], collapse = ": ")
    } else {
      result[i] <- cleaned_pots[i]
    }
  }
  # Combine result into a single string with newlines
  merged_text <- paste(result, collapse = "\n")
  return(merged_text)
}
# Apply the function to create a new column in potmergednew
potmergednew$merged_text <- mapply(merge_text, potmergednew$cleaned_terms, potmergednew$microscopic_descriptionClean2, SIMPLIFY = FALSE)
However, this just merges all the lines in cleaned_terms and doesn't merge in microscopic_descriptionClean2.
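Stepping through the first row by hand seems to show where it goes wrong. The sketch below repeats the first few statements of merge_text on that row's values (terms and desc are just illustrative variable names): the gsub() call removes every space, so cleaned_terms collapses into a single string that can never equal any of the pieces split out of the description.

```r
terms <- "nPOT A body nPOT B Antrum incisura body"
desc  <- "POT A:nPOT B:"

# gsub() strips all whitespace, so the row becomes one unbroken string
cleaned_pots <- gsub("[[:space:]]+", "", terms)
cleaned_pots
# "nPOTAbodynPOTBAntrumincisurabody"

# The description is split on ":" only
description_pots <- strsplit(desc, ":")[[1]]
description_pots
# "POT A"  "nPOT B"

# Nothing matches, so the else branch copies cleaned_pots unchanged
cleaned_pots %in% description_pots
# FALSE
```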