I have a data set wherein a column looks like this:
ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL
…. and so on
I need to extract the characters that appear before the first | symbol.
In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. R has substr().
The syntax is substr(x, <start>, <stop>).
In my case, start will always be 1. For stop, we need to search for the position of |. How can we achieve this? Are there alternative ways to do this?
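We can use sub. The pipe is a regex metacharacter, so it has to be escaped with a double backslash inside an R string:
sub("\\|.*", "", str1)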
#[1] "ABC"
Or with strsplit
strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"
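The strsplit call above picks the first piece of a single string. For a whole column, a minimal sketch (assuming a character vector, here a made-up `x`, and using `fixed = TRUE` so the pipe is matched literally) could look like this:

```r
# Hypothetical sample vector, modelled on the question's data
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# Split every element on a literal "|" and keep the first piece of each
first <- vapply(strsplit(x, "|", fixed = TRUE), `[`, character(1), 1)
first
# [1] "ABC"  "ABCD" "GHI"
```

vapply is used instead of sapply only so that the result is guaranteed to be a character vector.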
Update
If we use the data from @hrbrmstr
sub("\\|.*", "", df$V1)
#[1] "ABC" "ABCD" "ABCDE" "DEF" "GHI" "BCDE"
These are all base R methods. No external packages used.
data
str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"
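The substr() approach the question asks about can also be made to work, with regexpr() playing the role of Excel's SEARCH. This is only a sketch under assumptions: `x` is a made-up character vector, and strings without a pipe are returned unchanged, which may or may not be what you want.

```r
# Hypothetical sample vector; the last element has no pipe at all
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS", "NOPIPE")

# Position of the first literal "|" in each string (-1 if there is none);
# as.integer() drops the extra attributes that regexpr() attaches
pos <- as.integer(regexpr("|", x, fixed = TRUE))

# Take characters 1 .. pos - 1; fall back to the whole string when no pipe is found
first <- ifelse(pos > 0, substr(x, 1, pos - 1), x)
first
# [1] "ABC"    "ABCD"   "GHI"    "NOPIPE"
```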
Another option is the word function of the stringr package:
library(stringr)
word(df1$V1, 1, sep = "\\|")
Data
df1 <- read.table(text = "ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL")
With stringi:
library(stringi)
df <- read.table(text="ABC|DEF|GHI,1
ABCD|EFG|HIJK,2
ABCDE|FGHI|JKL,3
DEF|GHIJ|KLM,4
GHI|JKLM|NO|PQRS,5
BCDE|FGHI|JKL,6", sep=",", header=FALSE, stringsAsFactors=FALSE)
stri_match_first_regex(df$V1, "(.*?)\\|")[,2]
## [1] "ABC" "ABCD" "ABCDE" "DEF" "GHI" "BCDE"