I’m using stringr in R to try and extract the date from a programme title.
The strings I’m trying to extract from take one of two forms: “The Anonymous Project 2021/22” or “The Anonymous Project (2020/21)”.
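For reference, here is a minimal reproducible example built from the two title forms quoted above. The `df` below is a hypothetical stand-in for the real data, assuming the titles are exactly as quoted. On these literal strings, the unanchored pattern returns the full academic year for both forms.

```r
library(stringr)

# Hypothetical stand-in for the real data: one title in each of the two forms
df <- data.frame(
  title = c("The Anonymous Project 2021/22", "The Anonymous Project (2020/21)"),
  stringsAsFactors = FALSE
)

# Four digits, a literal slash, then two digits (no anchors)
years <- str_extract(df$title, "[0-9]{4}/[0-9]{2}")
print(years)
# [1] "2021/22" "2020/21"
```

Comparing the real `df$title` values against this reprex may help narrow down where the “2020/20” results come from.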
I’ve written several regular expressions, and online regex testers say they should work, but I consistently get the output “2020/20” or “2021/20”.
Below are some of the things I’ve tried, all of which return either NA or “2020/20”/“2021/20”:
I’m absolutely baffled as to what I’m doing wrong. I’m aware there are much simpler ways of getting this data, but I’d really like to understand how I would achieve this, and what’s wrong with my current approach.
My first attempt
Test <- str_extract(df$`title`,'([0-9]{4}[/][0-9]{2})')
Returns “2020/20” and “2021/20”
Grouping the slash with the 2-digit element
Test <- str_extract(df$`title`,'([0-9]{4}([/][0-9]{2}))')
Returns “2020/20” and “2021/20”
Were the brackets messing me up?
Test <- str_extract(df$`title`,'[0-9]{4}[/][0-9]{2}')
Returns “2020/20” and “2021/20”
[:digit:] instead of the [0-9] range
Test <- str_extract(df$`title`,'([:digit:]{4}[/][:digit:]{2})')
Returns “2020/20” and “2021/20”
Start- and end-of-string anchors just give me NA
Test <- str_extract(df$`title`,'^([0-9]{4}[/][0-9]{2})$')
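For comparison, a small sketch of what the ^ and $ anchors do (the input strings here are illustrative): together they require the entire string to match, so the pattern only succeeds when the input consists of nothing but the year.

```r
library(stringr)

# ^ ties the match to the start of the string, $ to the end
pattern_anchored <- "^[0-9]{4}/[0-9]{2}$"

# The title has text before the year, so the anchored pattern cannot match
full_title <- str_extract("The Anonymous Project 2021/22", pattern_anchored)
print(full_title)  # NA

# A string that is only the year matches the anchored pattern
year_only <- str_extract("2021/22", pattern_anchored)
print(year_only)   # "2021/22"
```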