I have the first paragraph of Wikipedia articles (people only), retrieved with the wikifacts
package. I would like to extract the year of birth and the year of death.
library(wikifacts)
library(tidyverse)
politicians <- data.frame(
  Name = c("Barack Obama", "Angela Merkel", "Nelson Mandela", "Margaret Thatcher", "Mahatma Gandhi"),
  stringsAsFactors = FALSE
)
politicians <- politicians %>%
  mutate(First_Paragraph = substr(wiki_define(Name), 1, 200))
head(politicians)
> head(politicians)
Name
1 Barack Obama
2 Angela Merkel
3 Nelson Mandela
4 Margaret Thatcher
5 Mahatma Gandhi
First_Paragraph
1 Barack Hussein Obama II (born August 4, 1961) is an American politician who served as the 44th president of the United States from 2009 to 2017. As a member of the Democratic Party, he was the first A
2 Angela Dorothea Merkel (German: [aŋˈɡɪːla doʁoˈteːa ˈmɛʁkl̩] ; née Kasner; born 17 July 1954) is a retired German politician who served as Chancellor of Germany from 2005 to 2021 and was the first wom
3 Nelson Rolihlahla Mandela ( man-DEH-lə; Xhosa: [xolíɬaɬa mandɛ̂ːla]; born Rolihlahla Mandela; 18 July 1918 – 5 December 2013) was a South African anti-apartheid activist, politician, and statesman who
4 Margaret Hilda Thatcher, Baroness Thatcher, (née Roberts; 13 October 1925 – 8 April 2013) was a British stateswoman and Conservative politician who was Prime Minister of the United Kingdom from 1979
5 Mohandas Karamchand Gandhi (ISO: Mōhanadāsa Karamacaṁda Gāṁdhī; 2 October 1869 – 30 January 1948) was an Indian lawyer, anti-colonial nationalist and political ethicist who employed nonviolent resista
I would like to extract the birth year and, if available, the year of death. Usually these are the first two four-digit numbers that appear, or the first two four-digit numbers within the first pair of parentheses. I have tried several regular-expression and string-extraction approaches. What would be a simple way, preferably in tidyverse
style, to get the birth and death years?
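
For illustration, the "first two four-digit numbers inside the first pair of parentheses" heuristic could be sketched roughly as below. This is only a sketch under some assumptions: the first parenthesised span contains no nested parentheses, and any four-digit number inside it is a year. The names `sample_paragraphs`, `in_parens`, `years`, and `result` are hypothetical, and the two strings are shortened copies of the output shown above so that the example runs without calling Wikipedia.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical sample data: shortened copies of two paragraphs from the output above
sample_paragraphs <- tibble(
  Name = c("Barack Obama", "Margaret Thatcher"),
  First_Paragraph = c(
    "Barack Hussein Obama II (born August 4, 1961) is an American politician who served as the 44th president of the United States from 2009 to 2017.",
    "Margaret Hilda Thatcher, Baroness Thatcher, (née Roberts; 13 October 1925 – 8 April 2013) was a British stateswoman"
  )
)

result <- sample_paragraphs %>%
  mutate(
    # Text of the first "(...)" span; assumes no nested parentheses
    in_parens = str_extract(First_Paragraph, "\\([^)]*\\)"),
    # All standalone four-digit numbers inside that span, as a list column
    years = str_extract_all(in_parens, "\\b\\d{4}\\b"),
    # Indexing past the end of a character vector yields NA, so a missing
    # death year (or no match at all) becomes NA
    Birth_Year = as.integer(map_chr(years, ~ .x[1])),
    Death_Year = as.integer(map_chr(years, ~ .x[2]))
  ) %>%
  select(Name, Birth_Year, Death_Year)

result
```

Restricting the search to the parenthesised span keeps later years in the sentence (such as terms of office) from being picked up as a death year for people who are still alive. Articles whose opening parentheses contain other four-digit numbers, or nested parentheses, would need extra handling.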