I want to scrape the job responsibilities from a particular job offer on the following website:
url = "https://www.pracuj.pl/praca/project-manager-with-german-warszawa,oferta,1003295147"
The HTML structure (tags) of the responsibilities section differs significantly from offer to offer, so it may contain different div, li, ul, etc. tags (I’m not very familiar with HTML). I want to get the text of the very last child node (for the job offer above it is an li tag, but it can be any tag).
I use the following R code:
library(rvest)  # provides read_html(), html_nodes(), html_text() and re-exports %>%

read_html(url) %>%
  html_nodes(xpath = "//*[contains(@data-scroll-id, 'requirements')][last()]") %>%
  html_nodes("*") %>%
  html_text()
The R code above generally works, but besides the individual responsibilities, the returned vector also contains the section’s title (I don’t know what it is properly called; index 1) and a concatenation of all responsibilities (index 2).
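A small reproduction suggests where the extra elements come from. The markup below is hypothetical (a made-up `data-scroll-id` value and tags, not the real pracuj.pl page). The CSS selector `"*"` in `html_nodes("*")` matches every descendant element of the section, so the heading, the `ul` (whose text is all of its `li` texts glued together) and each `li` all end up in the result.

```r
library(rvest)

# Hypothetical stand-in for one responsibilities section (not the real page)
html <- paste0(
  '<div data-scroll-id="requirements-2">',
  '<h2>Your responsibilities</h2>',
  '<ul><li>Plan projects</li><li>Report to stakeholders</li></ul>',
  '</div>'
)
page <- read_html(html)

# Same pipeline as in the question: "*" matches h2, ul and every li
texts <- page %>%
  html_nodes(xpath = "//*[contains(@data-scroll-id, 'requirements')][last()]") %>%
  html_nodes("*") %>%
  html_text()

print(texts)
```

Here element 1 is the `h2` heading and element 2 is the `ul`, whose text is the two items run together.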
I want to write the XPath so that the output contains only the text of the very last child node (the li tag in the case of this job offer).
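One possible XPath, sketched against hypothetical markup (the two `data-scroll-id` values and the wrapper `div` are made up for illustration, and this has not been tested against pracuj.pl). Wrapping the first path in parentheses makes `[last()]` pick the last match in the whole document rather than the last match under each parent. `descendant::*[last()]` then selects the last element inside that section in document order, whatever its tag.

```r
library(rvest)

# Hypothetical markup: two matching sections under different parents
html <- paste0(
  '<div data-scroll-id="requirements-1">',
  '<h2>Your requirements</h2>',
  '<ul><li>German B2</li></ul>',
  '</div>',
  '<div class="wrapper">',
  '<div data-scroll-id="requirements-2">',
  '<h2>Your responsibilities</h2>',
  '<ul><li>Plan projects</li><li>Report to stakeholders</li></ul>',
  '</div>',
  '</div>'
)
page <- read_html(html)

# Without parentheses, [last()] is applied per parent, so both sections match
naive <- html_nodes(page, xpath = "//*[contains(@data-scroll-id, 'requirements')][last()]")

# With parentheses, [last()] applies to the whole match set; then take the
# last descendant element of that section in document order
last_item <- page %>%
  html_node(xpath = "(//*[contains(@data-scroll-id, 'requirements')])[last()]/descendant::*[last()]") %>%
  html_text(trim = TRUE)

print(length(naive))
print(last_item)
```

On the real page the same XPath would go into `read_html(url) %>% html_node(xpath = ...) %>% html_text(trim = TRUE)`, assuming the site's markup still matches. One caveat: if the last item wraps its text in inline markup such as `<strong>`, `descendant::*[last()]` returns that inner element rather than the whole item.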