Suppose that I have this code:
```r
library(rvest)
library(dplyr)

books <- minimal_html('
<div class="metadata">
  <div class="country">Andorra</div>
  <div class="content">
    <div class="entry">
      <div class="collection">Collection 1</div>
      <div class="book">
        <div class="booktitle">Book 1</div>
        <div class="year">1999</div>
        <div class="author">
          <div class="name">Author 1</div>
          <div class="city">Austin</div>
        </div>
        <div class="author">
          <div class="name">Author 2</div>
          <div class="city">Dallas</div>
        </div>
        <div class="author">
          <div class="name">Author 3</div>
          <div class="city">Memphis</div>
        </div>
      </div>
      <div class="book">
        <div class="booktitle">Book 2</div>
        <div class="year">2022</div>
        <div class="author">
          <div class="name">Author 4</div>
          <div class="city">Houston</div>
        </div>
      </div>
    </div>
    <div class="entry">
      <div class="collection">Collection 2</div>
      <div class="book">
        <div class="booktitle">Book 3</div>
        <div class="year">1845</div>
        <div class="author">
          <div class="name">Author 5</div>
        </div>
        <div class="author">
          <div class="name">Author 6</div>
          <div class="city">Dayton</div>
        </div>
        <div class="author">
          <div class="name">Author 7</div>
          <div class="city">Philadelphia</div>
        </div>
      </div>
    </div>
  </div>
</div>')
```
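One quick way to see how deep the `metadata` wrapper sits is to walk up from the first `.book` one level at a time and print each ancestor's class. This is a sketch against a trimmed copy of the HTML above: the author divs are dropped, which I assume does not matter here because they sit below `.book`. The names `book1`, `paths`, `parent_classes` and `wrong` are just illustrative.

```r
library(rvest)

# Trimmed copy of the HTML above (author divs dropped, nesting unchanged):
# metadata > content > entry > book
books <- minimal_html('
<div class="metadata">
  <div class="country">Andorra</div>
  <div class="content">
    <div class="entry">
      <div class="collection">Collection 1</div>
      <div class="book"><div class="booktitle">Book 1</div><div class="year">1999</div></div>
      <div class="book"><div class="booktitle">Book 2</div><div class="year">2022</div></div>
    </div>
    <div class="entry">
      <div class="collection">Collection 2</div>
      <div class="book"><div class="booktitle">Book 3</div><div class="year">1845</div></div>
    </div>
  </div>
</div>')

book1 <- books |> html_element(".book")

# Class attribute of the parent, grandparent and great-grandparent of the first book
paths <- c("..", "../..", "../../..")
parent_classes <- vapply(
  paths,
  \(p) book1 |> html_element(xpath = p) |> html_attr("class"),
  character(1)
)
print(parent_classes)

# This path asks for a metadata *child* of .content, which does not exist,
# so html_element() yields a missing node and html_text2() turns it into NA
wrong <- book1 |> html_element(xpath = "../../div[@class='metadata']") |> html_text2()
print(wrong)
```

This suggests that `.metadata` is three levels above `.book`, not two. Even once that node is reached, `html_element("country")` would look for a `<country>` tag rather than an element with class `country`.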
I want to build a table with one row per book that includes the country, which is the same for every row, as well as the book title, year, and collection.
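Because the country is identical for every row, one option is to read it once from the whole document and attach it afterwards, instead of looking it up from each book. A sketch, again on a trimmed copy of the HTML above (author divs dropped) and assuming R 4.1 or later for `|>` and `\(x)`. The selector `.metadata > .country` assumes `country` is a direct child of `metadata`, as it is in this sample; `country_name` and `result` are illustrative names.

```r
library(rvest)
library(dplyr)

# Trimmed copy of the HTML above (author divs dropped, nesting unchanged)
books <- minimal_html('
<div class="metadata">
  <div class="country">Andorra</div>
  <div class="content">
    <div class="entry">
      <div class="collection">Collection 1</div>
      <div class="book"><div class="booktitle">Book 1</div><div class="year">1999</div></div>
      <div class="book"><div class="booktitle">Book 2</div><div class="year">2022</div></div>
    </div>
    <div class="entry">
      <div class="collection">Collection 2</div>
      <div class="book"><div class="booktitle">Book 3</div><div class="year">1845</div></div>
    </div>
  </div>
</div>')

# The country appears once, so read it a single time from the document
country_name <- books |> html_element(".metadata > .country") |> html_text2()

result <- books |>
  html_elements(".book") |>
  lapply(\(x) {
    tibble(
      collection = x |> html_element(xpath = "../div[@class='collection']") |> html_text2(),
      title = x |> html_element(".booktitle") |> html_text2(),
      year = x |> html_element(".year") |> html_text2()
    )
  }) |>
  bind_rows() |>
  # A length-1 value is recycled to every row; put it in the first column
  mutate(country = country_name, .before = 1)

print(result)
```

This keeps the per-book lookups limited to the fields that actually vary by book.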
My attempt below is based on answers here and here. The part adapted from a prior answer successfully gets the book title, year, and collection.
I have tried to extend it to get the country as well, but that column comes back as `NA`s. What I am trying to do is go up a couple of levels from the book level, look for the class `metadata`, and then, within `metadata`, find the class `country`. How can I modify this erroneous code?
```r
books |>
  html_elements(".book") |>
  lapply(\(x) {
    tibble(
      country = x |> html_element(xpath = "../../div[@class='metadata']") |> html_element("country") |> html_text2(),
      collection = x |> html_element(xpath = "../div[@class='collection']") |> html_text2(),
      title = x |> html_element(".booktitle") |> html_text2(),
      year = x |> html_element(".year") |> html_text2()
    )
  }) |>
  bind_rows()
```
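A sketch of the per-book approach with two changes to the `country` line. First, the `ancestor::` axis finds the enclosing `metadata` div however many levels up it is (in this HTML it is three levels up, so `../../..` should also work). Second, `.country` (with the dot) selects by class, whereas `country` looks for a `<country>` element. It runs against a trimmed copy of the HTML above with the author divs dropped, and assumes the real pages nest `metadata` the same way; `result` is an illustrative name.

```r
library(rvest)
library(dplyr)

# Trimmed copy of the HTML above (author divs dropped, nesting unchanged)
books <- minimal_html('
<div class="metadata">
  <div class="country">Andorra</div>
  <div class="content">
    <div class="entry">
      <div class="collection">Collection 1</div>
      <div class="book"><div class="booktitle">Book 1</div><div class="year">1999</div></div>
      <div class="book"><div class="booktitle">Book 2</div><div class="year">2022</div></div>
    </div>
    <div class="entry">
      <div class="collection">Collection 2</div>
      <div class="book"><div class="booktitle">Book 3</div><div class="year">1845</div></div>
    </div>
  </div>
</div>')

result <- books |>
  html_elements(".book") |>
  lapply(\(x) {
    tibble(
      # Climb to the enclosing metadata div, then take its element with class "country"
      country = x |>
        html_element(xpath = "ancestor::div[@class='metadata']") |>
        html_element(".country") |>
        html_text2(),
      collection = x |> html_element(xpath = "../div[@class='collection']") |> html_text2(),
      title = x |> html_element(".booktitle") |> html_text2(),
      year = x |> html_element(".year") |> html_text2()
    )
  }) |>
  bind_rows()

print(result)
```

If a page could ever contain nested `metadata` divs, `ancestor::div[@class='metadata'][1]` should pick the nearest one, because positions on a reverse axis such as `ancestor::` are counted outward from the current node.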