Say that I have this list:
listexample = list(
  books = list(
    list(
      title = "Book 1",
      entry = "entry 1",
      publisher = "Books Unlimited",
      authors = list(
        list(name = "bob", location = "north dakota"),
        list(name = "susan", location = "california"),
        list(name = "tim")),
      isbn = "1358",
      universities = list(
        list(univ = "univ1"),
        list(univ = "univ2"))
    ),
    list(
      title = "Book 2",
      entry = "entry 2",
      publisher = "Books Unified",
      authors = list(
        list(name = "tom", location = "north dakota"),
        list(name = "sally", location = "california"),
        list(name = "erica", location = "berlin")),
      isbn = "1258",
      universities = list(
        list(univ = "univ5"),
        list(univ = "univ2"),
        list(univ = "univ99"),
        list(univ = "univ2"),
        list(univ = "univ3"))
    )
  ),
  misc = list(name = "Jim Smith", location = "Alaska"))
How can I create a data frame (a tibble is also fine) where each row is an author? I want to completely ignore the second element of the main list (misc). I also want to ignore universities, isbn, and publisher. I still want to keep title, name, location, as well as books (the name of the first element of the main list).
I know that rrapply can be used to apply functions recursively to nested lists, but I am not sure if it is appropriate in this case.
library(rrapply)
rrapply(listexample, how = "bind")
You can use unnest_longer and unnest_wider from tidyr.
listexample |>
  tibble::enframe() |>
  dplyr::filter(name == "books") |>
  tidyr::unnest_longer(value) |>
  tidyr::unnest_wider(value) |>
  dplyr::select(title, authors) |>
  tidyr::unnest_longer(authors) |>
  tidyr::unnest_wider(authors)
You can run the code adding one line at a time to see what each step does. In short, we turn the list into a two-row tibble (row one is books, row two is misc), then expand the nested information.
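The pipeline above keeps only title, name, and location. If the top-level element name (books) should also survive as a column, one possible variant is to give the enframe() name column a different label so it does not collide with the author name column later on. This is a minimal sketch on a trimmed copy of the question's list; the column name source and the result name authors_df are assumptions chosen for illustration, not anything from the original post.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Trimmed copy of the list from the question (fewer fields, same nesting).
listexample <- list(
  books = list(
    list(
      title = "Book 1",
      publisher = "Books Unlimited",
      authors = list(
        list(name = "bob", location = "north dakota"),
        list(name = "susan", location = "california"),
        list(name = "tim")
      )
    ),
    list(
      title = "Book 2",
      publisher = "Books Unified",
      authors = list(
        list(name = "tom", location = "north dakota"),
        list(name = "sally", location = "california"),
        list(name = "erica", location = "berlin")
      )
    )
  ),
  misc = list(name = "Jim Smith", location = "Alaska")
)

authors_df <- listexample |>
  # "source" is a hypothetical label for the top-level element name;
  # using it avoids a clash with the author-level "name" field.
  enframe(name = "source") |>
  filter(source == "books") |>        # drop the misc element
  unnest_longer(value) |>             # one row per book
  unnest_wider(value) |>              # book fields become columns
  select(source, title, authors) |>   # ignore publisher and the rest
  unnest_longer(authors) |>           # one row per author
  unnest_wider(authors)               # author fields become columns

authors_df
```

An author without a location (tim here) should end up with a missing value in that column rather than being dropped.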
Read the tidyr “rectangling” vignette for more information. In fact, you can probably reduce the code here by using tidyr::hoist().
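As a rough idea of what a hoist()-based version might look like, the sketch below starts from the books element directly and pulls out only the fields of interest. It is an untested-against-the-full-data illustration on a trimmed copy of the list; the column name book and the result name authors_hoisted are assumptions, and the behaviour for the author with no location depends on how your tidyr version simplifies missing components.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Trimmed copy of the list from the question (fewer fields, same nesting).
listexample <- list(
  books = list(
    list(
      title = "Book 1",
      isbn = "1358",
      authors = list(
        list(name = "bob", location = "north dakota"),
        list(name = "susan", location = "california"),
        list(name = "tim")
      )
    ),
    list(
      title = "Book 2",
      isbn = "1258",
      authors = list(
        list(name = "tom", location = "north dakota"),
        list(name = "sally", location = "california"),
        list(name = "erica", location = "berlin")
      )
    )
  ),
  misc = list(name = "Jim Smith", location = "Alaska")
)

authors_hoisted <- tibble(book = listexample$books) |>  # one row per book
  # Pull just the wanted components out of each book.
  hoist(book, title = "title", authors = "authors") |>
  select(title, authors) |>
  unnest_longer(authors) |>                              # one row per author
  # Pull the author fields out of each author entry.
  hoist(authors, name = "name", location = "location") |>
  select(title, name, location)

authors_hoisted
```

Indexing listexample$books up front replaces the enframe() and filter() steps, and hoist() names the wanted fields explicitly instead of widening everything and selecting afterwards.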