I am running `map_dfr()` over a set of webpages to pull various elements that get bundled into a data frame. I made a simple example with just two pages to show the issue. The first page has a “defense” table, but the second does not. (Both pages have an offense table, FYI.)
I looked up how to keep the loop from breaking on any page that is missing this element (or others; I just used this one for the example). The `possibly()` function came up, so I have incorporated it in my attempt below. The code does seem to create “na” for the defense table that doesn’t exist in iteration #2, but when it comes time to add `defense` as a column in the final `mutate()`, it gives the error included at the bottom of my code. I’m thinking this has something to do with the column being a nested table, but I am not sure.
My goal is to run this loop while keeping the nested defense table as a column, and to have any page that doesn’t contain that table show “NA” or a blank instead.
A dplyr solution is preferred, but I am open to whatever works.
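To make the target shape concrete, here is a minimal sketch of the output I am after, using made-up data instead of the scraped pages. The helper `as_list_col()` and the toy tables are hypothetical names for illustration only, and it assumes a missing table shows up as `NULL` or a zero-row data frame:

```r
library(dplyr)

# Hypothetical helper: wrap a table in a length-1 list so mutate() can recycle it.
# A missing table (NULL or zero rows) becomes a NULL entry in the list column.
as_list_col <- function(tbl) {
  if (is.null(tbl) || nrow(tbl) == 0) list(NULL) else list(tbl)
}

# Toy stand-ins for one page's linescore and defense tables
linescore <- tibble(team = c("A", "B"), final = c(14, 7))
defense_found <- tibble(player = c("x", "y"), tackles = c(3, 5))
defense_missing <- tibble()

with_defense <- linescore |> mutate(defense = as_list_col(defense_found))
without_defense <- linescore |> mutate(defense = as_list_col(defense_missing))

# Rows from both "pages" stack cleanly because defense is always a list column
combined <- bind_rows(with_defense, without_defense)
```

Here `combined` keeps the nested table for the first page and an empty entry for the second, which is the behavior I would like from the real loop.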
library(tidyverse)
library(rvest)
library(RSelenium)
library(netstat)

# Start a Selenium-driven Firefox session on a free port
rs_driver_object <- rsDriver(browser = "firefox",
                             verbose = FALSE,
                             chromever = NULL,
                             port = free_port())
remDr <- rs_driver_object$client

# First page has a defense table; second page does not
games <- c("https://www.pro-football-reference.com/boxscores/197301140mia.htm",
           "https://www.pro-football-reference.com/boxscores/196010230was.htm")
test_df <-
  map_dfr(games,
          function(game_pull){
            # Pause between requests, then load the page and parse its source
            Sys.sleep(3)
            remDr$navigate(game_pull)
            x <- remDr$getPageSource() %>% unlist()
            page <- read_html(x)

            szn <-
              page |>
              html_elements(xpath = "//*[@class='hoversmooth']") |>
              html_text2() |>
              parse_number()

            # Offense table, nested; falls back to "na" if the pipeline errors
            offense <- map_dfr(page,
                               possibly(~
                                 page |>
                                   html_elements(xpath = "//*[@id='all_player_offense']") |>
                                   html_table() |>
                                   as.data.frame() |>
                                   janitor::row_to_names(row_number = 1) |>
                                   janitor::clean_names() |>
                                   nest(),
                                 otherwise = "na"
                               ))

            # Defense table, same approach (this table is missing on page 2)
            defense <- map_dfr(page,
                               possibly(~
                                 page |>
                                   html_elements(xpath = "//*[@id='all_player_defense']") |>
                                   html_table() |>
                                   as.data.frame() |>
                                   janitor::row_to_names(row_number = 1) |>
                                   janitor::clean_names() |>
                                   nest(),
                                 otherwise = "na"
                               ))

            # Linescore, with the nested tables attached as columns
            df <- page |>
              html_elements(xpath = "//table[@class='linescore nohover stats_table no_freeze']") |>
              html_table() |>
              as.data.frame() |>
              setNames(c("trash", "team", "q1", "q2", "q3", "q4", "final")) |>
              mutate(offense = offense,
                     defense = defense)
            df
          })
Error in `map()`:
ℹ In index: 2.
Caused by error in `mutate()`:
ℹ In argument: `defense = defense`.
Caused by error:
! `defense` must be size 2 or 1, not 0.
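
For what it's worth, I think the same kind of size error can be triggered without Selenium. This is only a sketch with made-up data, and it assumes that `defense` ends up as a zero-row data frame on the second page (I have not confirmed that this is what actually happens inside my loop):

```r
library(dplyr)

# Toy stand-in for the two-row linescore table
linescore <- tibble(team = c("A", "B"), final = c(14, 7))

# Assumed stand-in for what `defense` looks like when the table is missing
empty_defense <- tibble()

# Capture the error instead of stopping, so the condition can be inspected
result <- tryCatch(
  linescore |> mutate(defense = empty_defense),
  error = function(e) e
)

print(result)
```

If that assumption is right, the problem is that a zero-row object cannot be recycled to match the two rows of the linescore table.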