I’m trying to scrape CFB data from
https://stathead.com/football/player-season-finder.cgi?request=1&match=player_season_combined&order_by=name_display_csk&year_min=2008&year_max=2024&positions%5B%5D=qb&draft_status=drafted&draft_pick_type=overall
a paid website. I’m able to log in through R and obtain the primary links (the list of players and their hyperlinks), but now I’m trying to navigate to each hyperlink and obtain the URL of the “College Stats” hyperlink shown on the resulting pages (example):
https://www.pro-football-reference.com/players/Y/YounBr01.htm?__hstc=205977932.109bbba6a8a9f532790724faa5fd5151.1714787967133.1714797301883.1714801232656.3&__hssc=205977932.16.1714801232656&__hsfp=3211688760
```r
library(httr)
library(rvest)
library(dplyr)
library(purrr)  # provides pluck()

# Log in to Stathead
my_session <- session("https://stathead.com/users/login.cgi")
log_in_form <- html_form(my_session)[[1]]
fill_form <- html_form_set(log_in_form, username = "XXXX", password = "XXXX")
fill_form$fields[[4]]$name <- "button"
session_submit(my_session, fill_form)

# Load the finder results and pull out the first table
url <- session_jump_to(my_session, "https://stathead.com/football/player-season-finder.cgi?request=1&match=player_season_combined&order_by=name_display_csk&year_min=2008&year_max=2024&positions[]=qb&draft_status=drafted&draft_pick_type=overall")
tbl <- html_nodes(url, "table")
av_table <- html_table(tbl, fill = TRUE) |> pluck(1)
av_table <- as.data.frame(av_table)
av_table <- av_table |> select(Player, DrftYr)

# Attach each player's profile link
pro_links <- url |> html_nodes("#stats a") |> html_attr("href")
av_table <- av_table |> mutate(URL = pro_links)
pro_links <- av_table$URL

# Follow each profile link and pull the "College Stats" href
get_college_link <- function(pro_link) {
  pro_page <- read_html(pro_link)
  pro_page |> html_nodes("p:nth-child(7) a") |> html_attr("href")
}
college_url_column <- sapply(pro_links, FUN = get_college_link)
av_table <- av_table |> mutate(College_Stats_URLs = college_url_column)
```
I’m very new to this, so apologies for the messiness. I’ve gotten various outputs after minor tweaks. Right now, if I print `college_url_column`, I get:
https://www.pro-football-reference.com/players/Y/YounBr01.htm?__hstc=205977932.109bbba6a8a9f532790724faa5fd5151.1714787967133.1714797301883.1714801232656.3&__hssc=205977932.16.1714801232656&__hsfp=3211688760
"https://www.sports-reference.com/cfb/players/bryce-young-1.html"
That second link is what should show up, but for each player.
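To make the goal concrete, here is a minimal sketch of the per-player result I’m after, run against made-up inline HTML rather than the real pages. It assumes the link can be found by its visible text, "College Stats", instead of by its position on the page; `extract_college_link` and the two sample pages are hypothetical and only there for illustration.

```r
library(rvest)

# Hypothetical helper: return the href of the first link whose visible text
# is exactly "College Stats", or NA if the page has no such link.
extract_college_link <- function(page) {
  links <- html_elements(page, "a")
  hit <- links[html_text2(links) == "College Stats"]
  if (length(hit) == 0) {
    return(NA_character_)
  }
  html_attr(hit[[1]], "href")
}

# Made-up stand-ins for two player pages (not the real site markup)
pages <- list(
  with_link = read_html(paste0(
    '<p><a href="/other">Other</a></p>',
    '<p><a href="https://www.sports-reference.com/cfb/players/bryce-young-1.html">',
    'College Stats</a></p>'
  )),
  without_link = read_html('<p><a href="/other">Other</a></p>')
)

# vapply() guarantees exactly one string per page, so the result can be
# used directly as a data frame column
college_urls <- vapply(pages, extract_college_link, character(1))
```

So the result I want is one URL (or `NA`) per player, in the same order as `pro_links`, which could then go straight into `mutate()`.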
Cary Lucas is a new contributor to this site.