I’m trying to download a parameterized .csv file from a page. I’m quite new to scraping, but it seems to me that I need to fill out a form with the desired values and then submit it to receive a response (the file itself).
So I’ve been trying to make it happen with rvest, but have not succeeded so far. I tried three similar approaches, which returned two different error messages.
library(rvest)

url <- 'https://www.anbima.com.br/informacoes/est-termo/default.asp'

# Attempt 1: read_html
sess <- read_html(url)
form <- sess %>%
  html_form() %>%
  .[[1]] %>%
  html_form_set(escolha = 2, Idioma = "PT", saida = "csv",
                Dt_Ref = "13/06/2024")
resp <- html_form_submit(form)
read_html(resp)
# > Warning message: In session_set_response(x, resp) : Internal Server Error (HTTP 500).
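For debugging, I also tried to see what the server actually sent back with the 500. Since html_form_submit() returns an httr response, its body can be pulled out with httr. A minimal sketch (the latin1 encoding is just my guess for this site):

library(httr)

# Inspect the failed response for a server-side error message.
# Assumes `resp` is the httr response returned by html_form_submit() above;
# the encoding is a guess, not confirmed.
status_code(resp)
cat(content(resp, as = "text", encoding = "latin1"))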
# Attempt 2: read_html_live
sess <- read_html_live(url)
form <- sess %>%
  html_elements("form") %>%
  html_form() %>%
  .[[1]] %>%
  html_form_set(escolha = 2, Idioma = "PT", saida = "csv",
                Dt_Ref = "13/06/2024")
resp <- html_form_submit(form)
# > Error in curl::curl_fetch_memory(url, handle = handle) : Could not resolve host: CZ.asp
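The ‘Could not resolve host: CZ.asp’ error makes me think the form’s action is the relative URL ‘CZ.asp’ and that it loses its base URL somewhere, so curl treats the action as a hostname. A sketch of a workaround I considered, assuming the action only needs to be made absolute:

# Patch the form's relative action into an absolute URL before submitting.
# Assumes the relative "CZ.asp" action is the only problem here.
form$action <- xml2::url_absolute(form$action, url)
resp <- html_form_submit(form)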
# Attempt 3: session
sess <- session(url)
form <- sess %>%
  read_html() %>%
  html_form() %>%
  .[[1]] %>%
  html_form_set(escolha = 2, Idioma = "PT", saida = "csv",
                Dt_Ref = "13/06/2024")
resp <- session_submit(sess, form)
# > Warning message: In session_set_response(x, resp) : Internal Server Error (HTTP 500).
Additionally, when the form is submitted on the actual page, it goes through a JavaScript function called ‘VerificaSubmit()’. Researching the topic led me to this SO post, where one comment said that rvest ‘cannot execute javascript’, while the penultimate comment hinted that a solution might still be possible.
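Since rvest can’t run VerificaSubmit(), I also wondered whether I could bypass the form machinery entirely and POST the fields straight to the endpoint the form points at. Judging from the error in attempt 2, the action resolves to CZ.asp in the same directory; the sketch below assumes that URL and that these four fields are all the server needs (neither is confirmed):

library(httr)

# Sketch: POST the form fields directly. The endpoint URL, the field list,
# and the output file name are assumptions, not confirmed against the site.
resp <- POST(
  "https://www.anbima.com.br/informacoes/est-termo/CZ.asp",
  body = list(escolha = "2", Idioma = "PT", saida = "csv",
              Dt_Ref = "13/06/2024"),
  encode = "form"
)
writeBin(content(resp, as = "raw"), "est-termo.csv")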
My question: is it possible to scrape sites like this one using rvest, or do I need another package?
Thanks!