I am looking to use rvest to scrape the ‘Monthly Cargo Volumes’ data on https://www.panynj.gov/port/en/our-port/facts-and-figures.html. I believe I have the correct XPath, but I am getting no results. Thank you.
library(rvest)
library(dplyr)
url <- 'https://www.panynj.gov/port/en/our-port/facts-and-figures.html'
schedule <- url %>%
  read_html() %>%
  html_nodes(xpath = '/html/body/div[1]/div/div/div[2]/div[62]/div[1]/div/div[1]/div/div[1]/div[5]/div[1]/div[2]/div[1]/div/div/div/div[1]/div/table/tbody/tr[3]/td[2]') %>%
  html_table() %>%
  data.frame
Result:
data frame with 0 columns and 0 rows
The first issue is that the website isn’t static: it uses JavaScript to create the tables you want to scrape. To tackle this, use read_html_live instead of read_html (read_html_live requires Chrome to be installed on your machine). Second, even with that change you will get an empty data frame, because your XPath expression targets a single table cell rather than a table. html_table will return nothing, as the cell does not contain a table. Instead, to get the content of a single cell you can use html_text:
library(rvest)
url <- "https://www.panynj.gov/port/en/our-port/facts-and-figures.html"
schedule <- read_html_live(url) |>
  html_elements(
    xpath = "/html/body/div[1]/div/div/div[2]/div[62]/div[1]/div/div[1]/div/div[1]/div[5]/div[1]/div[2]/div[1]/div/div/div/div[1]/div/table/tbody/tr[3]/td[2]"
  ) |>
  html_text()
schedule
#> [1] " 3,737,112"
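
To see the difference between selecting a cell and selecting a whole table without depending on the live page, here is a minimal sketch on a made-up HTML snippet. The table contents and column names are invented for illustration and are not the Port Authority's data; the sketch assumes rvest 1.0.0 or later for minimal_html() and html_element().

```r
library(rvest)

# Hypothetical stand-in for the rendered page; the values are invented.
page <- minimal_html('
  <table>
    <tr><th>Month</th><th>TEUs</th></tr>
    <tr><td>January</td><td> 100</td></tr>
    <tr><td>February</td><td> 200</td></tr>
  </table>
')

# Selecting a single cell: a cell is not a table, so read its text.
# trim = TRUE strips the leading whitespace inside the cell.
cell <- page |>
  html_element(xpath = "//table//tr[3]/td[2]") |>
  html_text(trim = TRUE)

# Selecting the <table> node itself: html_table() parses it into a tibble,
# using the <th> row as column names.
tbl <- page |>
  html_element("table") |>
  html_table()
```

On the live page, the same pattern should apply to the result of read_html_live(url): select the table node (or use html_elements("table") to get all of them) and pass it to html_table(). Which table on the page holds the monthly cargo volumes is something to confirm in the browser's inspector.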