I am trying to do web scraping for a school assignment, but I have run into one problem. We are scraping a Craigslist site, and here is my code:
library(rvest)
library(dplyr)
Davis_url <- "https://sacramento.craigslist.org/search/apa?search_distance=10&s="
scrape_page <- function(url) {
  page <- read_html(url)
  titles <- page %>% html_nodes(".title") %>% html_text(trim = TRUE)
  prices <- page %>% html_nodes(".price") %>% html_text(trim = TRUE)
  # Handle missing square footage by checking for existence and replacing with NA
  sqfts <- page %>% html_nodes(".post-sqft") %>% html_text(trim = TRUE)
  if (length(sqfts) == 0) {
    sqfts <- NA
  }
  # Handle missing bedrooms by checking for existence and replacing with NA
  bedrooms <- page %>% html_nodes(".post-bedrooms") %>% html_text(trim = TRUE)
  if (length(bedrooms) == 0) {
    bedrooms <- NA
  }
  data <- data.frame(
    title = titles,
    price = prices,
    sqft = sqfts,
    bedrooms = bedrooms,
    stringsAsFactors = FALSE
  )
  return(data)
}
all_data <- data.frame()
for (i in seq(0, 480, by = 120)) {
  url <- paste0(Davis_url, i)
  page_data <- scrape_page(url)
  all_data <- bind_rows(all_data, page_data) # Append results to the data frame
  Sys.sleep(1) # Pause for 1 second to avoid getting blocked
}
print(all_data)
However, only the title and price columns work; sqft and bedrooms come back as NA.
I have tried the CSS selector .post-sqft, but I still cannot get the square footage back. I am very confused because I don't know what is wrong.
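To narrow the problem down, it may help to count how many nodes each selector matches before building the data frame. The sketch below runs on a small made-up HTML snippet, not the real Craigslist page: the `.listing` class and the markup are assumptions for illustration only, and the actual page structure may be different. It also shows one possible way to extract fields per listing with `html_element()` (singular), which returns one result per listing, so a listing with no square footage gives NA for that row only.

```r
library(rvest)

# Hypothetical markup for illustration only; the real Craigslist page may differ.
html <- '
<ul>
  <li class="listing">
    <span class="title">Apt A</span>
    <span class="price">$1,200</span>
    <span class="post-sqft">700ft2</span>
  </li>
  <li class="listing">
    <span class="title">Apt B</span>
    <span class="price">$1,500</span>
  </li>
</ul>'
page <- read_html(html)

# Count how many nodes each selector matches on the whole page.
# A count of 0 means the selector does not exist in the HTML that was downloaded.
selectors <- c(".title", ".price", ".post-sqft")
counts <- sapply(selectors, function(sel) length(html_elements(page, sel)))
print(counts)

# Extract per listing: html_element() (singular) returns one result per
# listing, so a listing without a match yields NA instead of being dropped.
listings <- html_elements(page, ".listing")  # ".listing" is an assumed class name
result <- data.frame(
  title = listings %>% html_element(".title") %>% html_text(trim = TRUE),
  price = listings %>% html_element(".price") %>% html_text(trim = TRUE),
  sqft  = listings %>% html_element(".post-sqft") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)
print(result)
```

If the count for `.post-sqft` is 0 on the real page, the selector is probably not present in the HTML that `read_html()` receives, which would explain the NA column.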