I have a web-scraping function in R that uses httr and POST() to collect public data on a number of police statistics, but the code only works sporadically. I suspect cookies are the cause, but so far none of my attempts to preserve or reset cookies has fixed things. The following code has repeatedly returned the right data on a first request, but then fails to return the same data on subsequent requests. (Eventually this would iterate over the unique identifier in the “values” field of “payload”; the example here uses ID 940216.)
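library(httr)
# Endpoint that the site's report tab posts to
url <- "http://oip.nypdonline.org/api/reports/2042/datasource/list"
# Filter for one officer; 940216 is the example ID to iterate over later
payload <- '{filters: [{key: "@TAXID", label: "TAXID", values: ["940216"]}]}'
# The body is already a character string, so httr sends it as-is
response <- httr::POST(url, content_type("application/json"), body = payload, encode = "json")
# Parse the response text into an R object and inspect it
json <- httr::content(response, as = "text")
jsonview <- jsonlite::fromJSON(json)
View(jsonview)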
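For the eventual iteration over IDs, a minimal sketch of building one payload per identifier with jsonlite could look like the following. The helper name `build_payload` is hypothetical, and the sketch emits strict JSON with quoted keys, which assumes the server accepts that as well as the unquoted form used in the request code.

```r
library(jsonlite)

# Hypothetical helper: build the filter payload for one TAXID as a JSON string.
build_payload <- function(tax_id) {
  body <- list(
    filters = list(
      list(
        key = "@TAXID",
        label = "TAXID",
        # I() keeps "values" a JSON array even though it has one element
        values = I(as.character(tax_id))
      )
    )
  )
  as.character(toJSON(body, auto_unbox = TRUE))
}

# One payload per ID; both IDs are the examples mentioned in this post.
payloads <- vapply(c("940216", "940123"), build_payload, character(1))
```

Each element of `payloads` could then be passed as the `body` of a separate POST() call.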
Running the POST code above in my current environment returns an error message: “Error trying to save Entity changes – The Username field is required.” Yet I have seen this code work and return exactly the values I’m interested in, and if I restart R altogether I can sometimes get it to return the right values again. When it does work, running the same code again immediately produces the same error. I’ve tried using handles, and as an experiment I’ve tried unloading httr and loading it again, but neither appears to clear whatever keeps the code from working a second time in the same session.
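For context, httr reuses one curl handle per scheme, host and port within an R session, and that handle carries cookies from earlier responses into later requests. A minimal sketch of forcing a clean request is shown here; the function name `post_fresh` is hypothetical, and whether a clean handle is enough for this particular site is an open assumption.

```r
library(httr)

api_url <- "http://oip.nypdonline.org/api/reports/2042/datasource/list"
payload <- '{filters: [{key: "@TAXID", label: "TAXID", values: ["940216"]}]}'

# Hypothetical helper: POST with a brand-new handle, so no cookies stored
# from earlier requests in this R session are sent along.
post_fresh <- function(url, body) {
  fresh <- handle(url)  # new curl handle, not taken from httr's handle pool
  POST(url, content_type_json(), body = body, handle = fresh)
}

# Example use (performs network requests, so it is left commented out):
# handle_reset(api_url)              # alternative: reset the pooled handle
# response <- post_fresh(api_url, payload)
# cookies(response)                  # inspect the cookies the server set
```

Comparing `cookies(response)` between a working and a failing call would show whether the server hands out a different cookie set each time.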
Using the example https://oip.nypdonline.org/view/1/@TAXID=940123, I can see that when a user clicks a tab, the browser sends a request similar to my POST request. In developer tools I can see several cookies associated with this request:
Request cookies:
BNI_persistence = “fQZHNVs5szWnatWt8bmuAJzA1b_wBc4”
user = “apparent per session cookie”
Response cookies:
BNI_persistence = “fQZHNVs5szWnatWt8bmuAJzA1b_wBc4”
I’ve tried adding these to my POST() request in various ways, but I still can’t get past the Username error. Is this a cookie issue, or is R or httr storing some state in a way that I don’t understand?
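Two ways of supplying the cookies can be sketched as follows, under stated assumptions. The first mimics the browser by loading the view page and then posting on the same handle, which assumes the server sets the needed cookies on that page load rather than through JavaScript. The second copies cookie values by hand with `set_cookies()`. The function names and the placeholder for the `user` value are hypothetical. Note also that the request code uses `http` while the browser example uses `https`, which may affect which cookies are sent.

```r
library(httr)

view_url <- "https://oip.nypdonline.org/view/1/@TAXID=940123"
api_url  <- "http://oip.nypdonline.org/api/reports/2042/datasource/list"

# Option 1 (hypothetical helper): visit the page first, then POST on the
# same handle so any cookies set by the page load travel with the request.
prime_and_post <- function(view_url, api_url, body) {
  session <- handle(view_url)             # one handle shared by both calls
  landing <- GET(view_url, handle = session)
  print(cookies(landing))                 # show what the server handed out
  POST(api_url, content_type_json(), body = body, handle = session)
}

# Option 2 (hypothetical helper): send cookie values copied from the
# browser's developer tools. The "user" value below is only a placeholder.
browser_cookies <- c(
  BNI_persistence = "fQZHNVs5szWnatWt8bmuAJzA1b_wBc4",
  user = "VALUE_COPIED_FROM_DEVELOPER_TOOLS"
)

post_with_cookies <- function(api_url, body, cookie_values) {
  POST(
    api_url,
    content_type_json(),
    set_cookies(.cookies = cookie_values),
    body = body,
    handle = handle(api_url)              # fresh handle, no stale cookies
  )
}
```

Neither function is called here because both perform network requests; each returns an httr response that can be parsed with `content()` in the same way as in the request code.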