I am using the GBGB site to download the full greyhound results for a certain track. The Python code below downloads the full-day results from Crayford for each given day (in the example I am downloading the results from the 25th of May to the 2nd of June). Each day with a meeting at Crayford has its own meeting ID and race ID; for example, in the code below the 1st of June meeting ID is 411583 and the race ID is 1042517. As you can see, if I want to download, say, 3 months of data, I have to specify the meeting ID and race ID in the URL on a separate line for each day, which is time-consuming.
I was wondering whether there is a way to scrape the whole site and download, say, 6 months' worth of results rather than specifying every URL line by line. I am new to web scraping and not sure how to do this. My actual code covers 9 months of data, so you can imagine its size, as every day has to go on a separate URL line.
Any help is appreciated. The site I am getting the data from is https://www.gbgb.org.uk/
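To show what I mean: generating the run of dates is the easy part; it is the meeting ID and race ID for each date that I cannot see how to obtain without hard-coding them. A minimal sketch of the date side (the year 2024 is assumed here purely for illustration):

```python
from datetime import date, timedelta

def date_range(start, end):
    """Yield every date from start to end inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

# The example span from the question: 25th May to 2nd June (9 days).
days = [d.isoformat() for d in date_range(date(2024, 5, 25), date(2024, 6, 2))]
print(days[0], days[-1], len(days))  # 2024-05-25 2024-06-02 9
```

What I am missing is the step that maps each of these dates to its meeting ID and race ID, which is what the hard-coded URL list below does by hand.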
import json
import requests
import ndjson
URLs = [
    "https://api.gbgb.org.uk/api/results/meeting/411238?raceId=1040677",  # 25th May Sat
    "https://api.gbgb.org.uk/api/results/meeting/411239?raceId=1040615",  # 26th May Sun
    "https://api.gbgb.org.uk/api/results/meeting/411378?raceId=1041270",  # 28th May Tue
    "https://api.gbgb.org.uk/api/results/meeting/411461?raceId=1042006",  # 30th May Thu
    "https://api.gbgb.org.uk/api/results/meeting/411477?raceId=1042178",  # 31st May Fri
    # June
    "https://api.gbgb.org.uk/api/results/meeting/411583?raceId=1042517",  # 01st Jun Sat
    "https://api.gbgb.org.uk/api/results/meeting/411452?raceId=1042214",  # 02nd Jun Sun
]
json_list = []
for url in URLs:
    resp = requests.get(url)
    page_context = resp.text  # raw JSON text of the day's results
    print(page_context)
    json_file = json.loads(page_context)
    json_list.extend(json_file)

with open('C:/mypath/Grey_results.json', 'w') as f:
    ndjson.dump(json_list, f)
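For what it is worth, the list above is really just a table of (meetingId, raceId) pairs, so at minimum the URLs can be built from that table rather than written out in full. This does not solve the discovery problem (I still have to know the IDs for every day), but it shows the shape of the data; the pairs here are two of the examples from my list:

```python
# Two example meetings from the list above, as (meetingId, raceId) pairs.
MEETINGS = [
    (411238, 1040677),  # 25th May Sat
    (411583, 1042517),  # 01st Jun Sat
]

# Build the result URLs from the pairs instead of hand-writing each line.
URLs = [
    f"https://api.gbgb.org.uk/api/results/meeting/{meeting_id}?raceId={race_id}"
    for meeting_id, race_id in MEETINGS
]
print(URLs[0])  # https://api.gbgb.org.uk/api/results/meeting/411238?raceId=1040677
```

So really the question boils down to: is there an endpoint or page on the site that lists the meeting and race IDs for a given track and date range, so the table can be filled in automatically?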