I am scraping the first few pages of a site. This has recently stopped working after the 10th page:
<code>
page += 1
rankings_url = f'{URL_RANKINGS}{page}'
res = get(rankings_url)  # thin wrapper around requests.get, shown below
html = BeautifulSoup(res.text, 'html.parser')
rows = html.find_all('tr', id='row_')  # the ranking rows all have id="row_"
logger.info(f'Found {len(rows)} rows for page {page}...')
</code>
This works for the first 10 pages. From page 11 onward, however, no rows are found. When I inspect the request in the browser's network inspector, the rows are present in the response, and they also appear in the page source.
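To narrow it down, here is a quick diagnostic sketch (written just for this question) that compares a working page against a failing one, checking the status code, the final URL after any redirect, and whether the row marker appears in the raw body at all:
<code>
import requests

# Compare a working page (10) against a failing one (11).
for page in (10, 11):
    res = requests.get(f'https://boardgamegeek.com/browse/boardgame/page/{page}')
    print(page, res.status_code, res.url, 'row_' in res.text)
</code>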
I cannot figure out what the problem could be; it used to work. I'm just fetching the page with requests:
<code>
res = requests.get(
    url, params=params, headers=headers, timeout=30, allow_redirects=redirect
)
</code>
For these requests, headers and params are empty or None.
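That means the requests go out with the library's built-in default headers, which can be inspected like this (the output shown is an example for my installed version):
<code>
import requests

# With headers=None, requests falls back to its built-in defaults,
# including a python-requests User-Agent rather than a browser one.
print(requests.utils.default_headers())
# e.g. {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate',
#       'Accept': '*/*', 'Connection': 'keep-alive'}
</code>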
Page in question: https://boardgamegeek.com/browse/boardgame/page/11
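For what it's worth, one experiment I can try is resending the failing request with browser-like headers (the User-Agent string below is just an example, not something my code currently sends):
<code>
import requests

# Experiment sketch: retry the failing page with a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
res = requests.get('https://boardgamegeek.com/browse/boardgame/page/11',
                   headers=headers, timeout=30)
print(res.status_code, 'row_' in res.text)
</code>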