I am scraping a website. This is just for my own entertainment and education and to keep my mind sharp as I age in retirement. I am not stealing information from the website to give to others and I am certainly not going to make any money doing it.
This is an infinite scrolling website which shows elements as you scroll down. There are potentially over 100,000 elements with about 50 on the screen at once, so manually scrolling is just not feasible.
I tried two methods, and neither scrolled the page more than a few times:
page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
and
page.mouse.wheel(0, 15000)
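Neither call is shown in context, so the cause can only be guessed. Two common ones are that the next scroll fires before the site has appended the next batch, so a loop that checks the page height sees no growth and stops early, and that the page scrolls an inner container rather than the window, in which case `window.scrollTo` has no effect. Below is a minimal sketch that addresses only the first cause and assumes the window itself scrolls. The function name `scroll_window_until_stable` and its `pause_ms`, `patience` and `max_rounds` parameters are hypothetical. It uses only `Page.evaluate` and `Page.wait_for_timeout` from Playwright's async API.

```python
async def scroll_window_until_stable(page, pause_ms=1500, patience=3, max_rounds=10_000):
    """Repeatedly scroll a Playwright async Page to the bottom.

    Assumes the window itself scrolls (not an inner container). Stops once
    the document height has failed to grow `patience` times in a row.
    Returns the last observed document height. Names are hypothetical.
    """
    height_js = "document.documentElement.scrollHeight"
    last_height = await page.evaluate(height_js)
    unchanged = 0
    for _ in range(max_rounds):
        # Jump to the current bottom, then give the site time to append items.
        await page.evaluate(f"window.scrollTo(0, {height_js})")
        await page.wait_for_timeout(pause_ms)
        height = await page.evaluate(height_js)
        if height > last_height:
            last_height, unchanged = height, 0
        else:
            # One slow batch should not end the loop; only a run of misses does.
            unchanged += 1
            if unchanged >= patience:
                break
    return last_height
```

Usage, inside an async method: `await scroll_window_until_stable(self._tab)`. Requiring several consecutive misses rather than one is a guess at tolerating slow batches, so `pause_ms` and `patience` will likely need tuning for the site.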
I tried a third approach: find an object at the bottom of the page and hover over it. That did not work, because no objects at the bottom are visible.
The final solution was to hover over the last visible object. On its own, that behaved the same way: it caused a single scroll and then stopped. When I first backed up by hovering over an item about 20 places before the end, and then hovered over the last item again, it worked.
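The back-up-then-hover trick can be sketched without calling `.all()` on every pass: `Locator.nth()` and `Locator.last` select the two items to hover, and `Locator.count()` replaces the page height as the stop signal. This assumes the site keeps already-loaded items in the DOM. The function `load_by_hovering` and its `back_off`, `pause_ms`, `patience` and `max_rounds` parameters are hypothetical; the default `back_off` of 20 simply mirrors the offset in the question's snippet.

```python
async def load_by_hovering(page, selector, back_off=20, pause_ms=2000,
                           patience=3, max_rounds=10_000):
    """Hover an item a little before the end of the list, then the last item.

    Stops once the item count has not grown `patience` times in a row and
    returns the final count. Assumes loaded items stay in the DOM.
    All names here are hypothetical.
    """
    items = page.locator(selector)
    last_count = await items.count()
    unchanged = 0
    for _ in range(max_rounds):
        if last_count == 0:
            break  # nothing to hover; the selector may not match
        # Step back first: per the description, re-hovering the last item alone stalled.
        await items.nth(max(0, last_count - back_off)).hover()
        await items.last.hover()
        await page.wait_for_timeout(pause_ms)
        count = await items.count()
        if count > last_count:
            last_count, unchanged = count, 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return last_count
```

Usage: `await load_by_hovering(self._tab, '//picture[@class="poster__image"]')`.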
I am using the async API, my current page is stored in self._tab, and here is the code snippet I have written. Yes, it is very similar to a previously asked question (Playwright auto-scroll to bottom of infinite-scroll page). Since that question was answered and mine is slightly different, I am asking it separately:
prev_page_height = await self._tab.evaluate("document.body.scrollHeight")
while True:
    movies = self._tab.locator('//picture[@class="poster__image"]')
    all_movies = await movies.all()
    # Go to an element near, but not at, the bottom
    # (max() avoids a negative index when fewer than 20 items are loaded):
    await all_movies[max(0, len(all_movies) - 20)].hover()
    # Go to the last element:
    await all_movies[-1].hover()
    # Give the site time to append the next batch:
    await self._tab.wait_for_timeout(2000)
    # Measure the same quantity as before the loop so the comparison is fair:
    cur_page_height = await self._tab.evaluate("document.body.scrollHeight")
    if cur_page_height > prev_page_height:
        prev_page_height = cur_page_height
    else:
        # Height did not grow: assume nothing more is loading.
        break
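With up to 100,000 items and only about 50 on screen, some infinite-scroll sites remove items once they scroll out of view. If this site does (that is not known), `.all()` only ever returns the items currently rendered, so counting them undercounts. A hedged sketch that records a key for every item as it appears, using `Locator.evaluate_all`, is below. `collect_while_scrolling`, `key_js`, and the default `outerHTML` key are hypothetical placeholders; a stable attribute such as an image URL would likely be a better key for this site.

```python
async def collect_while_scrolling(page, selector, key_js="el => el.outerHTML",
                                  back_off=20, pause_ms=2000, patience=3,
                                  max_rounds=100_000):
    """Record a key for every item that ever appears while scrolling.

    Keys are stored as soon as an item is seen, so this still works if the
    site removes items that have scrolled out of view. The default key is
    only a placeholder. Returns the keys in first-seen order.
    """
    items = page.locator(selector)
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    unchanged = 0
    for _ in range(max_rounds):
        keys = await items.evaluate_all(f"els => els.map({key_js})")
        before = len(seen)
        for key in keys:
            seen.setdefault(key, None)
        if len(seen) > before:
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break
        count = await items.count()
        if count == 0:
            break  # nothing rendered; the selector may not match
        # Same back-up-then-last hover sequence as in the question.
        await items.nth(max(0, count - back_off)).hover()
        await items.last.hover()
        await page.wait_for_timeout(pause_ms)
    return list(seen)
```

Storing keys as they appear trades memory (one key per item) for not depending on how the site manages its DOM.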
My apologies for not posting a minimal reproducible example. I will do so in the future.