I’m trying to scrape URLs from a dynamically loaded webpage that requires continuous scrolling to load all of its content into the DOM. My approach is to run window.scrollTo(0, document.body.scrollHeight); in a loop via Selenium’s execute_script method. After each scroll, I compare the number of URLs present before and after the scroll; if the count doesn’t change, I assume the end of the page has been reached and break out of the loop.
However, the loop sometimes breaks as if all content had been loaded, even though I can see that new content is still being loaded within the given timeout. Below is my code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

def _scroll_page_to_bottom(self, timeout: int):  # TODO: fix bugs
    while True:
        # Count the URL elements currently present in the DOM
        urls_before_scroll = self.browser.find_elements(
            By.XPATH, read_xpath(self.scrape_programs_urls.__name__, "programs_urls")
        )
        # Scroll to the bottom of the page to trigger loading of more content
        self.browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for new content to be loaded
        try:
            WebDriverWait(self.browser, timeout).until(
                lambda _: len(self.browser.find_elements(
                    By.XPATH, read_xpath(self.scrape_programs_urls.__name__, "programs_urls")
                )) > len(urls_before_scroll)
            )
        except TimeoutException:
            # If no new content is loaded within the timeout,
            # assume we've reached the end of the page
            break
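
In case it helps to reproduce the behavior, here is a minimal standalone sketch of the same scroll-and-wait pattern, stripped of my class context. The URL and the XPath below are placeholders (my real code resolves the XPath through a read_xpath helper), but the loop structure is identical:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

def scroll_page_to_bottom(browser, timeout):
    # Placeholder XPath; my actual code builds this via read_xpath()
    urls_xpath = "//a[@class='program-url']"
    while True:
        urls_before_scroll = browser.find_elements(By.XPATH, urls_xpath)
        # Scroll to the bottom to trigger lazy loading of more items
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        try:
            # Wait until more URL elements exist than before the scroll
            WebDriverWait(browser, timeout).until(
                lambda _: len(browser.find_elements(By.XPATH, urls_xpath))
                > len(urls_before_scroll)
            )
        except TimeoutException:
            # No growth within the timeout: assume we've hit the end
            break

browser = webdriver.Chrome()
browser.get("https://example.com/programs")  # placeholder URL
scroll_page_to_bottom(browser, timeout=10)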
Can anyone see what might be causing this issue in the code above?