I'm learning web scraping, and I'm trying to get data from a page that loads more information as you scroll. What can I do in this scenario? Is there a function that makes the entire page load? I am using Selenium and BeautifulSoup.
This is the code:
from bs4 import BeautifulSoup

html = driver.page_source
bs = BeautifulSoup(html, 'html.parser')
games = bs.find_all('div', {'class': 'GVj7ae imso-medium-font qJnhT imso-ani'})
for game in games:
    print(game.get_text())
I read about a script that scrolls the page, but it doesn't work: it just gives me duplicated data. For instance, if the expected output is:
Apertura · Jornada 1 de 17
Apertura · Jornada 2 de 17
Apertura · Jornada 3 de 17
Apertura · Jornada 4 de 17
Apertura · Jornada 5 de 17
The scrolling script gives me:
Apertura · Jornada 1 de 17
Apertura · Jornada 2 de 17
Apertura · Jornada 3 de 17
Apertura · Jornada 4 de 17
Apertura · Jornada 5 de 17
Apertura · Jornada 1 de 17
Apertura · Jornada 2 de 17
Apertura · Jornada 3 de 17
Apertura · Jornada 4 de 17
Apertura · Jornada 5 de 17
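Since the repeated lines are exact copies, a quick workaround (a sketch that hides the symptom rather than fixing the scrolling) is to drop repeats while keeping the original order. The helper name `unique_in_order` is hypothetical; in the real script its input would be something like `[game.get_text(strip=True) for game in games]`, where `strip=True` trims surrounding whitespace so near-identical copies compare equal.

```python
def unique_in_order(texts):
    """Return the strings in `texts` with repeats removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so dict.fromkeys works as an ordered set.
    return list(dict.fromkeys(texts))


# Sample input rebuilt from the duplicated output above: five lines, twice.
scraped = [f"Apertura · Jornada {n} de 17" for n in range(1, 6)] * 2

for line in unique_in_order(scraped):
    print(line)
```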
This is the script:
import time

from bs4 import BeautifulSoup
from selenium import webdriver

# service, chrome_options and url are defined earlier (not shown)
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get(url)

scroll_pause_time = 2  # seconds to wait for new content after each scroll
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll the window to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    # Stop once the page height no longer grows
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

html = driver.page_source
bs = BeautifulSoup(html, 'html.parser')
games = bs.find_all('div', {'class': 'GVj7ae imso-medium-font qJnhT imso-ani'})
for game in games:
    print(game.get_text())

driver.quit()
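One possible cause, and this is an assumption because the page isn't identified: if the list lives inside its own scrollable element rather than the page body, `window.scrollTo` and `document.body.scrollHeight` never change, so the loop above exits after the first pass and nothing new loads. The sketch below scrolls either the window or an element you pass in (via Selenium's `execute_script(script, *args)`, where the element becomes `arguments[0]`), parses the HTML after every scroll, and keeps each text once. `scroll_and_collect` and its `extract` callback are hypothetical names, not part of Selenium or BeautifulSoup.

```python
import time


def scroll_and_collect(driver, extract, element=None, pause=2.0, max_rounds=50):
    """Scroll until the height stops growing, calling extract(page_source)
    after every scroll and keeping each returned string once, in order."""
    if element is None:
        # Scroll the whole window, like the original script.
        height_js = "return document.body.scrollHeight"
        scroll_js = "window.scrollTo(0, document.body.scrollHeight);"
        args = ()
    else:
        # Scroll a specific scrollable element, passed to the JS as arguments[0].
        height_js = "return arguments[0].scrollHeight"
        scroll_js = "arguments[0].scrollTop = arguments[0].scrollHeight;"
        args = (element,)

    seen = {}  # dict used as an ordered set of texts

    def collect():
        for text in extract(driver.page_source):
            seen.setdefault(text, None)

    last_height = driver.execute_script(height_js, *args)
    for _ in range(max_rounds):  # hard cap in case the height keeps changing
        collect()
        driver.execute_script(scroll_js, *args)
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(height_js, *args)
        if new_height == last_height:
            break
        last_height = new_height
    collect()  # pick up anything loaded by the last scroll
    return list(seen)
```

With a real driver, `extract` could be `lambda src: [g.get_text(strip=True) for g in BeautifulSoup(src, 'html.parser').find_all('div', {'class': 'GVj7ae imso-medium-font qJnhT imso-ani'})]`, and `element` would come from `driver.find_element(By.CSS_SELECTOR, ...)` with whatever selector matches the scrollable container (not known here). Collecting on every round, instead of only once at the end, also covers pages that remove items as they scroll out of view.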
Carlos Ramirez is a new contributor to this site.