I have seen many topics about getting data from a website with infinite scrolling using Selenium in Python, but sadly I did not find a solution for my problem, and I think I’m just missing something.
I am a beginner when it comes to Selenium.
I am trying to get the top 500 movie titles from a Filmweb page, and the main problem is that I’m getting only the first 25 titles. I execute the scroll script in a while loop, but maybe in the wrong place.
I tried the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
options = webdriver.EdgeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
browser = webdriver.Edge(options=options)
browser.get('https://www.filmweb.pl/ranking/film')
# Wait for the consent notice and accept it
accept_button = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.ID, "didomi-notice-agree-button"))
)
accept_button.click()
browser.implicitly_wait(30)
items = []
last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom and wait for more entries to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)
    # Collect every title element currently present on the page
    titles = browser.find_elements(By.CLASS_NAME, "rankingType__originalTitle")
    for i, title in enumerate(titles):
        movie_dict = {f"Movie Number : {i + 1}, 'Title': {title.text}"}
        items.append(movie_dict)
    # Stop once the page height no longer changes
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
for movie_title in items:
    print(movie_title)
browser.quit()
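To make the stop condition easier to reason about, the height-comparison loop above can be isolated from the title collection. This is only a sketch: `scroll_until_stable`, `get_height`, `scroll_down`, `pause` and `max_rounds` are hypothetical names, not part of Selenium, and whether Filmweb actually loads more entries in response to this kind of scroll is an assumption that is not confirmed here.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_down: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 50,
) -> int:
    """Scroll until the reported page height stops changing.

    Returns the number of scrolls performed. `max_rounds` is a safety cap
    (hypothetical default) so the loop cannot run forever.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_down()
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new was loaded by this scroll
        last_height = new_height
    return max_rounds
```

With Selenium, the two callables could wrap the `execute_script` calls from the code above, for example `lambda: browser.execute_script("return document.body.scrollHeight")` for `get_height`. Collecting the titles once, after the function returns, would avoid appending the same elements again on every pass of the loop.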
The result I get:
{"Movie Number : 1, 'Title': The Shawshank Redemption 1994"}
{"Movie Number : 2, 'Title': Intouchables 2011"}
{"Movie Number : 3, 'Title': The Green Mile 1999"}
{"Movie Number : 4, 'Title': The Godfather 1972"}
{"Movie Number : 5, 'Title': 12 Angry Men 1957"}
{"Movie Number : 6, 'Title': 1994"}
{"Movie Number : 7, 'Title': One Flew Over the Cuckoo's Nest 1975"}
{"Movie Number : 8, 'Title': The Godfather: Part II 1974"}
{"Movie Number : 9, 'Title': The Lord of the Rings: The Return of the King 2003"}
{"Movie Number : 10, 'Title': Schindler's List 1993"}
{"Movie Number : 11, 'Title': 1994"}
{"Movie Number : 12, 'Title': La vita è bella 1997"}
{"Movie Number : 13, 'Title': The Lord of the Rings: The Two Towers 2002"}
{"Movie Number : 14, 'Title': Se7en 1995"}
{"Movie Number : 15, 'Title': Fight Club 1999"}
{"Movie Number : 16, 'Title': Goodfellas 1990"}
{"Movie Number : 17, 'Title': The Pianist 2002"}
{"Movie Number : 18, 'Title': 2019"}
{"Movie Number : 19, 'Title': Django Unchained 2012"}
{"Movie Number : 20, 'Title': A Beautiful Mind 2001"}
{"Movie Number : 21, 'Title': Inception 2010"}
{"Movie Number : 22, 'Title': The Silence of the Lambs 1991"}
{"Movie Number : 23, 'Title': The Lion King 1994"}
{"Movie Number : 24, 'Title': Scarface 1983"}
{"Movie Number : 25, 'Title': 2008"}
Some entries contain only the year, but that’s because their original title is in a different place in the page’s source structure, and I will deal with that later.
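The year-only entries can at least be detected mechanically. Below is a small sketch, assuming the element text always has the form "title year" separated by a space, as in the output above; `split_title_year` is a hypothetical helper name, and where the missing title actually lives in the page is not covered here.

```python
import re
from typing import Optional, Tuple


def split_title_year(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Split 'Some Title 1994' into ('Some Title', 1994).

    Returns a title of None when the text holds only a year, so such
    entries can be filled in from another element later.
    """
    text = text.strip()
    # Everything before the last space is the candidate title
    head, _, tail = text.rpartition(" ")
    if re.fullmatch(r"\d{4}", tail):
        return (head.strip() or None, int(tail))
    # No trailing four-digit year: treat the whole text as the title
    return (text or None, None)
```

An entry whose first element is `None` is one of the rows where the original title has to be read from a different place.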
So first I want to extract all 500 titles; I will work on the other data once I know how to solve the current problem.
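Since the goal is a fixed number of titles, one option is to stop on the item count instead of the page height. The sketch below is an assumption about how that could look, not a confirmed fix: `collect_until`, `fetch_all`, `load_more` and `max_rounds` are hypothetical names, and `load_more` is expected to do its own scrolling and waiting.

```python
from typing import Callable, List


def collect_until(
    target: int,
    fetch_all: Callable[[], List[str]],
    load_more: Callable[[], None],
    max_rounds: int = 100,
) -> List[str]:
    """Re-read the full list after each load until `target` items are present.

    The list is replaced on every round rather than appended to, so repeated
    passes do not produce duplicates and equal texts (such as two entries
    that both read '1994') are preserved.
    """
    items = fetch_all()
    for _ in range(max_rounds):
        if len(items) >= target:
            break
        load_more()
        refreshed = fetch_all()
        if len(refreshed) <= len(items):
            break  # no progress: the page did not load anything new
        items = refreshed
    return items[:target]
```

With Selenium, `fetch_all` could be something like `lambda: [e.text for e in browser.find_elements(By.CLASS_NAME, "rankingType__originalTitle")]`, and `load_more` the scroll followed by a wait. If the function returns fewer than `target` items, the scroll is not triggering further loading.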
Maybe someone here has had the same problem and can help me.
Thanks in advance.