I want to scrape a website with Python (I am fairly new to it).
My starting page shows information about a convention exhibitor. I need to collect the exhibitor's information and then move on to the next one (there is a next button on the site).
I managed to get into the website with Selenium (I needed to authenticate) and to scrape the information on the first exhibitor's page. I also managed to go to the next page with the next button.
But when I move to the next page and try to get the information of the 2nd exhibitor, I get the same data as for the first one: the HTML source doesn't change. On top of that, I get an error when I try to click the next button a second time (to go to the third exhibitor).
Here is my code.
To authenticate (this part works):
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://*******')
time.sleep(0.5)
first_input = driver.find_element(By.NAME, 'badgeNumber')
first_input.send_keys('*****')
time.sleep(0.5)
first_submit_button = driver.find_element(By.CSS_SELECTOR, 'div.button__text')
first_submit_button.click()
time.sleep(0.5)
second_input = driver.find_element(By.CSS_SELECTOR, 'input.initial-input[placeholder="Initiales"]')
second_input.send_keys('..')
time.sleep(0.5)
second_submit_button = driver.find_element(By.CSS_SELECTOR, 'button.btn span')
second_submit_button.click()
time.sleep(0.5)
element = driver.find_element(By.CSS_SELECTOR, 'span.alert-button-inner')
element.click()
time.sleep(0.5)
To get the information:
def get_exposant_details():
    exposant_data = {}
    # re-read the page source on every call and parse it with BeautifulSoup
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    exposant_data['Nom'] = driver.find_element(By.CSS_SELECTOR, "div.imgBloc > span.noLogo").text
    exposant_data['Description'] = driver.find_element(By.CSS_SELECTOR, "div.presentation.mb-10").text
    exposant_data["Pays"] = soup.find_all('div', class_='country ng-star-inserted')[0].text
    exposant_data["Email"] = driver.find_element(By.CSS_SELECTOR, "a[href^='mailto']").get_attribute("href").replace("mailto:", "")
    # collect every category block on the page
    categories = []
    elements = soup.find_all('div', class_='categories ng-star-inserted')
    for i in elements:
        categories.append(i.text)
    exposant_data["Catégories"] = categories
    return exposant_data
To iterate:
exposants = []
for i in range(2):
    exposant_data = get_exposant_details()
    print(exposant_data)
    exposants.append(exposant_data)
    # click the "next" chevron and give the page time to load
    next_button = driver.find_element(By.CSS_SELECTOR, 'ion-icon[name="chevron-forward"]')
    next_button.click()
    time.sleep(30)
So the first problem is that I get the same exhibitor details twice, even though I can see the page changing on screen.
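To illustrate what I mean, a quick check like this (a sketch that just reuses the 'Nom' selector from get_exposant_details(); the pause length is arbitrary) prints the same name before and after the click, even though the visible page has moved to the next exhibitor:

# Sketch of the check, using the same selector as in get_exposant_details()
before = driver.find_element(By.CSS_SELECTOR, "div.imgBloc > span.noLogo").text
driver.find_element(By.CSS_SELECTOR, 'ion-icon[name="chevron-forward"]').click()
time.sleep(5)  # arbitrary pause, just for this check
after = driver.find_element(By.CSS_SELECTOR, "div.imgBloc > span.noLogo").text
print(before, after)  # both show the first exhibitor's name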
And I get an error too:
Traceback (most recent call last):
  File "C:\Users\eeberle\IdeaProjects\test\extract.py", line 56, in <module>
    next_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'ion-icon[name="chevron-forward"]')))
  File "C:\Users\eeberle\IdeaProjects\test\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 96, in until
    value = method(self._driver)
  File "C:\Users\eeberle\IdeaProjects\test\venv\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 363, in _predicate
    target = driver.find_element(*target)  # grab element at locator
  File "C:\Users\eeberle\IdeaProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 748, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
  File "C:\Users\eeberle\IdeaProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "C:\Users\eeberle\IdeaProjects\test\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: no such window: target window already closed
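The "target window already closed" part makes me wonder whether the second click somehow closes the current window or opens the next exhibitor in a new tab. I haven't verified that yet; this is the kind of diagnostic I had in mind (only standard Selenium calls, the sleep length is a guess):

# Hypothetical diagnostic, to run right after clicking the next button
print("handles before:", driver.window_handles)
driver.find_element(By.CSS_SELECTOR, 'ion-icon[name="chevron-forward"]').click()
time.sleep(2)
print("handles after:", driver.window_handles)
# current_window_handle raises NoSuchWindowException if the window we are
# attached to has been closed
print("current handle:", driver.current_window_handle)
# if a new window/tab appeared, switch to the newest one before scraping again
if len(driver.window_handles) > 1:
    driver.switch_to.window(driver.window_handles[-1])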
I also tried using wait.until with expected conditions (that is where the traceback above comes from), but it doesn't work either.
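For completeness, this is roughly the explicit-wait variant of the loop that produced the traceback above (a sketch: the wait.until line is the one from the traceback, while the imports and the 10-second timeout are my own choices):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # 10 s timeout is an arbitrary choice
for i in range(2):
    exposant_data = get_exposant_details()
    exposants.append(exposant_data)
    # wait for the chevron to be clickable instead of sleeping a fixed time
    next_button = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, 'ion-icon[name="chevron-forward"]'))
    )
    next_button.click()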
Has anyone experienced something similar?