I’m trying to collect links to personal profiles and contacts from the following website:
https://www.dlapiper.com/en-us/people#t=All&sort=relevancy&numberOfResults=100&f:CountriesID=[United%20Kingdom]
I’m scraping with Selenium through chromedriver, and normally it works just fine. For this particular website, however, I can’t get the source HTML in which the links to people’s profiles would be visible.
I wrote a standard script that normally works for other dynamic websites:
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

links = []
driver = webdriver.Chrome()
driver.get('https://www.dlapiper.com/en-gb/people#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]')
time.sleep(5)
# Dismiss the cookie banner
cookies_button = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookies_button.click()
time.sleep(5)
# Grab the rendered HTML and collect the href of every <a> tag
html = driver.page_source
time.sleep(5)
soup = BeautifulSoup(html, 'html.parser')
parse = soup.find_all('a')
for item in parse:
    links.append(item.get('href'))
print(links)
However, the links from the people search block never appear in driver.page_source, even though I can find all the link elements when I press “Inspect” in Chrome. I have tried increasing the time.sleep() delays, but that did not help.
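To make the problem easier to measure, here is a minimal sketch of how the collected hrefs could be narrowed down to profile links, so that an empty result shows the links are missing from the HTML. It uses only the standard library. The helper name profile_links and the "/people/" path marker are my own assumptions for illustration; I have not confirmed how profile URLs on the site are actually structured.

```python
from urllib.parse import urljoin, urlparse

# Assumption: profile URLs contain this path segment (not confirmed against the site).
PROFILE_MARKER = "/people/"


def profile_links(hrefs, base_url, marker=PROFILE_MARKER):
    """Return absolute, de-duplicated URLs whose path contains `marker`."""
    seen = set()
    result = []
    for href in hrefs:
        # <a> tags without an href attribute yield None from item.get('href')
        if not href:
            continue
        # Resolve relative links against the page URL
        absolute = urljoin(base_url, href)
        if marker not in urlparse(absolute).path:
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Calling profile_links(links, driver.current_url) on the list from the script above would return only the links that look like profiles under that assumption.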
I understand that a lot of JavaScript is executed on this page. Maybe I need to trigger some of it manually? Help would be much appreciated, as I don’t know JavaScript.