Relative Content

Tag Archive for python, selenium-webdriver, web-scraping

Web scraping with proxy in 2024

I am new to web scraping, but I’m trying to learn it. My main goal is to create a bot with proxy rotation support. My environment: macOS, Python 3.12, and a paid proxy tool with credentials (spoiler: this is the biggest problem). So far, I have tried and researched the following.
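
For illustration, here is a minimal sketch of the rotation part only, using just the standard library. The host names, port, and credentials are hypothetical placeholders, and `build_proxy_url` and `next_proxy` are names made up for this sketch, not part of any proxy provider's API. It assumes the provider accepts credentials embedded in an HTTP proxy URL.

```python
from itertools import cycle
from urllib.parse import quote


def build_proxy_url(host: str, port: int, user: str, password: str) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials."""
    # Credentials may contain ':' or '@', which would break the URL if left raw.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Hypothetical proxy endpoints and credentials, for illustration only.
PROXIES = [
    build_proxy_url("proxy1.example.com", 8000, "user", "p@ss:word"),
    build_proxy_url("proxy2.example.com", 8000, "user", "p@ss:word"),
]

# cycle() yields the proxies in round-robin order, indefinitely.
proxy_pool = cycle(PROXIES)


def next_proxy() -> str:
    """Return the next proxy URL in round-robin order."""
    return next(proxy_pool)
```

Each call to `next_proxy()` returns the next URL in the list and wraps around at the end. How the URL is then handed to the browser depends on the driver and is not covered here.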

Python Selenium: DOM elements are not found

I am trying to automate reading data from a website that uses custom-named tags like <c-practitioner-search ... >. I tried treating them as normal tags and used find_elements(), but without success.
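
One possible cause, stated here as an assumption rather than a diagnosis, is that the custom element renders its content inside a shadow root, which an ordinary search from the document does not reach. The sketch below uses Selenium 4's `shadow_root` property; `find_in_custom_element` is a hypothetical helper name, and the guarded import only exists so the snippet can be read and exercised without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
    CSS = By.CSS_SELECTOR
except ImportError:
    CSS = "css selector"  # the string value behind By.CSS_SELECTOR


def find_in_custom_element(driver, host_tag: str, inner_css: str) -> list:
    """Find elements matching inner_css inside each <host_tag> element.

    Assumes the inner content lives in an open shadow root; falls back to a
    normal descendant search when a host has no shadow root.
    """
    results = []
    for host in driver.find_elements(CSS, host_tag):
        try:
            # Selenium 4: raises an exception if the host has no shadow root.
            root = host.shadow_root
        except Exception:
            root = host
        results.extend(root.find_elements(CSS, inner_css))
    return results
```

A call such as `find_in_custom_element(driver, "c-practitioner-search", "input")` would then collect the matching inner elements, where `"input"` is only an example selector. If the elements are still missing, late rendering or an iframe are other candidates worth checking.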

Cycling through the pages of a website with Selenium

I’m trying to scrape this page: https://www.lavoro.gov.it/Pagine/Cerca-nel-sito?search=big+data
As you can see, at the bottom of the page there are the page numbers and an arrow icon, which I’m clicking with Selenium.

How Can I Improve the Efficiency of Scraping a Dynamic Table with Selenium to Reduce Time Taken?

I am currently trying to scrape data from a website, CCMT 2021 OR CR, which has a dynamic structure. The table is paginated, and the other pages are reached by clicking ‘Next’ or a page number (‘2’, ‘3’, etc.). I want to extract all the data and save it to an Excel file.
Previously, I extracted data from a similar link, CCMT 2023 OR CR, using the same approach. Scraping one page (21 rows) took approximately 23 seconds. With 383 pages, the total time was around 2 hours. The 2021 data set is similar in size.
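
As a quick check of the figures quoted above, the short calculation below multiplies the reported per-page time by the page count. It assumes every page takes the same 23 seconds and holds 21 rows, which may not be exact (the last page could be shorter), so the result is only a rough estimate.

```python
SECONDS_PER_PAGE = 23  # reported time for one page of 21 rows
ROWS_PER_PAGE = 21     # reported rows per page (assumed constant here)
PAGES = 383            # reported number of pages

total_seconds = SECONDS_PER_PAGE * PAGES
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{total_seconds} s = {hours} h {minutes} min {seconds} s")
# prints "8809 s = 2 h 26 min 49 s"
print(f"about {PAGES * ROWS_PER_PAGE} rows, {SECONDS_PER_PAGE / ROWS_PER_PAGE:.2f} s per row")
# prints "about 8043 rows, 1.10 s per row"
```

The straight multiplication comes out closer to two and a half hours than two, which suggests the 23-second figure is an upper-end approximation. Either way, more than a second per row points to per-page waiting rather than data extraction as the main cost.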