In short, I have to test the behaviour of a waiting-room guarding product, which is appended to the website's HTML as JavaScript, with a Selenium scraper. I made it multi-threaded because the waiting-room script seems to use cookies and browser information for identification. So I thought of starting multiple new drivers at once and accessing the home page. This is a sample version of what I wrote:
import os
import threading
from time import sleep

from selenium import webdriver

# Note: options, initUrl and isRedirectContinued are defined elsewhere (not shown).


class ScrappingThread(threading.Thread):
    # ...init function
    def run(self):
        is_redirected = False
        redirected_times = 0
        # Keep opening new browsers until a redirect is seen (and, optionally,
        # until the required number of redirects has been reached).
        while (not is_redirected) or (isRedirectContinued and
                redirected_times < (int(os.getenv("REQUIRED_REDIRECTED_TIMES") or 3))):
            # A new driver (new cookies / browser identity) for every access.
            driver = webdriver.Edge(options=options)
            if bool(os.getenv("IS_MONITORED_SECOND_WINDOW") or False):
                driver.set_window_position(0, -600)
            driver.get(os.getenv("WEBSITE_URL"))
            driver.implicitly_wait(2)
            target_url = driver.current_url
            print(target_url)
            sleep(int(os.getenv("IDLE_TIME_FOR_EACH_ACCESS") or 60))
            # A changed URL means the waiting room redirected this browser.
            if initUrl != target_url:
                is_redirected = True
                redirected_times += 1
                if redirected_times >= 4:
                    sleep(int(os.getenv("IDLE_TIME_AFTER_EACH_REDIRECTION") or 5))
                print(
                    f"The redirect process is completed for {redirected_times} time(s)"
                    f" - Thread {self.index} ({redirected_times >= 4})")
                sleep(int(os.getenv("IDLE_TIME_AFTER_EACH_REDIRECTION") or 10))


threads = []
for index in range(int(os.getenv("THREAD_NUM") or 2)):
    t = ScrappingThread(os.getenv("WEBSITE_URL"), index)
    t.start()
    threads.append(t)
    # Wait before a new thread starts so browsers are not opened concurrently.
    sleep(int(os.getenv("IDLE_TIME_BEFORE_STARTING_NEW_THREAD") or 2))

# Wait for every thread to finish.
for t in threads:
    t.join()
However, the program slows down considerably once around 20 threads are running concurrently. Is there any way to speed it up with Selenium, or would it be better to switch to another library for this task?
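As a rough illustration of one direction that might help (a sketch only, not tested against the real site): in the sample as shown, each loop iteration creates a new driver and never calls `quit()`, so browser processes can pile up. The sketch below caps the number of browsers open at once with a thread pool and always closes each one. The names `visit_once`, `run_visits`, `make_driver`, `total_visits` and `max_browsers` are hypothetical and are not part of the original code.

```python
from concurrent.futures import ThreadPoolExecutor


def visit_once(make_driver, url):
    """Open a fresh browser, load url, and return the URL it ended up on."""
    driver = make_driver()  # hypothetical factory, e.g. one that builds an Edge driver
    try:
        driver.get(url)
        return driver.current_url
    finally:
        # Always release the browser process, even if get() raises.
        driver.quit()


def run_visits(make_driver, url, total_visits, max_browsers):
    """Run total_visits page loads with at most max_browsers open at once."""
    with ThreadPoolExecutor(max_workers=max_browsers) as pool:
        futures = [pool.submit(visit_once, make_driver, url)
                   for _ in range(total_visits)]
        # Results come back in submission order.
        return [f.result() for f in futures]
```

With Selenium, `make_driver` could be something like `lambda: webdriver.Edge(options=options)`, and the redirect check from the original loop would then compare each returned URL against `initUrl`. Whether this is enough on a 4-core laptop is an open question, since each browser instance is still a full process.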
UPDATE: I am currently using an HP computer with an i7-1165G7 @ 2.80 GHz and 16.0 GB of RAM.