I'm new to Selenium and browser automation, and this is my first attempt at something like this.
The task:
I need to copy a huge list of IDs, one by one, from one subdomain (sub1) of our work website into individual video entries listed on another subdomain (sub2). I'm not a developer and don't have API access or anything similar, but I want to automate the task since it's stupid, and also as a form of practice.
The login on sub2 is protected by reCAPTCHA v3, which I believe denies access when I try to log in through Selenium. Doing the same steps manually works fine. The cookies expire upon logout and are not valid across browser instances (i.e. normal Chrome vs. incognito vs. the Selenium driver).
What can I use to get past it? I've heard Puppeteer has more advanced methods for avoiding detection.
Here's my script (cobbled together from ChatGPT, Stack Overflow, and experimenting). Note that I later added the Google account login part thinking it mattered, but it turned out not to.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
# Create a ChromeOptions instance
options = webdriver.ChromeOptions()
options.add_argument(r"<User_Data>")
# Adding argument to disable the AutomationControlled flag
options.add_argument("--disable-blink-features=AutomationControlled")
# Exclude the collection of enable-automation switches
options.add_experimental_option("excludeSwitches", ["enable-automation"])
# Turn off useAutomationExtension
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
# Start Chrome with the options above (no driver path is set and no page is requested yet)
driver = webdriver.Chrome(options=options)
# Redefine navigator.webdriver so that it returns undefined
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
# driver.get('https://accounts.google.com/signin')
email = "<email>"
pwd = "<password>"
# Enter email
# email_field = driver.find_element(By.XPATH, '//*[@id="identifierId"]')
# email_field.send_keys(email)
# email_field.send_keys(Keys.RETURN)
# This is where a CAPTCHA would pop up during the Google account login
# driver.implicitly_wait(10) # Adjust the wait time if necessary
# Enter password
# password_field = driver.find_element(By.XPATH, '//*[@name="password"]')
# password_field.send_keys(pwd)
# password_field.send_keys(Keys.RETURN)
# time.sleep(5)
# Load our website (link omitted on purpose)
driver.get("<websiteURL>")
time.sleep(2.3)
# I thought this mattered, but it doesn't seem to
cookies = driver.get_cookies()
driver.add_cookie({'name': 'name', 'value': 'value'})
time.sleep(1.6)
driver.find_element(By.ID, "uc-btn-deny-banner").send_keys(Keys.ENTER)
driver.find_element(By.ID, "InputEmail").send_keys(email)
driver.find_element(By.ID, "InputPassword").send_keys(pwd)
time.sleep(4.6)
driver.find_element(By.XPATH, "//*[@id='opt-in_screens']").click()
time.sleep(1.9)
driver.find_element(By.XPATH, "//*[@id='setting']/div/div[1]/div/div/div/div[2]").click()
time.sleep(3.1)
driver.find_element(By.XPATH, "//*[@id='uc-corner-modal']/div/div[2]/div/div[3]/div[3]").click()
time.sleep(2.5)
driver.find_element(By.XPATH, "//button[@data-sitekey='6LcawHojAAAAABBp5U_Kxo04GHpsEQmaou311knu' and text()='Log in']").click()
time.sleep(100)
driver.quit()
Below is the browser response after the login attempt: