I want to scrape the Facebook Ads Library website using the requests library and Selenium with ChromeDriver (because I need to run it on PythonAnywhere, which has ChromeDriver installed).
So I do:
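import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.chrome.service import Service as ChromeService

chrome_options = ChromeOptions()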
arguments = [
    "--disable-notifications",
    "--start-maximized",
    "--disable-infobars",
    "--disable-gpu",
    "--headless",
    "--window-size=1980,1080",
    "--allow-running-insecure-content",
    "--disable-extensions",
    "--no-sandbox",
    "--ignore-certificate-errors",
    "--test-type",
    "--disable-web-security",
    "--safebrowsing-disable-download-protection",
]
for argument in arguments:
    chrome_options.add_argument(argument)
prefs = {
    "intl.accept_languages": "en-US"
}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
# Path to your ChromeDriver
chrome_driver_path = "/usr/local/bin/chromedriver" # This is typically the path on PythonAnywhere
# Set up the WebDriver
service = ChromeService(executable_path=chrome_driver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)
That sets up the driver. Then I scrape the page:
def scrape_page(companyId, companyName):
    # Navigate to the Facebook Ads Library page
    url = f'https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=NL&view_all_page_id={companyId}&search_type=page&media_type=all'
    driver.get(url)
    time.sleep(5)
    print(driver.page_source)
Of course, sleep is not good for long-term use and I should use WebDriverWait, but for now I just want to get it working.
However, what it prints is HTML consisting mostly of script tags, so it looks like the page was not loaded properly. When I remove --headless, I can see the browser running and loading the page properly, and the script prints the fully loaded page.
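To compare the headless and non-headless runs, a small diagnostic along these lines could be called right after driver.get(url). It is only a sketch: the function name headless_snapshot and the default screenshot file name are made up for illustration, and it only reports what the browser saw; it does not fix anything.

```python
def headless_snapshot(driver, screenshot_path="headless_debug.png"):
    """Collect a few facts about what the browser actually rendered."""
    # save_screenshot returns True on success, False on an I/O error.
    saved = driver.save_screenshot(screenshot_path)
    return {
        "title": driver.title,
        "current_url": driver.current_url,
        # The user agent the page sees, which may differ from the one configured.
        "user_agent": driver.execute_script("return navigator.userAgent"),
        # True when the browser reports that it is being automated.
        "webdriver_flag": driver.execute_script("return navigator.webdriver"),
        "source_length": len(driver.page_source),
        "screenshot_saved": saved,
    }
```

Running it once with --headless and once without, then comparing the two dictionaries and screenshots, should show whether the headless session is redirected, served a different page, or just not finished rendering.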
Any ideas how to fix this?
Thanks in advance!