The code below works for many webpages, but for a few, like this one, it gives an error:
Error: Error reading file
‘http://akademos-garden.com/homeschooling-tips-work-home-parents’:
failed to load HTTP resource
Python to reproduce:
from lxml.html import parse
import requests
page_url = 'http://akademos-garden.com/homeschooling-tips-work-home-parents/'
try:
    parsed_page = parse(page_url)
    dom = parsed_page.getroot()
except Exception as e:
    # TODO - log this into some other error table to come back and research
    errMsg = f"Error: {e} "
    print(errMsg)
print("Try get without User-Agent")
result = requests.get(page_url).status_code
print("Try get with User-Agent")
result = requests.get(page_url, headers={'User-Agent': None}).status_code
This post ("python lxml.html.parse not reading url") refers to adding the User-Agent, but I don't understand how to do that with lxml.html.parse.
If I have to use requests.get, I can do that, but then how do I get the response into the dom object?
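For illustration, here is a minimal sketch of the requests.get route, assuming the site rejects requests that lack a browser-like User-Agent (not confirmed for this URL). The helper names dom_from_html and fetch_dom and the User-Agent string are hypothetical, chosen only for this example. The idea is to download the page with requests and hand the body to lxml.html.fromstring, which returns an element comparable to what parsed_page.getroot() gives.

```python
import requests
from lxml import html

# Hypothetical header value (an assumption, not taken from the post above);
# any browser-like User-Agent string may work.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-fetcher/0.1)"}


def dom_from_html(content, base_url=None):
    """Parse raw HTML (bytes or str) into an lxml element.

    For a full HTML document this returns the <html> root element,
    similar to what parse(url).getroot() returns.
    """
    return html.fromstring(content, base_url=base_url)


def fetch_dom(url, timeout=10):
    """Download url with requests, then parse the response body with lxml."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # Pass bytes (response.content) so lxml can detect the encoding itself.
    return dom_from_html(response.content, base_url=response.url)
```

With this in place, dom = fetch_dom(page_url) would replace the parse(page_url) and getroot() pair, and the existing except block could catch requests.RequestException for network or HTTP failures. Whether a different User-Agent is enough for this particular site is untested here.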