I am currently working on a web scraper, and for the most part it works quite well. I have been using Beautiful Soup to extract HTML content; to extract JavaScript-rendered content, I just started using requests_html.
Unfortunately, I am running into issues when extracting JavaScript-rendered data from the website “https://goglobal.com/”, specifically from the section that shows “100+ countries”, “2500+ employees”, and “3 Billion dollars saved…”. The code does not extract these values correctly. However, the same code works fine for other websites that load content dynamically.
In an attempt to isolate the issue, I wrote the following script, but the values from the goglobal website are still displayed incorrectly.
from requests_html import HTMLSession
import time
session = HTMLSession()
url = "https://goglobal.com/"
r = session.get(url)
r.html.render(wait=10)
time.sleep(10)
print(r.html.html)
For reference, I checked the printed output by searching it for “counter-number”.
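To make that check repeatable rather than done by eye, here is a minimal sketch of how the values could be pulled out of the rendered HTML. It assumes the counters are elements carrying a `counter-number` class (that is only the string I searched for, so the real markup may differ). `CounterExtractor` and `extract_counters` are hypothetical helper names, and the sketch uses only the standard library so it can be run on any HTML string, such as `r.html.html`.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class CounterExtractor(HTMLParser):
    """Collect the text inside elements whose class list contains class_name."""

    def __init__(self, class_name="counter-number"):  # class name is an assumption
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.values = []    # one text entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a matching element
        elif self.class_name in classes:
            self.depth = 1              # entering a matching element
            self.values.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


def extract_counters(html, class_name="counter-number"):
    """Return the stripped text of every element with the given class."""
    parser = CounterExtractor(class_name)
    parser.feed(html)
    return [value.strip() for value in parser.values]
```

Calling `extract_counters(r.html.html)` after the render step should list whatever text the counter elements contain at the moment the page was captured, which makes it easier to compare runs.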
My questions are as follows:
- Why is this content not being loaded correctly?
- Is there a way to solve it while still using requests_html?
- Can I solve this using Selenium, Playwright, or Scrapy?
Thanks in advance!
I attempted to identify and resolve the issue with the script above.