I am trying to parse data from a web page. However, the page at the URL in question appears to load its content dynamically after the initial page load in a browser.
I have tried a couple of things so far:
// Attempt 1: DOMDocument
$dom = new DOMDocument;
$dom->loadHTMLFile($url);

// Attempt 2: cURL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);

// Attempt 3: file_get_contents
$content = file_get_contents($url);
All of these options get me the initially loaded HTML, but not the content I am trying to parse, which is loaded dynamically afterwards.
Is there another option to parse HTML content with PHP ensuring that everything has been fully loaded in the DOM?
The content you want is most likely coming from one or more XMLHttpRequest (XHR) calls. Load the page in a browser with the developer tools open to the Network tab, and filter by “XHR” or the equivalent in your browser. There you should be able to see the URL endpoint(s) of the XHR requests. Clicking on a URL should show you even more information about that request. Send a new HTTP request to those endpoints using cURL, making sure to emulate the same headers, method (GET, POST, etc.) and any other request parameters shown in the Network tab.
This may not get you the exact same DOM elements you see in your browser, but most likely you will get the data you need to parse.
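As a rough illustration of the approach above, here is a minimal sketch of replaying such a request from PHP. It assumes the endpoint returns JSON, which is common but not guaranteed. The function names (`buildXhrOptions`, `fetchXhr`, `decodeJsonBody`), the example URL and the header values are all hypothetical placeholders; substitute whatever the Network tab actually shows for your page.

```php
<?php
// Sketch only: every name, URL and header value here is a placeholder.

/**
 * Build cURL options that mimic the request seen in the Network tab.
 * Pass $postFields only if the browser sent a POST with form data.
 */
function buildXhrOptions(array $headers, ?array $postFields = null): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects like a browser would
        CURLOPT_HTTPHEADER     => $headers, // headers copied from the Network tab
        CURLOPT_TIMEOUT        => 15,
    ];
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

/**
 * Decode a response body, assuming (not guaranteed) that it is JSON.
 */
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Response was not a JSON object or array');
    }
    return $data;
}

/**
 * Send the request and return the decoded data.
 */
function fetchXhr(string $url, array $headers, ?array $postFields = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildXhrOptions($headers, $postFields));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return decodeJsonBody($body);
}

// Example call (hypothetical endpoint and headers):
// $data = fetchXhr('https://example.com/api/items?page=1', [
//     'Accept: application/json',
//     'X-Requested-With: XMLHttpRequest',
//     'Referer: https://example.com/items',
// ]);
// print_r($data);
```

If the endpoint returns an HTML fragment rather than JSON, the raw body from `curl_exec` could be passed to `DOMDocument::loadHTML` instead of `decodeJsonBody`. Some sites also require cookies or tokens from the initial page load, in which case those would need to be copied or fetched first.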