Using Python and QWebEnginePage, I am trying to extract all the games from this Steam URL:
https://store.steampowered.com/search/?category2=8&ndl=1
In a browser, the page shows the first batch of results, then loads the next batch when you scroll to the bottom, and so on until all results are loaded. There are currently just under 800 results, but I assume that number will change as Steam changes things.
I can’t find a way to do this programmatically.
I can get the first page and extract its contents using a JavaScript call:
self.page.runJavaScript("document.getElementsByTagName('html')[0].innerHTML", self.js_complete)
That yields 94 results when run through my HTMLParser instance.
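To check whether a scroll added anything, it is enough to count the result rows. Below is a minimal sketch of such an `HTMLParser` subclass. It assumes (from inspecting the page, not from any documented Steam format) that each result is an `<a>` element whose class list includes `search_result_row` and which carries a `data-ds-appid` attribute. `ResultRowCounter` is a hypothetical name and is not the parser used above.

```python
from html.parser import HTMLParser


class ResultRowCounter(HTMLParser):
    """Collect the app IDs of Steam search result rows found in a page's HTML.

    Assumption: each result row is an <a> whose class list contains
    "search_result_row" and which has a "data-ds-appid" attribute.
    """

    def __init__(self):
        super().__init__()
        self.app_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attributes without a value arrive as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        if "search_result_row" in classes:
            self.app_ids.append(attr_map.get("data-ds-appid"))
```

Feed it the string passed to `js_complete`, then use `len(parser.app_ids)` as the result count. Create a fresh parser for every extraction so rows are not counted twice.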
I then tried this JS call to scroll the page:
self.page.runJavaScript("window.scrollTo({top: document.body.scrollHeight, behavior: 'smooth'});", self.js_scroll_complete)
Re-extracting the data (as above) in the scroll-complete callback still yields 94 results.
I let this infinite loop run for some time, and the number retrieved never increased.
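One likely reason the count never grows is that `runJavaScript` calls its callback as soon as the script itself finishes. `window.scrollTo` returns immediately, before any new results can have arrived from the server, so an extraction done inside that callback only sees rows that were already present. Below is a minimal sketch of a polling approach under the same assumption about the `search_result_row` class. Each cycle asks the page for its current row count, scrolls again, and waits. Scrolling stops once the count has not grown for a few consecutive cycles. `SCROLL_AND_COUNT_JS` and `ScrollTracker` are hypothetical names. The Qt wiring is left out so the stopping logic can be tested on its own.

```python
# JavaScript for QWebEnginePage.runJavaScript. It counts the rows loaded so
# far (i.e. since the previous scroll), scrolls to the bottom again, and
# evaluates to that count, which runJavaScript passes to its callback.
# Wrapped in a function so repeated runs don't redeclare a global.
# Assumption: result rows carry the "search_result_row" class.
SCROLL_AND_COUNT_JS = """
(function () {
    var n = document.querySelectorAll('.search_result_row').length;
    window.scrollTo(0, document.body.scrollHeight);
    return n;
})();
"""


class ScrollTracker:
    """Decide when to stop scrolling an infinite-scroll page.

    Feed it the row count from each scroll-and-wait cycle. It reports
    whether another cycle is worthwhile and gives up once the count has
    failed to grow `patience` times in a row.
    """

    def __init__(self, patience=3):
        self.patience = patience
        self.best = 0    # largest row count seen so far
        self.stale = 0   # consecutive cycles with no growth

    def update(self, count):
        """Record a new row count; return True to keep scrolling."""
        if count > self.best:
            self.best = count
            self.stale = 0
        else:
            self.stale += 1
        return self.stale < self.patience
```

In the Qt code this might look like `self.page.runJavaScript(SCROLL_AND_COUNT_JS, self.on_count)`. In `on_count`, if `tracker.update(count)` is true, schedule the next cycle with something like `QTimer.singleShot(2000, self.scroll_step)` so the page has time to fetch and insert the next batch. Otherwise, run the original `innerHTML` extraction once. The 2-second delay is a guess, not a measured value.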
Is there a way to do what a user would do here – i.e. keep scrolling until no more data is provided by the server?
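A possible alternative that avoids driving a browser at all: the page's infinite scroll appears to fetch each batch from a JSON endpoint, `https://store.steampowered.com/search/results/`, using `start`, `count` and `infinite=1` parameters plus the same filters as the page URL. This is not a documented API. Treat the parameter names and the `total_count` / `results_html` response keys as assumptions to confirm in the browser's network tab. `batch_url` and `fetch_all_results_html` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and parameter names as observed in the page's own
# infinite-scroll requests; not a documented Steam API.
SEARCH_RESULTS_URL = "https://store.steampowered.com/search/results/"


def batch_url(start, count=50, category2="8", ndl="1"):
    """Build the URL for one batch of search results (same filters as the page)."""
    params = {
        "start": start,
        "count": count,
        "category2": category2,
        "ndl": ndl,
        "infinite": 1,  # assumed to request JSON instead of a full HTML page
    }
    return SEARCH_RESULTS_URL + "?" + urllib.parse.urlencode(params)


def fetch_all_results_html(count=50):
    """Yield each batch's HTML fragment until the reported total is reached."""
    start = 0
    total = None
    while total is None or start < total:
        with urllib.request.urlopen(batch_url(start, count)) as resp:
            data = json.load(resp)
        total = data["total_count"]  # assumed response key
        yield data["results_html"]   # assumed response key
        start += count
```

Each yielded fragment could be fed to the same `HTMLParser` used for the full page, which would sidestep the scrolling problem entirely if the endpoint behaves as assumed.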
Thanks in advance for any pointers.