I’ve set up a scraper for Google Jobs results using Scrapy + Playwright, running on Heroku. I’m not doing hugely intensive scraping, and I’m still on Heroku’s free plan for now (which surprises me a little, and the errors below might be a sign I need to upgrade?).
My scraper uses a headless Playwright browser and parallelizes requests to some degree. I get warnings that memory usage is high pretty early in the scrape, and more recently, on some scrapes, the process is killed about 10 minutes in with these messages:
```
Process running mem=1058M(203.9%)
Error R15 (Memory quota vastly exceeded)
Stopping process with SIGKILL
State changed from up to complete
Process exited with status 0
```
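For what it's worth, the first thing I was planning to try is capping parallelism, since each browser context/page presumably holds real memory. This is only a sketch based on my reading of the scrapy-playwright README, so treat the `PLAYWRIGHT_*` setting names as assumptions if your version differs:

```python
# settings.py -- sketch of memory-capping settings (names per the
# scrapy-playwright README; assumptions, not verified fixes)

CONCURRENT_REQUESTS = 4               # Scrapy-side parallelism cap
PLAYWRIGHT_MAX_CONTEXTS = 2           # limit concurrent browser contexts
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 2  # limit pages open per context

PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,
    # Chromium flag commonly recommended in small containers, where
    # /dev/shm is tiny and Chromium otherwise balloons or crashes
    "args": ["--disable-dev-shm-usage"],
}
```

But I don't know whether throttling alone is enough, which is why I'm asking how to actually find the culprit.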
How do I even begin to debug what might be hogging a lot of memory? And would upgrading to a paid Heroku plan help in this case?
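In case it's relevant, the instrumentation I was planning to add next is (a) Scrapy's built-in memory watchdog and (b) a tracemalloc dump on shutdown, so at least the Python side reports itself before Heroku sends SIGKILL. This is a sketch: the `MEMUSAGE_*` settings are documented Scrapy settings, while `TracemallocDump` is a hypothetical extension I'd write myself (and it only sees Python allocations, not the Chromium child processes):

```python
# settings.py -- Scrapy's memory watchdog, so the crawl warns and stops
# itself under the ~512 MB free-dyno quota instead of being SIGKILLed
MEMUSAGE_ENABLED = True
MEMUSAGE_WARNING_MB = 350
MEMUSAGE_LIMIT_MB = 480

EXTENSIONS = {
    "myproject.memdebug.TracemallocDump": 500,  # hypothetical module path
}
```

```python
# memdebug.py -- hypothetical helper: logs the top Python allocation
# sites when the spider closes, using the stdlib tracemalloc module.
import tracemalloc

from scrapy import signals


class TracemallocDump:
    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        tracemalloc.start(25)  # keep 25 frames per allocation traceback
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.statistics("lineno")[:15]:
            spider.logger.info("MEM %s", stat)
```

Since the 1058M in the logs presumably includes the headless Chromium processes, which tracemalloc can't see, I'm not sure this gets me the whole picture either.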