I am using Splash with Scrapy to load dynamically rendered content on a page, but it does not work as expected.
In `settings.py` I set these variables:

```python
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
}
SPLASH_URL = "http://localhost:8050"
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
SPLASH_COOKIES_DEBUG = False
```
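A quick way to rule out a connectivity problem is to confirm that a Splash instance actually answers at `SPLASH_URL`. The sketch below is stdlib-only and assumes Splash's `/_ping` HTTP endpoint, which returns a small JSON object with a `"status": "ok"` field; the helper name `splash_is_up` is hypothetical.

```python
import json
from urllib.request import urlopen

SPLASH_URL = "http://localhost:8050"  # same value as in settings.py


def splash_is_up(splash_url=SPLASH_URL, timeout=5.0):
    """Return True if Splash answers its /_ping endpoint with status "ok".

    Hypothetical helper for debugging; it does not involve Scrapy at all.
    """
    ping_url = f"{splash_url.rstrip('/')}/_ping"
    try:
        with urlopen(ping_url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # connection refused, timeout, HTTP error status, or a non-JSON body
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

If this returns `False`, the Splash container is not reachable at that address, and no Scrapy setting will make rendering work until it is.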
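The spider:

```python
import scrapy
from scrapy_splash import SplashRequest


class ProductSpider(scrapy.Spider):
    name = "products"  # illustrative; the original class definition was not shown

    def start_requests(self):
        urls = [
            "https://callmeduy.com/san-pham/"
        ]
        for url in urls:
            yield SplashRequest(url=url,
                                # endpoint='render.html',
                                callback=self.parse,
                                args={
                                    'wait': 5
                                })

    def parse(self, response):
        print(response.xpath("//body").get())
        # Write with an explicit encoding so Vietnamese text is saved correctly,
        # and let the with-block close the file.
        with open('res.html', 'w', encoding='utf-8') as f:
            f.write(response.xpath("//body").get())
```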
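A fixed `wait` of 5 seconds may simply be too short, or may elapse before the page's scripts have inserted the products. One commonly used alternative is Splash's `execute` endpoint with a Lua script that polls until a CSS selector matches. The sketch below only builds the `args` dict for `SplashRequest(..., endpoint='execute')`; the selector `.product-item`, the `max_wait` argument and the helper name `build_execute_args` are hypothetical and must be adapted to the real markup of the page.

```python
# Lua script for Splash's "execute" endpoint: load the page, then poll
# until a CSS selector matches (or a time limit passes) and return the HTML.
WAIT_FOR_SELECTOR_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  local waited = 0
  -- poll every 0.5 s until the selector matches or max_wait is reached
  while not splash:select(args.css) and waited < args.max_wait do
    splash:wait(0.5)
    waited = waited + 0.5
  end
  return {html = splash:html()}
end
"""


def build_execute_args(css_selector, max_wait=15):
    """Build the `args` dict for SplashRequest(..., endpoint='execute').

    Hypothetical helper; `css` and `max_wait` are custom arguments that
    the Lua script above reads as args.css and args.max_wait.
    """
    if max_wait <= 0:
        raise ValueError("max_wait must be positive")
    return {
        "lua_source": WAIT_FOR_SELECTOR_LUA,
        "css": css_selector,
        "max_wait": max_wait,
    }


# Usage inside the spider (requires scrapy-splash):
#     yield SplashRequest(url, callback=self.parse, endpoint="execute",
#                         args=build_execute_args(".product-item"))
# and in parse():  html = response.data["html"]
```

Polling for an element, instead of sleeping for a fixed time, returns as soon as the content appears and still gives up after `max_wait` seconds.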
The dynamic content is not loaded. Here is the response body:

Please help if anybody knows what is wrong.
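To separate a Scrapy configuration problem from a rendering problem, the page can be requested from Splash's `render.html` HTTP endpoint directly, outside Scrapy. The stdlib-only sketch below assumes the Splash instance from `SPLASH_URL` and uses the documented `url`, `wait` and `timeout` arguments; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SPLASH_URL = "http://localhost:8050"  # same value as in settings.py


def render_html_url(page_url, wait=5.0, timeout=30.0, splash_url=SPLASH_URL):
    """Build a render.html request URL for Splash's HTTP API (hypothetical helper)."""
    query = urlencode({"url": page_url, "wait": wait, "timeout": timeout})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


def fetch_rendered(page_url, wait=5.0, timeout=30.0, splash_url=SPLASH_URL):
    """Return the page HTML as rendered by Splash, bypassing Scrapy entirely."""
    url = render_html_url(page_url, wait, timeout, splash_url)
    # give the HTTP client a little longer than Splash's own render timeout
    with urlopen(url, timeout=timeout + 10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_rendered("https://callmeduy.com/san-pham/", wait=10)` can be saved to a file and compared with `res.html`. If the directly rendered HTML also lacks the products, the issue lies in how Splash renders this site rather than in the Scrapy settings; if it contains them, the spider or middleware setup is the more likely culprit.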