I am writing a Scrapy spider to download images from a page. This is the spider's code:
import scrapy


class AnilistSpider(scrapy.Spider):
    name = "anilist"
    allowed_domains = ["anilist.co"]
    start_urls = ["https://anilist.co/search/anime?genres=Action&format=TV"]

    def parse(self, response):
        images_urls = response.css("div a img::attr(src)").getall()
        absolute_urls = []
        for img_url in images_urls:
            absolute_urls.append(response.urljoin(img_url))
        yield {
            "image_urls": absolute_urls
        }
I read the Scrapy guide and understood everything, but when it comes to saving the files, nothing gets saved. This is the content of my settings.py file:
BOT_NAME = "scrapy_test"
SPIDER_MODULES = ["scrapy_test.spiders"]
NEWSPIDER_MODULE = "scrapy_test.spiders"
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
IMAGES_STORE = "/images"
I added ITEM_PIPELINES and IMAGES_STORE, and when I run scrapy crawl anilist it scrapes every image: I can see the URLs in the terminal and open them, but no folder with the images is created.
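One thing that may be worth checking is where the IMAGES_STORE value actually points. The sketch below is only an illustration, not Scrapy code: it assumes a POSIX-style filesystem and that a relative store path is resolved against the directory the crawl is started from. The helper name resolve_store and the project directory /home/user/scrapy_test are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_store(store: str, project_dir: str) -> str:
    """Return the directory a store path points to when run from project_dir.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    path = PurePosixPath(store)
    if path.is_absolute():
        # A leading slash means the filesystem root, not the project folder.
        return str(path)
    # A relative path is joined onto the directory the command is run from.
    return str(PurePosixPath(project_dir) / path)


# "/images" is absolute, so it points at the root of the filesystem.
print(resolve_store("/images", "/home/user/scrapy_test"))
# "images" is relative, so it lands inside the (assumed) project directory.
print(resolve_store("images", "/home/user/scrapy_test"))
```

If that assumption holds, the images folder would be looked for at the filesystem root rather than next to the project, which could explain why no folder shows up in the project tree.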
I saw there was already an ITEM_PIPELINES dict in the settings file. I tried adding the ImagesPipeline there, but the result is still the same.
Another thing I noticed is that if I try to export the items from my project, it doesn't work: the IDE reports an "unresolved reference", as if they don't exist.
I read all the questions Stack Overflow suggested, but none of them solved my problem.