I'm trying to scrape Myntra, but neither Selenium nor Requests works when I run the scraper inside a virtual machine. Can anyone help me with this?
I tried both Selenium and Requests in Python, but I can't get any results in the virtual machine.
I also tried scrapy.Spider, and I get the timeout error shown in the log below.
My code:
<code>import scrapy


class MySpider(scrapy.Spider):
    name = "myntra"
    start_urls = ["https://www.myntra.com/handbags/miraggio/miraggio-textured-miniature-sling-bag/27624596/buy"]

    # Per-spider overrides: one request at a time, with a 1-second delay
    custom_settings = {
        'DOWNLOAD_DELAY': 1,
        'CONCURRENT_REQUESTS': 1,
        'RETRY_TIMES': 3,
        'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408],
        'DOWNLOADER_MIDDLEWARES': {
            # Disable Scrapy's built-in User-Agent middleware
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
            'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
        }
    }

    def parse(self, response):
        # Collect the text of every span with class "pdp-price"
        price = response.xpath("//span[@class='pdp-price']/text()").extract()
        yield {'price': price}
</code>
Here's my error log:
<code>24-11-21 22:35:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-11-21 22:35:14 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.myntra.com/handbags/miraggio/miraggio-textured-miniature-sling-bag/27624596/buy> (failed 1 times): User timeout caused connection failure: Getting https://www.myntra.com/handbags/miraggio/miraggio-textured-miniature-sling-bag/27624596/buy took longer than 180.0 seconds.
2024-11-21 22:35:14 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36
</code>
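One way to narrow down a timeout like the one in the log is to check, from inside the VM, whether the site answers at all outside Scrapy. Below is a minimal sketch using only the Python standard library. The helper names `build_request` and `probe` are hypothetical, the 15-second timeout is an arbitrary assumption, and the User-Agent string is simply the one that appears in the log above. It is a diagnostic aid under those assumptions, not a fix.

```python
import urllib.error
import urllib.request

# User-Agent copied from the log above; any browser-like string could be used.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"
)


def build_request(url):
    """Build a GET request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def probe(url, timeout=15):
    """Fetch url once and return a short description of the outcome."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 403).
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, and refused or reset connections.
        return f"connection problem: {exc}"
```

Calling `probe()` with the product URL from the spider, once on the VM and once on a local machine, may show whether the timeout is specific to the VM's network (a "connection problem" only on the VM) or whether the site responds with an HTTP status in both places.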