Affordable website design

Question

I’m using Scrapy with Playwright (through `scrapy-playwright`) on WSL Ubuntu to scrape a website, but the request fails with a `TimeoutError` raised by `Page.wait_for_selector`. The error says the 30000 ms timeout was exceeded while waiting for the selector `li.pan5o1` to become visible. How can I resolve this?

```python
import scrapy
from scrapy_playwright.page import PageMethod
from CrowdA.items import CrowdAItem


def should_abort_request(request):
    # Abort image downloads and every POST request issued by the page
    if request.resource_type == "image":
        return True
    if request.method.lower() == 'post':
        return True
    return False


class CrowdASpider(scrapy.Spider):
    name = 'crowdAP'

    custom_settings = {
        'PLAYWRIGHT_ABORT_REQUEST': should_abort_request,
    }

    def start_requests(self):
        # URL to start scraping
        url = "https://www.grainger.com/category/hardware/braces-and-brackets?categoryIndex=1"

        # Playwright options go in meta; errback must be a Request argument,
        # because Scrapy ignores an 'errback' key placed inside meta.
        request = scrapy.Request(
            url,
            meta={
                'playwright': True,
                'playwright_include_page': True,
                'playwright_page_methods': [
                    PageMethod("wait_for_selector", "li.pan5o1"),
                    PageMethod("evaluate", "window.scrollBy(0, document.body.scrollHeight)"),
                ],
            },
            errback=self.errback,
        )
        yield request

    async def parse(self, response):
        page = response.meta.get("playwright_page")
        if not page:
            self.logger.error("Playwright page is None")
        else:
            await page.close()

    async def errback(self, failure):
        # Close the Playwright page when the request fails (e.g. on a timeout)
        page = failure.request.meta.get("playwright_page")
        if page:
            await page.close()
```
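The `should_abort_request` predicate above aborts every POST request, not only images. If the product list on this page is populated by a POST `fetch`/`xhr` call (an assumption, not verified for grainger.com), aborting that call could keep `li.pan5o1` from ever rendering. The sketch below compares the original predicate with a relaxed variant that only aborts images. `FakeRequest` is a hypothetical stand-in exposing just the two attributes the predicate reads (Playwright's `Request.resource_type` and `Request.method`), so the logic can be checked without launching a browser.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for Playwright's Request, used only for checking logic."""

    resource_type: str
    method: str


def should_abort_request_original(request) -> bool:
    # Same logic as the spider: drop images and every POST request.
    if request.resource_type == "image":
        return True
    if request.method.lower() == "post":
        return True
    return False


def should_abort_request_images_only(request) -> bool:
    # Relaxed variant: drop images only, so POST fetch/xhr calls still reach the server.
    return request.resource_type == "image"


if __name__ == "__main__":
    api_call = FakeRequest(resource_type="fetch", method="POST")
    print(should_abort_request_original(api_call))     # True: the call is blocked
    print(should_abort_request_images_only(api_call))  # False: the call goes through
```

To try the relaxed variant, `PLAYWRIGHT_ABORT_REQUEST` in `custom_settings` would point at `should_abort_request_images_only` instead; whether this changes anything for this particular site is untested.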

**Terminal error:**

```
Traceback (most recent call last):
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1999, in _inlineCallbacks
    result = context.run(
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 340, in _download_request
    return await self._download_request_with_page(request, page, spider)
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 388, in _download_request_with_page
    await self._apply_page_methods(page, request, spider)
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/scrapy_playwright/handler.py", line 499, in _apply_page_methods
    pm.result = await _maybe_await(method(*pm.args, **pm.kwargs))
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/scrapy_playwright/_utils.py", line 16, in _maybe_await
    return await obj
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/playwright/async_api/_generated.py", line 7812, in wait_for_selector
    await self._impl_obj.wait_for_selector(
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/playwright/_impl/_page.py", line 373, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 323, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "/home/harry/Interview_Assignment/venv/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 513, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.wait_for_selector: Timeout 30000ms exceeded.
Call log:
waiting for locator("li.pan5o1") to be visible
```

I ran the spider (class `CrowdASpider`, spider name `crowdAP`), which renders the page with Playwright and is configured to wait until `li.pan5o1` is visible before scrolling and parsing. Instead, the run fails with the `TimeoutError` above: the selector did not become visible within the 30,000 ms timeout.

I expected the wait to succeed and the spider to go on to parse the page content. Note that the error only says the element never became visible; it does not tell whether `li.pan5o1` is missing from the DOM or present but hidden.
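`wait_for_selector` waits for the `visible` state by default, so the timeout alone does not distinguish "element missing" from "element hidden". One way to tell them apart is to dump the rendered HTML when the request fails, for example by writing `await page.content()` to a file inside `errback` before `await page.close()`, and then inspect that HTML offline. The sketch below does the inspection with the standard library's `html.parser`; the helper names `ClassCounter` and `count_matches` are hypothetical, and the HTML dump is assumed to exist.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags with a given tag name that carry a given CSS class."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # A class attribute may hold several space-separated class names.
            if name == "class" and value and self.css_class in value.split():
                self.count += 1
                return


def count_matches(html: str, tag: str = "li", css_class: str = "pan5o1") -> int:
    """Return how many <tag> elements with css_class in their class list appear in html."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    sample = '<ul><li class="pan5o1 a">x</li><li class="b">y</li></ul>'
    print(count_matches(sample))  # 1
```

A count of 0 in the dumped HTML would suggest the class name is wrong or has changed (`pan5o1` looks auto-generated), or that the browser received a different page than expected; a non-zero count would point instead at the elements being present but not visible when the wait ran.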


Scrapy Playwright TimeoutError: Page.wait_for_selector: Timeout 30000ms Exceeded on WSL Ubuntu