I am writing a programme that crawls and scrapes a particular Tor site. I am ready to start debugging with multiple instances, but I have hit a problem: I cannot run several instances at the same time with my current setup.
I am using the Tor Browser's built-in SOCKS5 proxy for my cURL requests. When I launch a second crawler instance, the two seem to interfere with each other, which shows up as numerous failed connections on both instances. I am not sure whether this is expected behaviour or a problem on my end.
What is the usual way for a PHP programme to open and manage concurrent Tor connections?
(Note that a single instance of this crawler can run for over 24 hours without any problems. The website in question is crawler-friendly and does not appear to have any countermeasures in place. I only ever see errors when I launch a second instance.)
What I tried:
initiated a second instance of my crawler, connecting to the same SOCKS5 proxy on port 9150 (the default Tor Browser port)
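For concreteness, here is a minimal sketch of how a request through that proxy can be configured with PHP's cURL extension. It is not the actual crawler code: the function names `buildTorCurlOptions` and `fetchViaTor`, the timeout values, and the example URL are placeholders chosen for illustration. The only details taken from the setup described here are the SOCKS5 proxy and port 9150.

```php
<?php
// Build the cURL options for one request routed through a local Tor SOCKS5 proxy.
// Assumption: the proxy listens on 127.0.0.1:9150 (the Tor Browser default).
function buildTorCurlOptions(string $url, string $proxy = '127.0.0.1:9150'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,
        // SOCKS5 with hostname resolution done by the proxy, which .onion
        // addresses need because they cannot be resolved locally.
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 30,   // illustrative values, not tuned
        CURLOPT_TIMEOUT        => 120,
    ];
}

// Fetch one page and surface the cURL error number and message on failure,
// so proxy errors can be told apart from other connection errors.
function fetchViaTor(string $url, string $proxy = '127.0.0.1:9150'): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildTorCurlOptions($url, $proxy));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(
            sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }

    return $body;
}
```

A call would look like `fetchViaTor('http://example.onion/')`, where the address is a made-up placeholder. Both crawler instances would be using the same `$proxy` value in this sketch, which matches the situation described above.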
What I expected to happen:
the second instance should start crawling at full speed without disrupting the first instance
What actually resulted:
both instances fail on roughly 50% of their requests, with proxy errors or other seemingly random connection errors