I am building a crawler/scraper program for a specific Tor site. I’ve reached a point where I’m ready to start debugging with multiple instances, but there’s a problem: I cannot run multiple instances with my current setup.
I’m using the Tor Browser’s built-in SOCKS5 proxy to route my cURL requests. For some reason, as soon as I launch a second instance of the crawler, the two start fighting for dominance, which manifests as a bunch of failed connections on both instances. I’m not sure if this is expected behavior or if there’s something wrong on my end.
What would be the “standard” way for a PHP program to launch/manage parallel Tor connections?
(Keep in mind that a single instance of this crawler can run for 24+ hours with no issues. The site in question is crawler-friendly and doesn’t seem to have any countermeasures in place. I only see errors when I launch a second instance.)
What I tried:
I launched a second instance of my crawler. Both instances connect to the same SOCKS5 proxy on port 9150 (the Tor Browser’s standard port).
What I expected to happen:
I expected the second instance to start crawling at full speed without interrupting the first instance.
What actually resulted:
Both instances seem to fail roughly 50% of their requests with proxy errors or other seemingly random connection errors.
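For illustration, the per-request setup described above can be sketched roughly as follows. This is a minimal sketch, not the actual crawler code: the helper names `torCurlOptions` and `fetchViaTor` and the timeout values are assumptions made for the example. Only the SOCKS5 proxy on port 9150 comes from the description above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds the cURL options for one request routed through
// a local Tor SOCKS5 proxy. Port 9150 is the Tor Browser port mentioned above.
function torCurlOptions(string $url, string $proxyHost = '127.0.0.1', int $proxyPort = 9150): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxyHost . ':' . $proxyPort,
        // SOCKS5_HOSTNAME hands name resolution to the proxy,
        // which .onion addresses need.
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 30, // assumed value, in seconds
        CURLOPT_TIMEOUT        => 60, // assumed value, in seconds
    ];
}

// Hypothetical helper: performs one request and returns the response body,
// or throws an exception carrying cURL's error code and message.
function fetchViaTor(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, torCurlOptions($url));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(
            sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }
    return $body;
}
```

Recording `curl_errno()` alongside `curl_error()` for each failed request should make it easier to tell whether the failures are proxy errors, timeouts, or something else.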