I’m working on a Python script that needs to update 10,000 records by making individual HTTP requests to a backend API. Each record has to be updated separately, and unfortunately, I don’t have the option to batch these requests on the backend.
What I’ve tried:

- `requests` + `threading`: Used Python’s `threading` module to send requests in parallel, but ran into issues with CPU usage and memory consumption.
- `asyncio` with `aiohttp`: Tried using `asyncio` with `aiohttp` to manage asynchronous requests. Performance improved, but I’m struggling with throttling, rate limiting, and error handling in the middle of 10,000 requests (a simplified sketch of this attempt is below).
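To make the second attempt concrete, this is roughly what my `aiohttp` code looks like, with a semaphore capping concurrency. The endpoint URL, record shape, and `CONCURRENCY` value are placeholders:

```python
import asyncio
import aiohttp

API_URL = "https://example.com/api/records/{}"  # placeholder endpoint
CONCURRENCY = 50  # cap on simultaneous in-flight requests

async def update_record(session: aiohttp.ClientSession,
                        sem: asyncio.Semaphore,
                        record: dict) -> None:
    # The semaphore keeps at most CONCURRENCY requests in flight at once.
    async with sem:
        async with session.put(API_URL.format(record["id"]), json=record) as resp:
            resp.raise_for_status()

async def main(records: list[dict]) -> None:
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [update_record(session, sem, r) for r in records]
        # return_exceptions=True so one failure doesn't cancel the rest
        results = await asyncio.gather(*tasks, return_exceptions=True)
        failed = [r for r in results if isinstance(r, Exception)]
        print(f"{len(failed)} of {len(records)} updates failed")

# asyncio.run(main(records))
```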
Questions:

- Which Python library or approach would be best to efficiently handle thousands of HTTP requests with minimal overhead?
- Is `asyncio` + `aiohttp` the right direction, or would multiprocessing or some other combination work better for this case?
- How can I retry failed requests without overwhelming the server or exhausting client resources? (A rough idea of what I mean is sketched after this list.)
- Are there any best practices for managing such a high number of concurrent HTTP requests in Python without causing performance bottlenecks?
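For the retry question, this is approximately what I have in mind: a hand-rolled exponential backoff with jitter around each request. `update_with_retry` is just an illustrative helper name, not something from `aiohttp`:

```python
import asyncio
import random
import aiohttp

async def update_with_retry(session: aiohttp.ClientSession,
                            url: str,
                            payload: dict,
                            max_retries: int = 5) -> None:
    for attempt in range(max_retries):
        try:
            async with session.put(url, json=payload) as resp:
                # raise_for_status() turns 429/5xx responses into exceptions
                resp.raise_for_status()
                return
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            await asyncio.sleep(2 ** attempt + random.random())
```

Is something like this reasonable, or should I use an existing retry library instead of rolling my own?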
Additional Info:

- The backend API enforces rate limiting, so my client needs to respect that while still pushing updates through as quickly as possible (a crude pacing idea is sketched after this list).
- I’ve also considered libraries like `httpx` or `grequests`, but would appreciate advice on which direction to take.
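On the rate-limiting point, I’ve been toying with a crude client-side pacer along these lines. `RateLimiter` is my own hypothetical helper, not something from `aiohttp` or another library, and the rate value is made up:

```python
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` request starts per second (crude client-side pacing)."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Sleep until the next free slot, then push the slot forward.
            delay = max(0.0, self._next_start - now)
            self._next_start = max(now, self._next_start) + self.interval
        if delay:
            await asyncio.sleep(delay)

# e.g. limiter = RateLimiter(20), then `await limiter.wait()` before each request
```

The idea would be to call `await limiter.wait()` just before each request inside the semaphore, but I’m not sure this is the right way to respect the server’s limit.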
Any insights or best practices for scaling such large HTTP operations would be helpful!