The following code works fine when I have, e.g., 10 tasks:
tasks = [asyncio.create_task(do_work(file)) for file in files]
await asyncio.gather(*tasks)
Each call to do_work is about 10 seconds of work (mostly waiting on an API to return).
But when I have ~50K files, the tasks take forever to finish. I’m guessing this is because all the tasks are competing for time on the same thread? Not sure.
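For context, a self-contained version of my setup looks roughly like this (do_work here is just a stand-in that sleeps instead of hitting the real API, and the file names are made up):

import asyncio

async def do_work(file):
    # Stand-in for the real coroutine: ~10 s, almost all of it
    # spent waiting on an external API.
    await asyncio.sleep(10)

async def main():
    files = [f"file_{i}" for i in range(10)]  # fine at 10, crawls at ~50K
    tasks = [asyncio.create_task(do_work(file)) for file in files]
    await asyncio.gather(*tasks)

asyncio.run(main())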
I tried batching it, like this:
async def main():
    tasks = [asyncio.create_task(do_work(file)) for file in files]
    print("finished creating tasks ...")
    # await asyncio.gather(*tasks)
    batch_size = 2
    for i in range(0, len(tasks), batch_size):
        await asyncio.gather(*tasks[i:i+batch_size])
But that didn’t help, which I guess makes sense, since asyncio starts running the tasks as soon as I call create_task.
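To illustrate what I mean: even when I gather in slices of two, every task has already been scheduled, so they all run at once anyway (small sketch, with a 1-second sleep standing in for the work):

import asyncio
import time

async def do_work(i, start):
    await asyncio.sleep(1)
    print(f"task {i} done at {time.perf_counter() - start:.1f}s")

async def main():
    start = time.perf_counter()
    tasks = [asyncio.create_task(do_work(i, start)) for i in range(6)]
    # Gather in slices of 2; all six tasks still finish at ~1 s,
    # not in three one-second waves, because create_task already
    # scheduled every task on the event loop.
    for i in range(0, len(tasks), 2):
        await asyncio.gather(*tasks[i:i+2])

asyncio.run(main())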
What can I do to get maximal performance in processing these files? It’s worth noting that I have a ton of cores, so ideally each core would have its own asyncio event loop and process a share of the work, or something like that. Roughly what I’m picturing is sketched below.
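(Untested sketch of that idea; the worker count, the chunking, and the process_chunk helper are all made up:)

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def do_work(file):
    await asyncio.sleep(10)  # stand-in for the real API call

def process_chunk(chunk):
    # Each worker process runs its own event loop over its slice of files.
    async def run():
        await asyncio.gather(*(do_work(f) for f in chunk))
    asyncio.run(run())

def main(files, workers=8):  # workers=8 is an arbitrary guess
    chunk_size = len(files) // workers + 1
    chunks = [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process_chunk, chunks))

if __name__ == "__main__":
    main([f"file_{i}" for i in range(50_000)])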