So I am trying to understand the parallelism in the asyncio.gather() function.
I have a FastAPI app that needs to upload 4 files to S3 at the same time.
I am using a local S3 for testing right now, so latency should be minimal, but the timings I get confuse me a bit.
I ran each code sample 5 times and pasted the timings of all runs below.
Asyncio Gather with Partial:
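```python
import asyncio
import functools
import time

# Runs inside the async FastAPI endpoint; s3_client and files are defined elsewhere.
loop = asyncio.get_running_loop()
start = time.time() * 1000
# Each blocking upload_fileobj call is submitted to the loop's default
# thread pool; gather waits for all of the resulting futures.
await asyncio.gather(
    *[
        loop.run_in_executor(
            None,
            functools.partial(
                s3_client.upload_fileobj, file.file, "test", file.filename
            ),
        )
        for file in files
    ]
)
end = time.time() * 1000
print(f"gather time: {end - start}")
```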
Executions:
- 125 ms
- 130 ms
- 119 ms
- 122 ms
- 132 ms
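To see what the executor version does in isolation, here is a minimal sketch that replaces the S3 call with a hypothetical blocking function, fake_upload, which just sleeps. The delays are rough stand-ins for the per-file times measured in run 1 of the sync version. Because each call runs in a thread from the loop's default pool, the sleeps overlap and the total comes out close to the longest single delay rather than their sum.

```python
import asyncio
import functools
import time


# Hypothetical stand-in for s3_client.upload_fileobj: a blocking call
# that simply sleeps for the given number of milliseconds.
def fake_upload(name, delay_ms):
    time.sleep(delay_ms / 1000)
    return name


async def upload_all_in_executor(delays_ms):
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # Each blocking call is handed to the default ThreadPoolExecutor,
    # so the sleeps run at the same time instead of one after another.
    await asyncio.gather(
        *[
            loop.run_in_executor(None, functools.partial(fake_upload, name, delay))
            for name, delay in delays_ms.items()
        ]
    )
    return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    delays = {"1": 33, "2": 42, "3": 40, "4": 49}
    elapsed = asyncio.run(upload_all_in_executor(delays))
    print(f"executor gather time: {elapsed:.1f} ms")
```

This sketch does not model any of boto3's own per-call work, so it only shows the overlap, not the exact timings of the real uploads.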
Asyncio Gather with tasks and a separate async function:
```python
import asyncio
import time


# Coroutine wrapper around the blocking boto3 call (it awaits nothing).
async def s3_upload_fileobj_async(*args, **kwargs):
    return s3_client.upload_fileobj(*args, **kwargs)


start = time.time() * 1000
await asyncio.gather(
    *[
        asyncio.create_task(
            s3_upload_fileobj_async(file.file, "test", file.filename)
        )
        for file in files
    ]
)
end = time.time() * 1000
print(f"gather time: {end - start}")
```
Executions:
- 194 ms
- 163 ms
- 152 ms
- 168 ms
- 164 ms
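A coroutine only hands control back to the event loop at an await. s3_upload_fileobj_async never awaits, so once a task starts, its blocking upload runs to completion before the next task can start. The sketch below reproduces that pattern with the same hypothetical sleeping fake_upload; under that assumption the gather time comes out at roughly the sum of the delays, just like a plain loop.

```python
import asyncio
import time


# Hypothetical blocking stand-in for s3_client.upload_fileobj.
def fake_upload(name, delay_ms):
    time.sleep(delay_ms / 1000)
    return name


# Same shape as s3_upload_fileobj_async: declared async, but the
# blocking call is made directly, so it holds the event loop.
async def fake_upload_async(name, delay_ms):
    return fake_upload(name, delay_ms)


async def upload_all_with_tasks(delays_ms):
    start = time.perf_counter()
    # The tasks are created together, but the single event-loop thread
    # runs each one's blocking sleep to the end before switching.
    await asyncio.gather(
        *[asyncio.create_task(fake_upload_async(n, d)) for n, d in delays_ms.items()]
    )
    return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    elapsed = asyncio.run(upload_all_with_tasks({"1": 33, "2": 42, "3": 40, "4": 49}))
    print(f"task gather time: {elapsed:.1f} ms")
```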
Normal sync function:
```python
import time

start_total = time.time() * 1000
for file in files:
    start = time.time() * 1000
    # Blocks until this file is fully uploaded before moving to the next one.
    s3_client.upload_fileobj(file.file, "test", file.filename)
    end = time.time() * 1000
    print(f"File: {file.filename} time: {end - start}")
end_total = time.time() * 1000
print(f"Total time: {end_total - start_total}")
```
Executions:
Per-file upload times in run 1 (ms):
File: 1 time: 32.92822265625
File: 2 time: 41.922119140625
File: 3 time: 39.576904296875
File: 4 time: 49.405029296875
Total time of each of the 5 runs:
- 163 ms
- 222 ms
- 162 ms
- 182 ms
- 156 ms
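As a quick arithmetic check, the per-file times from run 1 add up to that run's total, which is what a one-after-another loop should cost. The largest single time is the lower bound a fully overlapped version could reach.

```python
# Per-file upload times (ms) from run 1 of the sync loop above.
per_file_ms = [32.92822265625, 41.922119140625, 39.576904296875, 49.405029296875]

sequential_total = sum(per_file_ms)  # cost when uploads run one after another
ideal_concurrent = max(per_file_ms)  # lower bound if all four fully overlap

print(f"sum: {sequential_total:.1f} ms, max: {ideal_concurrent:.1f} ms")
# prints "sum: 163.8 ms, max: 49.4 ms"
```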
Observations:
- Asyncio Gather with tasks is roughly as fast as the sync code.
- Asyncio Gather with partials is the fastest of the three approaches.
Questions:
- Since each file takes between 32 ms and 49 ms to upload, if asyncio.gather starts all the uploads at the same time, the total should be roughly the longest single upload, right? Instead I get:
49 ms (longest file upload) vs 163 ms (asyncio.gather with tasks)
49 ms (longest file upload) vs 125 ms (asyncio.gather with partials)
- How come asyncio.gather with tasks performs almost the same as the sync code, which waits for each file to finish uploading before starting the next? Are tasks passed to asyncio.gather actually not run all at the same time?
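If you want to keep the task-based structure, one possible variant (an assumption, not the only option) is to have the async wrapper await asyncio.to_thread, available in Python 3.9+, which runs the blocking call in the loop's default thread pool. The sketch again uses the hypothetical sleeping fake_upload in place of the S3 call.

```python
import asyncio
import time


# Hypothetical blocking stand-in for s3_client.upload_fileobj.
def fake_upload(name, delay_ms):
    time.sleep(delay_ms / 1000)
    return name


# Same shape as s3_upload_fileobj_async, but the blocking call is awaited
# through asyncio.to_thread, so the event loop can switch between tasks.
async def fake_upload_async(name, delay_ms):
    return await asyncio.to_thread(fake_upload, name, delay_ms)


async def upload_all_with_to_thread(delays_ms):
    start = time.perf_counter()
    results = await asyncio.gather(
        *[asyncio.create_task(fake_upload_async(n, d)) for n, d in delays_ms.items()]
    )
    return results, (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    names, elapsed = asyncio.run(
        upload_all_with_to_thread({"1": 33, "2": 42, "3": 40, "4": 49})
    )
    print(names, f"{elapsed:.1f} ms")
```

gather returns results in the order the awaitables were passed in, so names lines up with the input files regardless of which upload finishes first.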
Thanks