I’m building a scalable FastAPI application with the goal of handling more than 10,000 requests per second (RPS). My application is quite complex. However, I’ve identified a significant bottleneck in scaling: each individual insert operation takes about 1 to 2 milliseconds of CPU time.
This CPU cost caps a single worker hard: with one pod/worker, my full application tops out at around 300 RPS on endpoints that perform multiple inserts. As a result, I need to scale out many workers to reach higher RPS, which seems inefficient.
To be clear, the problem is not the time spent on the database side (latency matters little in my use case), but the CPU time, which limits scaling.
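The scaling math behind the ~300 RPS figure looks roughly like this (the 1.5 ms per insert and 2 inserts per request are illustrative assumptions picked from the ranges above):

```python
# Back-of-the-envelope: how per-insert CPU time caps single-worker throughput.
cpu_per_insert_ms = 1.5      # assumed midpoint of the 1-2 ms I measure
inserts_per_request = 2      # assumed; my real endpoints do multiple inserts

cpu_per_request_ms = cpu_per_insert_ms * inserts_per_request   # 3.0 ms of CPU
max_rps_per_worker = 1000 / cpu_per_request_ms                 # ~333 RPS ceiling
workers_for_10k_rps = 10_000 / max_rps_per_worker              # ~30 workers

print(max_rps_per_worker, round(workers_for_10k_rps))
```

So even ignoring all other work the endpoint does, one worker is CPU-bound around a few hundred RPS, and 10k RPS needs on the order of 30 workers.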
I created a simplified reproducible example. It shows that a single insert takes around 1 to 2 milliseconds. I’m trying to understand if this performance is normal for FastAPI, or if there’s something wrong with my setup. I have seen benchmarks showing better performance, so I’m wondering if there’s any optimization potential.
Here’s the simplified code for the insert operation:
app.py
import time
from contextlib import asynccontextmanager
from fastapi import FastAPI
from sqlalchemy import AsyncAdaptedQueuePool, text
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker
db_username = "****"
db_password = "****"
host = "****"
async def setup_database():
    dsn = f"postgresql+asyncpg://{db_username}:{db_password}@{host}:5432/postgres"
    engine = create_async_engine(
        dsn,
        pool_size=20,
        poolclass=AsyncAdaptedQueuePool,
    )
    # Return the engine as well, so the pool can be disposed at shutdown
    return engine, sessionmaker(engine, class_=AsyncSession)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Setup the database connection pool
    app.state.db_engine, app.state.db_session = await setup_database()
    yield
    # Close the connection pool (the session factory itself has no close())
    await app.state.db_engine.dispose()
app = FastAPI(lifespan=lifespan)
@app.get("/test-insert")
async def test_insert():
    start_time = time.time()
    start_cpu_time = time.process_time()
    insert_query = text("INSERT INTO simple_text (text) VALUES (:text) RETURNING id")
    params = {"text": "Test"}
    async with app.state.db_session.begin() as sess:
        result = await sess.execute(insert_query, params)
        text_id = result.scalar_one()
        # no explicit commit needed: session.begin() commits on leaving the block
    end_time = time.time()
    end_cpu_time = time.process_time()
    duration = end_time - start_time
    cpu_duration = end_cpu_time - start_cpu_time
    print(
        f"Test Insert: {duration:.6f} seconds, CPU time: {cpu_duration:.6f} CPU seconds"
    )
    return {"id": text_id, "duration": duration, "cpu_duration": cpu_duration}
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
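Since my whole argument rests on time.process_time(), here is a standalone sanity check (independent of the app) that it really isolates CPU work from I/O wait. Note that process_time() is process-wide, so I only trust the per-request numbers when hitting the endpoint serially:

```python
import time

def measure(fn):
    """Return (wall-clock seconds, CPU seconds) spent in fn()."""
    t0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - t0, time.process_time() - c0

# Sleeping (like awaiting the DB) burns wall time but almost no CPU time...
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")

# ...while a busy loop burns both.
wall, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
print(f"busy:  wall={wall:.3f}s cpu={cpu:.3f}s")
```

So a 1–2 ms process_time() delta per insert really is CPU spent in the worker, not time waiting on Postgres.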
Dockerfile
# Use an official Python 3.12 runtime as a parent image
FROM python:3.12-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Expose port 8000 to the outside world
EXPOSE 8000
# Command to run the FastAPI application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt
fastapi
uvicorn[standard]
sqlalchemy
asyncpg
For context, I set up a PostgreSQL database with a table simple_text consisting of just a primary key id and a character column text.
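The table was created with something like this (exact column types assumed; any character type should reproduce the issue):

```sql
CREATE TABLE simple_text (
    id   SERIAL PRIMARY KEY,
    text TEXT NOT NULL
);
```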
I am developing on a MacBook Pro M3 with Colima and Rosetta, but the same performance problem shows up on my GCP Kubernetes setup, which uses Linux pods.