So yeah, as the title says, my Vercel logs are overflowing with “Task timed out after 15.02 seconds” errors, about 2k per hour.
To give you an idea of the architecture: I host a Next.js 14 app on Vercel (Pro plan). Serverless (lambda) functions execute the SSR code, fetching data from the database (RDS Postgres, with PgBouncer hosted on AWS ECS) and rendering components; Prisma ORM handles the connections inside the app.
My best guess is that the lambda functions running inside Next.js (each of which opens 7 connections to the db) are using up all the allowed connections. In CloudWatch I can see a lot of these connections being closed with `closing because: client unexpected eof (age=XXX)` errors, where XXX is anywhere between 300s and 7200s. The number of these errors in CloudWatch is almost equal to the number of “task timed out” errors on Vercel: ~2k/hour.
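For reference, the Prisma connection string looks roughly like this (host/credentials are placeholders); I believe the 7 connections per function comes from Prisma's default pool size of `num_physical_cpus * 2 + 1`, since I haven't set `connection_limit`:

```
DATABASE_URL="postgresql://user:password@my-pgbouncer-host.example.com:6432/mydb?pgbouncer=true"
```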
Here are my PgBouncer values:
- pg_bouncer_default_pool_size: 20
- pg_bouncer_idle_transaction_timeout: 300
- pg_bouncer_max_client_conn: 1500
- pg_bouncer_max_connections: 1000
- pg_bouncer_max_instances: 3
- pg_bouncer_min_pool_size: 5
- pg_bouncer_reserve_pool_size: 60
Things I’ve tried:
- Scaled the CPU/MEM allocation for the ECS task (0.5 vCPU -> 1 vCPU, 512MB -> 1GB); basically increased the instance from `db.t4g.micro` to `db.t4g.small`
- Added the `pg_bouncer_idle_transaction_timeout` (somehow I think this only made it worse)
- Increased the timeout value for Vercel serverless functions to 15s (sketch after this list)
- Made sure PgBouncer is in `transaction` mode and that I’ve appended `&pgbouncer=true` to the Prisma connection string
- Increased the Vercel function CPU from Basic to Performance
- Made sure the Prisma client is instantiated outside all functions and used globally across the entire app, as per their docs (singleton sketch after this list too)
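For the timeout bump, this is roughly what I did (a sketch assuming the App Router segment config; the route path is just an example):

```ts
// app/products/page.tsx (example route) — Vercel picks up maxDuration
// from the route segment config and uses it as the function timeout.
export const maxDuration = 15 // seconds, matches the 15.02s in the logs
```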
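And the Prisma client is set up with the singleton pattern from their docs (file path is mine):

```ts
// lib/prisma.ts — one PrismaClient per lambda container, cached on
// globalThis in dev so hot reloads don't spawn extra connection pools.
import { PrismaClient } from '@prisma/client'

const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient }

export const prisma = globalForPrisma.prisma ?? new PrismaClient()

if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma
```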
I’m only an FE engineer, so all this AWS/DB stuff is a bit over my head. Any help would be appreciated.