Several months ago, I learned about ULIDs from Hussein Nasser’s YouTube video.
They seemed better than UUID v4 because they are lexicographically sortable while still providing enough randomness to prevent collisions. That meant DB read speeds could improve, since index data wouldn’t be as fragmented.
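For context, a ULID (as described in the public ULID spec) is 26 Crockford base32 characters: a 48-bit millisecond timestamp (10 characters) followed by 80 random bits (16 characters). Because the timestamp comes first, string order follows creation order. Below is a minimal sketch of that layout. It assumes a runtime with Web Crypto's `crypto.getRandomValues` (Node 19+, Cloudflare Workers, browsers). `ulidSketch` and its helpers are hypothetical names, not the API of any ULID library, and the sketch skips the spec's optional monotonic mode.

```typescript
// Crockford base32 alphabet used by the ULID spec (omits I, L, O, U).
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a 48-bit millisecond timestamp as 10 base32 characters, most significant first.
function encodeTime(ms: number): string {
  if (!Number.isInteger(ms) || ms < 0 || ms > Math.pow(2, 48) - 1) {
    throw new RangeError("timestamp must be an integer in [0, 2^48)");
  }
  let out = "";
  let t = ms;
  for (let i = 0; i < 10; i++) {
    const digit = t % 32;
    out = ENCODING[digit] + out;
    t = (t - digit) / 32;
  }
  return out;
}

// Turn 16 random bytes into 16 base32 characters, 5 bits each (80 bits total).
// Masking with 31 keeps the distribution uniform because 256 is a multiple of 32.
function encodeRandom(bytes: Uint8Array): string {
  if (bytes.length !== 16) throw new RangeError("expected 16 random bytes");
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += ENCODING[bytes[i] & 31];
  return out;
}

// Hypothetical helper: timestamp prefix + random suffix = 26-character ID.
function ulidSketch(
  ms: number = Date.now(),
  random: Uint8Array = crypto.getRandomValues(new Uint8Array(16)),
): string {
  return encodeTime(ms) + encodeRandom(random);
}

// Even the largest possible suffix at t=1000 sorts before the smallest at t=1001.
const earlier = ulidSketch(1_000, new Uint8Array(16).fill(255)); // "00000000Z8ZZZZZZZZZZZZZZZZ"
const later = ulidSketch(1_001, new Uint8Array(16)); // "00000000Z90000000000000000"
console.log(earlier < later); // true
```

Within a single millisecond, the random suffix decides the order. Two IDs from the same millisecond are therefore not ordered by creation time, which is the case the spec's monotonic variant handles by incrementing the random part.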
But today, while working through an example for temporal.io’s TypeScript SDK, I stumbled upon cuid.
Upon more research, I learned about cuid2.
The author makes some good points against other ID generators:
- Leaks information: Database auto-increment, all UUIDs (except V4 and including V6 – V8), Ulid, Snowflake, ShardingId, pushId, ObjectId, KSUID
- Collision Prone: Database auto-increment, v4 UUID
- Not cryptographically secure random output: Database auto-increment, UUID v1, UUID v4
- Requires distributed coordination: Snowflake, ShardingID, database increment
- Not URL or name friendly: UUID (too long, dashes), Ulid (too long), UUID v7 (too long) – anything else that supports special characters like dashes, spaces, underscores, #$%^&, etc.
- Too fast: UUID v1, UUID v4, NanoId, Ulid, Xid
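To make the "leaks information" point concrete for ULIDs: the first 10 characters are a millisecond timestamp, so anyone holding an ID can recover when it was created. Below is a minimal decoder sketch. It assumes the canonical 26-character form from the ULID spec, and `ulidTimestamp` is a hypothetical helper, not a library function.

```typescript
// Crockford base32 alphabet from the ULID spec.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: read the 48-bit creation timestamp out of a ULID string.
// Only the canonical alphabet is accepted; Crockford's I/L/O aliases are not mapped.
function ulidTimestamp(id: string): number {
  if (id.length !== 26) throw new RangeError("a ULID has 26 characters");
  const prefix = id.slice(0, 10).toUpperCase();
  let ms = 0;
  for (let i = 0; i < prefix.length; i++) {
    const value = CROCKFORD.indexOf(prefix[i]);
    if (value === -1) throw new RangeError(`invalid character: ${prefix[i]}`);
    ms = ms * 32 + value; // 10 chars * 5 bits = 50 bits, still exact in a double
  }
  return ms;
}

// Anyone holding the ID can see when the row was created.
const created = new Date(ulidTimestamp("01ARYZ6S41TSV4RRFFQ69G5FAV"));
console.log(created.toISOString()); // "2016-07-30T22:36:16.385Z"
```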
And goes on to argue against K-sortable IDs:
TL;DR: Stop worrying about K-Sortable ids. They’re not a big deal anymore. Use createdAt fields instead.
The performance impact of using sequential keys in modern systems is often exaggerated. If your database is too small to use cloud-native solutions, it’s also too small to worry about the performance impact of sequential vs random ids unless you’re living in the distant past (i.e. you’re using hardware from 2010). If it’s large enough to worry, random ids may still be faster.

In the past, sequential keys could potentially have a significant impact on performance, but this is no longer the case in modern systems.

One reason for using sequential keys is to avoid id fragmentation, which can require a large amount of disk space for databases with billions of records. However, at such a large scale, modern systems often use cloud-native databases that are designed to handle terabytes of data efficiently and at a low cost. Additionally, the entire database may be stored in memory, providing fast random-access lookup performance. Therefore, the impact of fragmented keys on performance is minimal.

Worse, K-Sortable ids are not always a good thing for performance anyway, because they can cause hotspots in the database. If you have a system that generates a large number of ids in a short period of time, the ids will be generated in a sequential order, causing the tree to become unbalanced, which will lead to frequent rebalancing. This can cause a significant performance impact.

So what kinds of operations suffer from a non-sequential id? Paged, sorted operations. Stuff like “fetch me 100000 records, sorted by id”. That would be noticeably impacted, but how often do you need to sort by id if your id is opaque? I have never needed to. Modern cloud databases allow you to create indexes on createdAt fields which perform extremely well.

The worst part of K-Sortable ids is their impact on security. K-Sortable = insecure.
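The quote's alternative to sorting by ID is to sort on an indexed createdAt column. Below is a minimal sketch of keyset ("seek") pagination for PostgreSQL. It assumes a hypothetical `records` table with `created_at` and `id` columns, plus `$n` placeholders as used by node-postgres. The opaque ID serves only as a tie-breaker for rows sharing a timestamp, so it can be fully random.

```typescript
// Cursor = the (created_at, id) pair of the last row on the previous page.
interface Cursor {
  createdAt: string; // ISO-8601 timestamp
  id: string; // opaque ID, used only to break ties between equal timestamps
}

interface Query {
  text: string;
  values: unknown[];
}

// Builds a seek-pagination query for the hypothetical `records` table.
// Assumes an index such as: CREATE INDEX ON records (created_at, id);
function pageQuery(limit: number, after?: Cursor): Query {
  if (!Number.isInteger(limit) || limit < 1 || limit > 1000) {
    throw new RangeError("limit must be an integer between 1 and 1000");
  }
  if (after === undefined) {
    return {
      text: "SELECT * FROM records ORDER BY created_at, id LIMIT $1",
      values: [limit],
    };
  }
  return {
    // Row-value comparison: strictly after the last (created_at, id) already returned.
    text:
      "SELECT * FROM records WHERE (created_at, id) > ($1, $2) " +
      "ORDER BY created_at, id LIMIT $3",
    values: [after.createdAt, after.id, limit],
  };
}
```

Unlike `OFFSET`, the seek predicate lets PostgreSQL start reading at the cursor's position in the `(created_at, id)` index instead of stepping over all earlier rows, so later pages cost roughly the same as the first.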
I’m currently building an enterprise non-financial application:
- It will be hosted on serverless environments (Cloudflare Workers and Google Cloud Run)
- I will be using PostgreSQL through Supabase.
- The schema will be denormalized, with some data stored in JSON fields.
- I do expect pagination, but pages shouldn’t go past 100 entries (except in edge cases, which can reach thousands of entries).
- I do expect thousands of Temporal workflows running simultaneously (each one needs a unique ID).
- I plan on pen-testing and auditing the application’s security later.
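Since every workflow needs a unique ID, the "collision prone" criticism is the one that matters most here. A quick way to sanity-check it is the birthday-bound approximation, which depends only on how many uniformly random bits an ID carries. Below is a minimal sketch assuming a good random source. `collisionProbability` is a hypothetical helper, and the sketch says nothing about weak RNGs or implementation bugs, which are a separate concern.

```typescript
// Birthday-bound approximation for n IDs that each carry `bits` uniformly random bits:
//   P(at least one collision) ≈ 1 - exp(-n(n-1) / 2^(bits+1))
function collisionProbability(n: number, bits: number): number {
  const pairs = (n * (n - 1)) / 2;
  // -expm1(-x) equals 1 - e^(-x) but stays accurate when x is tiny.
  return -Math.expm1(-pairs / Math.pow(2, bits));
}

// UUID v4: 122 random bits (6 bits are fixed for version and variant).
console.log(collisionProbability(1_000_000, 122));
// ULID: 80 random bits, which only need to separate IDs created in the same millisecond.
console.log(collisionProbability(1_000_000, 80));
```

For a million IDs this gives roughly 1e-25 for UUID v4 and roughly 4e-13 for the ULID case. The ULID figure is pessimistic, because it treats all million IDs as if they shared one millisecond.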
Since this is a low-hanging-fruit change, I wanted to check whether switching to cuid2 makes sense. I haven’t found any discussion online besides paralleldrive’s GitHub, so I wanted to get a second opinion.