This is a question of model design, and hopefully some architecture nerds have strong opinions on it. I've been forming an opinion of my own, which I will post as an answer.
In every codebase I've seen, if a model has an integer ID, it is typed as an int, and if it has a GUID ID, it is typed as a Guid. Is this really correct? Is this a leaky abstraction? Is it a leaky abstraction worth keeping?
Using native types (int, Guid, etc.) instead of types that express the intent (Price, ProductIdentifier, etc.) has two drawbacks:
- A change of type cascades through the entire code base, which makes it particularly painful.
For instance, switching from smallint (short) to uniqueidentifier (Guid) is painful not only at the database level (how do you transform production data from auto-incremented fields to randomly generated UIDs?), but also at the code level. You must go through the entire code base and replace short with Guid, dealing with compile errors along the way, or even runtime errors (in a dynamic language, or when reflection is used).
- Type-related methods are spread throughout the code base instead of being confined to a single class, which creates a risk of code duplication.
When you have ProductIdentifier, you can easily locate the CheckIfValueInRange method: it is obviously in the ProductIdentifier class, and only there. When you have an int as the product identifier, you may guess that CheckIfValueInRange lives in the data access layer, but then it's up to you to find it. Under pressure, some programmers may be reluctant to spend time searching for the method and will write their own checker instead:

    if (this.Id <= 0 || this.Id > 100000)
    {
        throw new Exception("The product ID is invalid.");
    }

which may, unfortunately, differ from another implementation that conforms more closely to the spec (here the two disagree on whether 0 is a valid ID, and they throw different exceptions):

    const int MAX_ID = 100000;
    if (this.Id < 0 || this.Id > MAX_ID)
    {
        throw new OutsideRangeException("The identifier is outside the allowed range.");
    }

This leads to a refactoring and maintenance nightmare, which could easily be avoided by using a ProductIdentifier class.
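A minimal sketch of what such a ProductIdentifier could look like, assuming C# 10 or later. It is written as a readonly record struct rather than a class (a design choice, not something the answer prescribes), the bounds are taken from the second range check above, and FromInt32 is a hypothetical factory name. The point is that the range rule exists in exactly one place.

```csharp
using System;

// Hypothetical wrapper: the raw int is private, so changing the representation
// later (say, to long or Guid) is a change inside this type, not across callers.
public readonly record struct ProductIdentifier
{
    // Assumed bounds, taken from the second range check in the text (0..100000).
    private const int MinValue = 0;
    private const int MaxValue = 100000;

    private readonly int value;

    private ProductIdentifier(int value) => this.value = value;

    // The one place where the range rule lives.
    public static bool CheckIfValueInRange(int candidate) =>
        candidate >= MinValue && candidate <= MaxValue;

    // Factory that rejects out-of-range values, so callers cannot build an invalid ID.
    public static ProductIdentifier FromInt32(int candidate)
    {
        if (!CheckIfValueInRange(candidate))
        {
            throw new ArgumentOutOfRangeException(
                nameof(candidate), candidate, "The identifier is outside the allowed range.");
        }
        return new ProductIdentifier(candidate);
    }

    public override string ToString() => value.ToString();
}
```

Callers then write ProductIdentifier.FromInt32(rawId) and never repeat the bounds themselves; value equality and the == operator come from the record struct.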
Do identifiers change their type often?
Actually, they do. A developer decides to use smallint for a small website; the website becomes hugely popular, and once the limit of 32,767 values is reached, the type has to be changed. Another developer uses uniqueidentifier, and it later turns out that this type brings no benefit in the actual context, whereas an auto-incremented, ordered value would have been nice.
This makes identifiers a good candidate for a custom type rather than a native one.
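To put the smallint limit in C# terms: short, the .NET counterpart of smallint, tops out at short.MaxValue (32,767). The hypothetical ShortIdCounter below stands in for an identity column on the application side only; it does not model how the database itself reports an exhausted identity.

```csharp
// Hypothetical stand-in for an identity column typed as smallint; short is the
// C# counterpart of SQL Server's smallint (range -32,768 to 32,767).
public sealed class ShortIdCounter
{
    private short last;

    // checked(...) turns the silent wrap-around (32,767 + 1 becoming -32,768)
    // into an OverflowException: the point at which the type has to change.
    public short Next() => last = checked((short)(last + 1));
}
```

If callers only ever saw a wrapper type instead of short, widening the counter to int would be a change inside that one type rather than across the code base.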
Do identifiers have business logic?
Aside from range checking (see above), there are many cases where it makes sense to attach business logic to the identifier itself:
- Generation of a new ID,
- Checking whether an ID exists,
- Checking whether the ID belongs to a specific group (for example, IDs 1 to 10,000 are group A, and IDs above 10,000 are group B),
- etc.
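The rules in the list above could live on the identifier type roughly as follows. This is a sketch under assumptions, for C# 10 or later: ProductId, ProductGroup, New, FromInt32 and ExistsIn are hypothetical names, the static counter stands in for a database sequence, and the existence check takes a set rather than a real repository.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public enum ProductGroup { A, B }

// Hypothetical identifier type that carries the business rules listed above.
public readonly record struct ProductId
{
    // Assumed rule from the example: IDs 1 to 10,000 are group A, higher IDs group B.
    private const int GroupALastId = 10000;

    // In-memory stand-in for a database sequence or identity column.
    private static int lastIssued;

    public int Value { get; }

    private ProductId(int value) => Value = value;

    // Generation of a new ID.
    public static ProductId New() => new ProductId(Interlocked.Increment(ref lastIssued));

    // Rehydrating an existing ID; non-positive values are rejected.
    public static ProductId FromInt32(int value) =>
        value > 0 ? new ProductId(value) : throw new ArgumentOutOfRangeException(nameof(value));

    // Checking existence, against a set here instead of a real repository.
    public bool ExistsIn(ISet<ProductId> knownIds) => knownIds.Contains(this);

    // Group membership is decided by the identifier, not by each caller.
    public ProductGroup Group => Value <= GroupALastId ? ProductGroup.A : ProductGroup.B;
}
```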
Note that by removing the leaky abstraction of an ORM through a custom type, you are also creating a new leaky abstraction. Imagine you want to order something by ID. If ProductId corresponds to an auto-incremented field, that makes sense. If ProductId corresponds to a GUID, ordering makes no sense, and you should look for a different criterion, such as Product.CreatedTimeUtc.
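One way to make that limitation visible in the types, sketched under assumptions for C# 10 or later (ProductGuidId and ProductQueries are hypothetical names; Product.CreatedTimeUtc follows the example above): a Guid-backed identifier that does not implement IComparable<T>, with chronological ordering going through the timestamp instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Guid-backed identifier. It does not implement IComparable<T>,
// so it carries no notion of one ID being "before" another.
public readonly record struct ProductGuidId(Guid Value)
{
    public static ProductGuidId New() => new ProductGuidId(Guid.NewGuid());
}

// Hypothetical entity; CreatedTimeUtc is the meaningful ordering criterion.
public sealed record Product(ProductGuidId Id, string Name, DateTime CreatedTimeUtc);

public static class ProductQueries
{
    // Chronological order comes from the timestamp, never from the identifier.
    public static IEnumerable<Product> OldestFirst(IEnumerable<Product> products) =>
        products.OrderBy(product => product.CreatedTimeUtc);
}
```

This is not compile-time enforcement: OrderBy(p => p.Id) still compiles, and would typically only fail at run time once the default comparer has to compare two IDs.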
So lately I've been thinking that, semantically, the ID is just a unique identifier; the fact that it happens to be an integer or a GUID is an implementation detail, and exposing that fact in the type is a leaky abstraction.
Thinking about it, exposing it as a string or as a dedicated Identifier type has some advantages: it makes identifiers easier to mock or repeat, and easier to switch to ordered identifiers later. Of course, the database column should use whatever type performs best, but that is another implementation detail.
My intuition is that a dedicated Identifier type is the way to go, and if your ORM doesn't support that, a string seems the second-best option.
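A minimal sketch of such a dedicated Identifier type, assuming C# 10 or later. The factory names are hypothetical, and mapping the type to a database column (for example through an ORM value converter) is not shown. The textual form hides whether a value came from an integer or a GUID, and Parse lets tests and mocks use any non-empty string.

```csharp
using System;
using System.Globalization;

// Hypothetical opaque identifier: callers only see "an identifier", while the
// textual form hides whether it came from an integer, a GUID, or anything else.
public readonly record struct Identifier
{
    private readonly string value;

    private Identifier(string value) => this.value = value;

    public static Identifier FromInt64(long raw) =>
        new Identifier(raw.ToString(CultureInfo.InvariantCulture));

    // "D" gives the usual 32 hex digits separated by hyphens.
    public static Identifier FromGuid(Guid raw) => new Identifier(raw.ToString("D"));

    // Convenient for tests and mocks: any non-blank string can act as an ID.
    public static Identifier Parse(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            throw new ArgumentException("An identifier cannot be empty.", nameof(raw));
        return new Identifier(raw);
    }

    // default(Identifier) holds null, so fall back to an empty string.
    public override string ToString() => value ?? string.Empty;
}
```

One trade-off of the string form: any ordering becomes lexical, so "10" sorts before "9".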