I asked a question about whether validation behavior should be treated the same as other types of behavior with regard to the idea of OOP being “data + behavior”. I got some good answers that addressed the question from a philosophical point of view and confirmed that the answer seems to be yes.
Now I’d like to address this question from a different angle: Shouldn’t objects represent their real world counterparts?
Suppose I give someone a paper form to fill out. They can pencil in the fields in any order they like, and they can even fill in all sorts of incorrect values. When they hand the paper in, I, acting as a broker, may hand it back and tell them I can’t file it until they fix any invalid entries.
There’s also a comparison to be made between the format of the questions on the paper and a user interface that validates input. Maybe you put checkboxes on the paper and instruct the person to select only one. Of course, software can get much fancier with this type of validation than a physical sheet of paper can, but let’s be clear that this type of validation lives entirely in the user interface; validation in the ‘domain’ simply doesn’t exist in the physical realm for this type of object.
Is the paper going to crumple itself up or catch on fire if you write something invalid on it?
So my question is this: if a physical piece of paper has no inherent ability to validate itself, then why would this behavior belong in the domain?
The software doesn’t care about representing anything. Everything is ultimately memory being modified and moved around.
We represent a piece of paper in software as an object in the OO sense because it helps us humans understand both what the software will actually do and what it is supposed to do, and to make sure those two stay in harmony.
So all these development methodologies and programming language designs are oriented around making software understandable and manageable for humans.
Things should only map to the real world to the extent that the model is convenient. For a lot of models, having the model itself do its own validation is most convenient. For others, it’s not. But in no sensible development is that decision made on the basis of what a physical piece of paper is capable of.
There is some validation that can only be performed with reference to the domain, though whether this is an argument that the validation belongs in the domain is moot.
The classification of errors (and hence of validation) that I am familiar with breaks down into:
- Input errors, which can be detected by examining the input alone: a date of 2014-02-30 is self-evidently wrong.
- Context errors, which can be detected only by examining the domain object: a date of birth captured as 1990-01-31 combined with an education start date of 1989-03-02 is inconsistent. It is this class of errors that can only be detected with reference to the domain.
- Lies to the system. These can only be detected through information contained in subsequent events. So if I tell you that my date of birth is 1890-04-25, then (at worst) you may only discover that this is incorrect around the year 2010, when I tell you I am still alive and kicking. These errors can only be corrected by remedial actions that fall outside the system concerned, and those remedial actions may be very extensive, or actually impossible to implement.
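To make the first two categories concrete, here is a minimal Python sketch. The function names are hypothetical, and dates are assumed to arrive as ISO `YYYY-MM-DD` strings. Lies to the system have no counterpart in the code, because by definition nothing in the submitted data reveals them.

```python
from datetime import date


def input_errors(fields: dict[str, str]) -> list[str]:
    """Input check: each value can be judged on its own (hypothetical helper)."""
    errors = []
    for name, text in fields.items():
        try:
            # date.fromisoformat raises ValueError for impossible dates like 2014-02-30
            date.fromisoformat(text)
        except ValueError:
            errors.append(f"{name}: {text!r} is not a valid date")
    return errors


def context_errors(date_of_birth: date, education_start: date) -> list[str]:
    """Context check: both dates are valid alone, but not in combination."""
    if education_start < date_of_birth:
        return ["education cannot start before the date of birth"]
    return []
```

Note that `context_errors` only ever receives values that have already passed the input check, which mirrors the ordering of the categories above.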
Is the paper going to crumple itself up or catch on fire if you write something invalid on it?
No, but some time in the future, paper may be replaced with some form of electronic paper or pads, and they will complain if you make an error.
The point is, there are things that are possible in the virtual world that are not possible in the real world of atoms. Just because it doesn’t exist in the real world doesn’t mean you can’t or shouldn’t model it fictionally in the computer, if that technique suits you, your program, or your customer’s requirements.
In the case of input objects, wouldn’t it be nice if those objects were capable of telling you “hey, I am invalid” without you having to know anything about them?
Of course, if you are building something like a voting machine or a school test grader that takes in forms with pencil marks or punched holes, the piece of paper naturally isn’t going to tell you something bad happened; the machine will. But before it can, you must first model that paper input in the machine’s software, and it is that model that will tell you there is a problem, not the paper.
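A minimal sketch of input objects that can say “I am invalid” on their own. It assumes a hypothetical `validation_errors()` method as the shared contract, expressed with `typing.Protocol`, so the caller can collect problems without knowing any object’s rules. `CheckboxQuestion` and `SignatureLine` are illustrative names only.

```python
from typing import Protocol


class Validatable(Protocol):
    """Hypothetical contract: any input object can report its own problems."""

    def validation_errors(self) -> list[str]: ...


class CheckboxQuestion:
    """Model of a 'select only one' question on a paper form."""

    def __init__(self, label: str, selected: list[str]):
        self.label = label
        self.selected = selected

    def validation_errors(self) -> list[str]:
        if len(self.selected) > 1:
            return [f"{self.label}: select only one option"]
        return []


class SignatureLine:
    def __init__(self, signature: str):
        self.signature = signature

    def validation_errors(self) -> list[str]:
        return [] if self.signature.strip() else ["signature is required"]


def collect_errors(items: list[Validatable]) -> list[str]:
    # The caller knows nothing about each item's rules; it only asks.
    errors: list[str] = []
    for item in items:
        errors.extend(item.validation_errors())
    return errors
```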
If your object is “Paper”, then to make it fully reusable you are right: it should be able to take any content without bursting into flame (perhaps with some basic checks that always apply to paper, like “the background color can’t be black”; I mean things like null checks that keep the paper object itself valid).
However, once it is refined into a “Form” (let’s call it a Form1040 object that extends or uses your Paper object), it may not have any more intrinsic checks, but it will be validated by a teller standing at a window, and if you have just scribbled randomly they may light it on fire (although it’s more likely they will just hand it back to you). This is probably best represented as a set of validations external to the “Form1040” object that are attached to or associated with it (think of a set of numbered rules written on the back, or in a booklet named “Form1040 validations”).
The nice thing about the set of validations is that you can use them to check your form before you even submit it to the teller; the teller can then use the same validations before accepting it and passing it to a data entry specialist, and the data entry process can apply one last check before dropping it into the database.
Trying to build the validations directly into the object is going to seriously limit reuse and maintainability. Integrating a validation mechanism specified by meta-data that works across multiple objects is a really good approach.
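One way to sketch the “booklet” approach, assuming the rules are plain data (a field name, a predicate and a message) kept outside the form itself. `Rule`, `FORM_1040_RULES` and `validate` are hypothetical names, and the two rules are illustrative, not real Form 1040 requirements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    field_name: str
    check: Callable[[object], bool]
    message: str


# Hypothetical "Form1040 validations" booklet: data that lives outside the form.
FORM_1040_RULES = [
    Rule("name", lambda v: isinstance(v, str) and v.strip() != "", "name is required"),
    Rule(
        "income",
        lambda v: isinstance(v, (int, float)) and v >= 0,
        "income must be a non-negative number",
    ),
]


def validate(form: dict, rules: list[Rule]) -> list[str]:
    """Apply every rule; a missing field is passed to the check as None."""
    return [r.message for r in rules if not r.check(form.get(r.field_name))]
```

Because the rules are data, the filer, the teller and the data entry step can all run the same `validate(form, FORM_1040_RULES)` call, and a different form type only needs a different rule list.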
Now I’d like to address this question from a different angle: Shouldn’t objects represent their real world counterparts?
Yes. In most businesses there already exists a working system that should be translated into the software. To put it another way, decades of business practice have probably already found a good way to manage most business processes. And, often more importantly, the people in the business already understand these ‘real world’ processes, so when it comes to checking whether your software does what it is supposed to, you can explain it in terms of the real-world process the staff are already familiar with. Doing it differently in the software from how it is done in the real world will lead to confusion and inefficiency.
So my question is this: if a physical piece of paper has no inherent ability to validate itself, then why would this behavior belong in the domain?
Look at what happens in the real world. The validation is handled by a behaviour of the broker, who looks at the piece of paper and accepts or rejects it based on some business rules that she is familiar with.
You have a “person who validates the form” object in the real world, in the shape of the role that person is playing at that moment. In some businesses that might be the broker herself; in larger organisations you would have someone whose only job is to do that.
Either way it is a distinct unit of behaviour, and as such it should be its own object that encapsulates this behaviour, independent of the form object (the paper).
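A rough sketch of the reviewer as its own object, with hypothetical names: `PaperForm` accepts anything written on it, while `FormReviewer` encapsulates the role of checking it. The required-field rule is only an illustrative assumption about what the broker checks.

```python
from dataclasses import dataclass, field


@dataclass
class PaperForm:
    """The paper: it holds whatever was written on it and judges nothing."""
    entries: dict[str, str] = field(default_factory=dict)

    def write(self, name: str, value: str) -> None:
        self.entries[name] = value


@dataclass
class Decision:
    accepted: bool
    reasons: list[str]


class FormReviewer:
    """The role the broker (or a dedicated clerk) plays when checking a form."""

    def __init__(self, required_fields: list[str]):
        self.required_fields = required_fields

    def review(self, form: PaperForm) -> Decision:
        reasons = [
            f"{name} is missing"
            for name in self.required_fields
            if not form.entries.get(name, "").strip()
        ]
        return Decision(accepted=not reasons, reasons=reasons)
```

The form never changes when it is rejected; the decision is a separate value produced by the reviewer, much like the broker handing the paper back.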
The domain also has a concept of Domain Services. That’s still in the domain but could represent the real-world process that you describe.
A mistake we can make is forgetting that the domain is more than just the Entity objects: some Services do belong in the domain. One way to get this wrong, though, is to think it’s wrong to pass a service object into other domain objects, including entities.
A straightforward example of doing this correctly is in Jimmy Bogard’s talk on Vimeo called “Crafting Wicked Domain Models”. He passes an IOfferCalculator into an Offer Entity object. That’s the right thing to do, but many developers freak out when they see an object that didn’t come from the ORM passed into one that did, even when the reference is temporary.
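The following Python sketch is loosely in the spirit of that example; it is not the code from the talk. `OfferCalculator`, `FlatRateCalculator` and the pricing rule are hypothetical. The entity receives the domain service as a method argument and keeps no reference to it after the call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class OfferCalculator(ABC):
    """Domain service: knows how to value an offer; not loaded by the ORM."""

    @abstractmethod
    def calculate_value(self, member_id: int, offer_type: str) -> int: ...


@dataclass
class Offer:
    """Entity that would normally be loaded from persistence."""
    member_id: int
    offer_type: str
    value: int = 0

    def recalculate_value(self, calculator: OfferCalculator) -> None:
        # The service is only borrowed for this call; it is not stored on the entity.
        self.value = calculator.calculate_value(self.member_id, self.offer_type)


class FlatRateCalculator(OfferCalculator):
    """Hypothetical implementation used only for this example."""

    def calculate_value(self, member_id: int, offer_type: str) -> int:
        return 100 if offer_type == "gold" else 10
```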
As far as validation rules go, it really depends on the behavioral consequences of the validation and/or whether or not the same validation rules need to be applied to multiple types.
Personally I see simple validation as an infrastructure service, and try to use a library that does most of the work for things like required fields or minimum string lengths and so on.
For validation that has to do with the state of a single Entity across multiple properties where that state has specific meaning in my model, it absolutely belongs in the entity.
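A minimal sketch of such an entity-level rule, using a hypothetical `Project` whose start and due dates only make sense together. Simple per-field checks are assumed to be handled elsewhere, for example by a validation library.

```python
from datetime import date


class Project:
    """Entity whose start and due dates are only meaningful as a pair."""

    def __init__(self, name: str, start: date, due: date):
        self._ensure_valid_schedule(start, due)
        self.name = name
        self._start = start
        self._due = due

    @staticmethod
    def _ensure_valid_schedule(start: date, due: date) -> None:
        if due < start:
            raise ValueError("due date cannot be before start date")

    @property
    def start(self) -> date:
        return self._start

    @property
    def due(self) -> date:
        return self._due

    def reschedule(self, start: date, due: date) -> None:
        # Both dates change together, so the entity is never left half-updated.
        self._ensure_valid_schedule(start, due)
        self._start, self._due = start, due
```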
Whether it is with paper or with highly validated electronic data gathering, discrepancies can arise between how some people (management) think things are done and how they are really done in the field.
You may develop a different set of domain rules depending on how the requirements are gathered.
- Model based on the form. A supervisor hands you a blank copy of a paper form and instructs you to “make it exactly like this one.” The instructions indicate “select only one” so you code it accordingly.
- Model based on usage. The person in charge of reviewing and tallying the data gives you a stack of completed forms. You notice some dates entered as month/year because no one remembers the exact day, and no one cares. People select more than one item in the restricted area because everyone has decided that a pair of values is more accurate (how long would it take to get that feedback in a restricted system?). People write comments on the back and other notes in the margins.
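If the usage-based model wins, the domain type has to accept what people actually write. A hedged sketch, assuming dates arrive as hypothetical `YYYY-MM` or `YYYY-MM-DD` strings: `PartialDate` relaxes the required precision but still rejects impossible values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date whose day may be unknown, as people actually fill it in."""
    year: int
    month: int
    day: Optional[int] = None

    def __post_init__(self) -> None:
        # Only the precision is relaxed: date() still raises ValueError
        # for an impossible month or day.
        date(self.year, self.month, 1 if self.day is None else self.day)

    @classmethod
    def parse(cls, text: str) -> "PartialDate":
        parts = [int(p) for p in text.split("-")]
        if len(parts) == 2:
            return cls(parts[0], parts[1])
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])
        raise ValueError(f"expected YYYY-MM or YYYY-MM-DD, got {text!r}")
```

Tightening the rule later (say, the supervisor decides the exact day is required after all) becomes a change to one type rather than to every form that uses it.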
A developer could find themselves in either scenario and build what the client “wanted”, only to find it isn’t what the client needed. The supervisor may have noticed the inexact dates and want to force a more precise one (“That’s why we hired you.”), or may see the comments and decide to add a comments field even though it is not on the form. You’ll rarely be right the first time. Users will change the rules (and in business they’re not always logical).
You may find yourself creating a hybrid of two real-world models because it is possible in the computer. The real-world model tends to get so complex that no one can tell you what it is; you’re better off tackling small pieces of it at a time, being very rigid at first but understanding that you may have to loosen things up, and being prepared for a lot of change.
Business models (or any others involving people) and the perception/interpretation of the model and domain constantly evolve, so there probably is never a point where they really match.
Dealing with some “manage project from start to end”-type intranet applications, I believe it can be very valuable to capture separately what different remote systems or users told my system they thought was true, versus the mix of data my system has currently decided to believe and honor.
This leads to systems where you store and act upon things like “submissions” (pending, succeeded, failed, archived) or “events” (AddressChangeRequested), rather than CRUD-ish take-it-or-leave-it modification.
While it adds a layer of abstraction, it also makes it easier to handle weird stuff the business insists on keeping, like allowing certain contradictions, or applying certain changes “retroactively” from a later point in the task lifecycle.
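A minimal sketch of that split, with hypothetical names: every `AddressChangeRequested` claim is recorded as a `Submission` with a status, while `CustomerRecord.address` holds only what the system has decided to believe. The single acceptance rule (a non-empty address) is an illustrative assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class AddressChangeRequested:
    """What a user or remote system claims; kept even if it is rejected."""
    customer_id: int
    new_address: str


@dataclass
class Submission:
    event: AddressChangeRequested
    status: Status = Status.PENDING
    reason: str = ""


@dataclass
class CustomerRecord:
    """What the system has currently decided to believe and honor."""
    customer_id: int
    address: str
    submissions: list[Submission] = field(default_factory=list)

    def receive(self, event: AddressChangeRequested) -> Submission:
        submission = Submission(event)
        self.submissions.append(submission)
        return submission

    def process(self, submission: Submission) -> None:
        if not submission.event.new_address.strip():
            submission.status = Status.FAILED
            submission.reason = "empty address"
        else:
            self.address = submission.event.new_address
            submission.status = Status.SUCCEEDED
```

Because rejected claims are kept rather than discarded, odd business requests such as re-applying an older submission later in the lifecycle remain possible.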