Background
Tony Hoare’s “billion dollar mistake” was the invention of the null reference. Subsequently, a lot of code has become riddled with null pointer exceptions (or segfaults) when software developers try to use (dereference) uninitialized variables.
In 1989, Wirfs-Brock and Wilkerson wrote:
Direct references to variables severely limit the ability of programmers to refine existing classes. The programming conventions described here structure the use of variables to promote reusable designs. We encourage users of all object-oriented languages to follow these conventions. Additionally, we strongly urge designers of object-oriented languages to consider the effects of unrestricted variable references on reusability.
Problem
A lot of software, especially in Java but likely also in C# and C++, uses the following pattern:
    public class SomeClass {
        private String someAttribute;

        public SomeClass() {
            this.someAttribute = "Some Value";
        }

        public void someMethod() {
            if( this.someAttribute.equals( "Some Value" ) ) {
                // do something...
            }
        }

        public void setAttribute( String s ) {
            this.someAttribute = s;
        }

        public String getAttribute() {
            return this.someAttribute;
        }
    }
Sometimes a band-aid solution is used by checking for null throughout the code base:
    public void someMethod() {
        assert this.someAttribute != null;
        if( this.someAttribute.equals( "Some Value" ) ) {
            // do something...
        }
    }

    public void anotherMethod() {
        assert this.someAttribute != null;
        if( this.someAttribute.equals( "Some Default Value" ) ) {
            // do something...
        }
    }
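The band-aid does not always avoid the null pointer problem: a race condition exists, because another thread can set the attribute to null between the check and the dereference. The race condition is mitigated by copying the attribute to a local variable first: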
    public void anotherMethod() {
        String someAttribute = this.someAttribute;
        assert someAttribute != null;
        if( someAttribute.equals( "Some Default Value" ) ) {
            // do something...
        }
    }
Yet that requires two statements (assignment to a local copy and a check for null) every time a class-scoped variable is used to ensure it is valid.
Self-Encapsulation
Ken Auer’s “Reusability Through Self-Encapsulation” (Pattern Languages of Program Design, Addison-Wesley, New York, pp. 505-516, 1994) advocated self-encapsulation combined with lazy initialization. The result, in Java, would resemble:
    public class SomeClass {
        private String someAttribute;

        public SomeClass() {
            setAttribute( "Some Value" );
        }

        public void someMethod() {
            if( getAttribute().equals( "Some Value" ) ) {
                // do something...
            }
        }

        public void setAttribute( String s ) {
            this.someAttribute = s;
        }

        public String getAttribute() {
            String someAttribute = this.someAttribute;
            if( someAttribute == null ) {
                someAttribute = createDefaultValue();
                setAttribute( someAttribute );
            }
            return someAttribute;
        }

        protected String createDefaultValue() { return "Some Default Value"; }
    }
All duplicate checks for null are superfluous: getAttribute() ensures the value is never null at a single location within the containing class.
Efficiency arguments should be fairly moot: modern compilers and virtual machines can inline the accessor calls when possible.
As long as variables are never referenced directly, this also allows for proper application of the Open-Closed Principle.
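As a rough sketch of what the Open-Closed point might look like in practice: because every read goes through getAttribute(), a subclass can change the default without the base class being modified. The names SelfEncapsulated, Customized and OpenClosedDemo are hypothetical, introduced only for this illustration.

```java
// Hypothetical base class following the self-encapsulation pattern above.
class SelfEncapsulated {
    private String someAttribute;

    public void setAttribute(String s) {
        this.someAttribute = s;
    }

    public String getAttribute() {
        // Work on a local copy, then lazily fill in the default if needed.
        String value = this.someAttribute;
        if (value == null) {
            value = createDefaultValue();
            setAttribute(value);
        }
        return value;
    }

    // Extension point: subclasses may override this without editing the base class.
    protected String createDefaultValue() {
        return "Some Default Value";
    }
}

// Hypothetical subclass that only changes the default.
class Customized extends SelfEncapsulated {
    @Override
    protected String createDefaultValue() {
        return "Customized Default";
    }
}

class OpenClosedDemo {
    public static void main(String[] args) {
        System.out.println(new SelfEncapsulated().getAttribute());
        System.out.println(new Customized().getAttribute());
    }
}
```

This only holds if no method in the base class reads the field directly; a single direct read would bypass the overridden default.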
Question
What are the disadvantages of self-encapsulation, if any?
(Ideally, I would like to see references to studies that contrast the robustness of similarly complex systems that use and don’t use self-encapsulation, as this strikes me as a fairly straightforward testable hypothesis.)
3
The disadvantages are the inefficiency of the extra indirection, as you pointed out, and the fact that the compiler doesn’t enforce it. All it takes is your worst programmer using one unencapsulated reference to destroy the benefits.
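To illustrate the enforcement point, here is a minimal sketch (the class Leaky and its methods are made up for this example) in which one method follows the convention and another reads the field directly. Both compile without complaint, but only the first is protected by the lazy default.

```java
// Hypothetical class: the compiler accepts both access styles.
class Leaky {
    private String someAttribute; // stays null until the getter runs

    public String getAttribute() {
        if (someAttribute == null) {
            someAttribute = "Some Default Value";
        }
        return someAttribute;
    }

    // Follows the convention: goes through the getter, so never sees null.
    public int safeLength() {
        return getAttribute().length();
    }

    // Breaks the convention: direct field access, which throws a
    // NullPointerException if the getter has not been called yet.
    public int unsafeLength() {
        return this.someAttribute.length();
    }
}

class LeakyDemo {
    public static void main(String[] args) {
        System.out.println(new Leaky().safeLength());
        try {
            new Leaky().unsafeLength();
        } catch (NullPointerException e) {
            System.out.println("direct field access bypassed the lazy default");
        }
    }
}
```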
Also, the right way to solve a null pointer problem isn’t to replace it with a non-null default value with essentially the same characteristics. The problem with null pointer dereferences isn’t that they cause a segfault. That’s just a symptom. The problem is that the programmer might not always handle an unexpected default/uninitialized value. That problem still must be handled separately with your self-encapsulation pattern.
The right way to solve a null pointer problem is to not create the object until a semantically valid non-null value can be put into the attribute, and to destroy the object before it is necessary to set any of its attributes to null. If there is never the possibility for a pointer to be null, there is never a need to check it.
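One way this could look in Java, as a sketch rather than a definitive recipe: make the field final and reject null in the constructor, so an instance cannot exist without a valid value. The class name ValidatedClass is hypothetical; Objects.requireNonNull is the standard java.util helper.

```java
import java.util.Objects;

// Hypothetical class: an instance cannot exist with a null attribute.
final class ValidatedClass {
    private final String someAttribute; // final: assigned once, never reset to null

    ValidatedClass(String someAttribute) {
        // Fails fast at construction time instead of at some later dereference.
        this.someAttribute = Objects.requireNonNull(someAttribute, "someAttribute must not be null");
    }

    boolean matches(String expected) {
        // No null check needed: the constructor guarantees a value.
        return someAttribute.equals(expected);
    }
}

class ValidatedDemo {
    public static void main(String[] args) {
        System.out.println(new ValidatedClass("Some Value").matches("Some Value"));
        try {
            new ValidatedClass(null);
        } catch (NullPointerException e) {
            System.out.println("rejected at construction: " + e.getMessage());
        }
    }
}
```

The failure still surfaces as a NullPointerException, but it happens where the bad value enters the object rather than wherever the attribute is first used.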
Usually when people think an attribute must be null, they are trying to do too much in one class. It often makes the code much cleaner to split it into two classes. You can also split functions to avoid null assignments. Here’s an example from another question where I refactored a function to avoid a problematic null assignment.
5
There are times when it would be helpful to be able to define a data type that would hold a reference to an immutable object, but would behave as an immutable object, rather than a reference, such that code written as:
    thing.foo(bar);
would compile as a call to a static method:
    classOfThing.do_foo(thing, bar);
where the static method could then handle the case of the first argument being null in whatever manner it saw fit. A lot of string-handling code in Java could have been cleaner if String were such a type; an uninitialized variable of type String could then behave as an empty string rather than a null reference. Conversions between such types and Object might have been a little tricky [each such type could perhaps define a singleton object to represent the default value for instances of its type, so converting a default-valued String to type Object or String would yield a reference to String.defaultInstance, and conversion of a null reference to String would yield an NPE] but such types could have made some things much cleaner.
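Java has no such type, but the static-method half of the idea can be approximated by hand. The sketch below assumes a made-up utility class, Text, whose static methods treat a null String as the empty string; nothing here is an existing library API.

```java
// Hypothetical utility: static methods that decide how to treat a null receiver.
final class Text {
    private Text() {} // not instantiable

    // Treat a null reference as the default (empty) value.
    static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    static int length(String s) {
        return orEmpty(s).length();
    }

    static String concat(String a, String b) {
        return orEmpty(a) + orEmpty(b);
    }
}

class TextDemo {
    public static void main(String[] args) {
        String uninitialized = null;
        System.out.println(Text.length(uninitialized));
        System.out.println(Text.concat(uninitialized, "abc"));
    }
}
```

The difference from the proposal is that callers must remember to write Text.length(s) instead of s.length(); the language does not do the rewriting for them.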