Please consider this class:
class ClassA {
    private Thing[] things; // stores data
    // stuff omitted
    public Thing[] getThings() {
        return things;
    }
}
This class exposes the array it uses to store data to any interested client code.
I did this in an app I’m working on. I had a ChordProgression class that stores a sequence of Chords (and does some other things). It had a Chord[] getChords() method that returned the array of chords. When the data structure had to change (from an array to an ArrayList), all client code broke.
This made me think that the following approach might be better:
class ClassA {
    private Thing[] things; // stores data
    // stuff omitted
    public Thing getThing(int index) { // returns a single element, so Thing, not Thing[]
        return things[index];
    }
    public int getDataSize() {
        return things.length;
    }
    public void setThing(int index, Thing thing) {
        things[index] = thing;
    }
}
Instead of exposing the data structure itself, all of the operations offered by the data structure are now offered directly by the class enclosing it, using public methods that delegate to the data structure.
When the data structure changes, only these methods have to change, and once they do, all client code still works.
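To make the payoff concrete, here is a minimal Java sketch of the ChordProgression case after the switch to an ArrayList. The Chord class, the methods getChord, getChordCount, setChord, and addChord, and the ChordPrinter client are all hypothetical, chosen to mirror the delegating methods above. ChordPrinter uses only those methods, so it would compile unchanged against an array-backed ChordProgression that exposed the same methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Chord type, just enough for the example.
final class Chord {
    private final String name;

    Chord(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Storage is now an ArrayList; the delegating methods hide that choice.
class ChordProgression {
    private final List<Chord> chords = new ArrayList<>();

    Chord getChord(int index) {
        return chords.get(index);
    }

    int getChordCount() {
        return chords.size();
    }

    void setChord(int index, Chord chord) {
        chords.set(index, chord);
    }

    // Growing a fixed-size array was awkward; with a list it is one call.
    void addChord(Chord chord) {
        chords.add(chord);
    }
}

// Hypothetical client: depends only on the delegating methods,
// never on the type of the internal collection.
final class ChordPrinter {
    static String join(ChordProgression progression) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < progression.getChordCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(progression.getChord(i).name());
        }
        return sb.toString();
    }
}
```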
Note that collections more complex than arrays might require the enclosing class to implement even more than three methods just to access the internal data structure.
Is this approach common? What do you think of it? What downsides does it have? Is it reasonable to have the enclosing class implement at least three public methods just to delegate to the inner data structure?
Code like:
public Thing[] getThings() {
    return things;
}
doesn’t make much sense, since your access method does nothing but return the internal data structure directly. You might as well declare Thing[] things to be public. The idea behind an access method is to create an interface that insulates clients from internal changes and prevents them from manipulating the actual data structure except in the discrete ways the interface allows. As you discovered when all your client code broke, your access method didn’t do that; it’s just wasted code. I think a lot of programmers write code like that because they learned somewhere that everything needs to be encapsulated with access methods, but encapsulation exists for the reasons I just explained. Writing an access method just to “follow form” when it serves no purpose is just noise.
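A small, hypothetical illustration of why such a getter is equivalent to a public field: LeakyHolder returns its internal array, so any caller can overwrite its elements. CopyingHolder returns a copy made with Arrays.copyOf, which protects the internal state, but it still ties every caller to the array type, so switching to a List would break them exactly as before. Strings stand in for Thing to keep the sketch self-contained.

```java
import java.util.Arrays;

// Hands out the internal array itself: callers can write straight into it.
class LeakyHolder {
    private final String[] things = {"a", "b", "c"};

    String[] getThings() {
        return things;
    }
}

// Hands out a fresh copy: writes to it never reach the internal state,
// but the return type is still an array, so clients remain coupled to it.
class CopyingHolder {
    private final String[] things = {"a", "b", "c"};

    String[] getThings() {
        return Arrays.copyOf(things, things.length);
    }
}
```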
I would definitely recommend your proposed solution. It accomplishes some of the most important goals of encapsulation: giving clients a robust, narrow interface that insulates them from the internal implementation details of your class, and that doesn’t let them touch the internal data structure except in the ways you decide are appropriate (the principle of least privilege). If you look at the big, popular OOP frameworks, such as the CLR, the STL, and the VCL, the pattern you’ve proposed is widespread, for exactly that reason.
Should you always do that? Not necessarily. For example, if you have helper or friend classes that are essentially components of your main worker class and are not “front facing”, it’s not necessary; it’s overkill that will add a lot of unnecessary code. In that case I wouldn’t use an access method at all, since, as explained, it’s senseless. Just declare the data structure so that it’s scoped only to the main class that uses it. Most languages support some way of doing that: friend in C++, declaring it in the same file as the main worker class, and so on.
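Java has no friend keyword; one way to get a similar effect, sketched here with hypothetical names (Scheduler, TaskQueue), is a private static nested class. Nothing outside Scheduler can reach TaskQueue, so giving it access methods would add code without protecting anything, and Scheduler, the front-facing class, reads the helper’s field directly.

```java
import java.util.ArrayList;
import java.util.List;

class Scheduler {
    // Internal helper: private, so only Scheduler can use it.
    // Java lets the enclosing class read the nested class's private field.
    private static final class TaskQueue {
        private final List<String> tasks = new ArrayList<>();
    }

    private final TaskQueue queue = new TaskQueue();

    // The public surface lives on Scheduler, not on the helper.
    void submit(String task) {
        queue.tasks.add(task);
    }

    int pendingCount() {
        return queue.tasks.size();
    }
}
```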
The only downside I can see in your proposal is that it’s more work to code (and now you’re going to have to re-code your consumer classes, but you’d have had to do that anyway). But that’s not really a downside: you need to do it right, and sometimes that takes more work.
One of the things that makes a good programmer good is knowing when the extra work is worth it and when it isn’t. In the long run, putting in the extra effort now will pay big dividends in the future, if not on this project, then on others. Learn to code the right way and use your head about it; don’t just robotically follow prescribed forms.
“Note that collections more complex than arrays might require the enclosing class to implement even more than three methods just to access the internal data structure.”
If you’re exposing an entire data structure through a containing class, IMO you need to think about why that class is encapsulated at all, if not simply to provide a safer interface (a “wrapper class”). You’re saying the containing class does not exist for that purpose, so perhaps something isn’t right about your design. Consider breaking up your classes into more discrete modules and layering them.
A class should have one clear and discrete purpose, and provide an interface that supports that functionality, no more. You may be trying to bundle together things that don’t belong together. When you do that, things will break every time you have to implement a change. The smaller and more discrete your classes are, the easier it is to change things around: think LEGO.
You asked: Should I always encapsulate an internal data structure entirely?
Brief Answer: Yes, most of the time but not always.
Long Answer: I think classes fall into the following categories:
- Classes that encapsulate simple data. Example: a 2D point. It’s easy to provide public functions that get/set the X and Y coordinates, so you can hide the internal data without much trouble. For such classes, exposing the internal data structure is uncalled for.
- Container classes that encapsulate collections. The STL has the classic container classes, and I consider std::string and std::wstring among those too. They provide a rich interface for working with the abstractions, but std::vector, std::string, and std::wstring also provide access to the raw data. I wouldn’t be hasty to call them poorly designed classes. I don’t know the justification for these classes exposing their raw data. However, in my work I have found it necessary to expose the raw data for performance reasons when dealing with millions of mesh nodes and the data on those nodes. The important thing about exposing the internal structure of a class is that you have to think long and hard before giving it the green light. If the interface is internal to a project, it will be expensive, but not impossible, to change it in the future. If the interface is external to the project (such as when you are developing a library that will be used by other application developers), it may be impossible to change the interface without losing your clients.
- Classes that are mostly functional in nature. Examples: std::istream, std::ostream, and the iterators of the STL containers. It’s outright foolish to expose the internal details of these classes.
- Hybrid classes. These are classes that encapsulate some data structure but also provide algorithmic functionality. Personally, I think these are the result of poorly thought-out design. However, if you find them, you have to decide on a case-by-case basis whether it makes sense to expose their internal data.
In conclusion: The only time I have found it necessary to expose the internal data structure of a class is when it became a performance bottleneck.
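A minimal Java sketch of that kind of deliberate exception, assuming a hypothetical NodeField class that holds one value per mesh node: ordinary callers use get and set, while rawValues() hands out the live backing array, roughly in the spirit of std::vector::data(). Whether bypassing the accessors actually speeds anything up in Java depends on the JIT and should be measured; the sketch only shows the shape of the trade-off.

```java
// Hypothetical per-node field on a mesh.
final class NodeField {
    private final double[] values;

    NodeField(int nodeCount) {
        values = new double[nodeCount];
    }

    double get(int node) {
        return values[node];
    }

    void set(int node, double value) {
        values[node] = value;
    }

    int size() {
        return values.length;
    }

    // Deliberate escape hatch: returns the live backing array, so writes
    // through it change this object. Once clients depend on it, the
    // storage layout can no longer change freely.
    double[] rawValues() {
        return values;
    }
}

final class FieldOps {
    // Bulk operation working directly on the raw array.
    static void scale(NodeField field, double factor) {
        double[] v = field.rawValues();
        for (int i = 0; i < v.length; i++) {
            v[i] *= factor;
        }
    }
}
```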
Instead of returning the raw data directly, try something like this:
class ClassA {
    private Thing[] things;
    ...
    public Thing[] asArray() { return things; }
    public List<Thing> asList() { ... }
    ...
}
So you are essentially providing a custom collection that presents whatever face to the world is desired. In your new implementation, then:
class ClassA {
    private List<Thing> things;
    ...
    public Thing[] asArray() { return things.toArray(new Thing[0]); } // List has no asArray()
    public List<Thing> asList() { return things; }
    ...
}
Now you have proper encapsulation: the implementation details are hidden, and you provide backwards compatibility (at a cost).
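The first version’s asList() body is elided above, and the second returns the internal list itself, so callers can still modify it. Here is a runnable variation of the List-backed version, assuming callers only need read access: asArray() copies with toArray(new Thing[0]) and asList() returns Collections.unmodifiableList(things). Thing and the add method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical element type standing in for the answer's Thing.
final class Thing {
    final String label;

    Thing(String label) {
        this.label = label;
    }
}

class ClassA {
    private final List<Thing> things = new ArrayList<>();

    void add(Thing thing) {
        things.add(thing);
    }

    // Old-style view for existing callers: a new array on every call,
    // so writes to it do not reach the list.
    Thing[] asArray() {
        return things.toArray(new Thing[0]);
    }

    // Read-only view of the live list; its mutators throw
    // UnsupportedOperationException.
    List<Thing> asList() {
        return Collections.unmodifiableList(things);
    }
}
```

One concrete cost of this compatibility layer is that every asArray() call allocates and fills a new array.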
You should use interfaces for those things. That wouldn’t help in your case, since Java arrays don’t implement those interfaces, but you should do it from now on:
class ClassA {
    private List<Thing> things; // stores data
    // stuff omitted

    public ClassA() {
        things = new ArrayList<Thing>();
    }

    public List<Thing> getThings() {
        return things;
    }
}
That way you can change ArrayList to LinkedList or anything else without breaking any code, since the Java collections (unlike arrays) that offer indexed or pseudo-random access, such as ArrayList and LinkedList, generally implement List.
You can also use Collection, which offers fewer methods than List but can support collections without random access, or Iterable, which can even support streams but doesn’t offer much in terms of access methods.
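A short sketch of the same idea, with a hypothetical Client class: ClassA is switched to a LinkedList, and the client methods accept the narrowest interface they need (Iterable for plain iteration, Collection when they also need stream()). Neither method cares which List implementation sits behind getThings(). Strings stand in for Thing, and the add method is an illustrative addition.

```java
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

// Backed by a LinkedList now; callers typed against List never see the change.
class ClassA {
    private final List<String> things = new LinkedList<>();

    void add(String thing) {
        things.add(thing);
    }

    List<String> getThings() {
        return things;
    }
}

final class Client {
    // Needs only iteration, so it accepts any Iterable.
    static int totalLength(Iterable<String> items) {
        int total = 0;
        for (String s : items) {
            total += s.length();
        }
        return total;
    }

    // Uses stream(), which Collection provides and Iterable does not.
    static long countLongerThan(Collection<String> items, int n) {
        return items.stream().filter(s -> s.length() > n).count();
    }
}
```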
It is pretty common to hide your internal data structure from the outside world. Sometimes it is overkill, especially in DTOs; I recommend it for the domain model. If the data must be exposed at all, return an immutable copy. Along with this, I suggest creating an interface with methods such as get, set, remove, etc.
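A minimal sketch of that suggestion, assuming a hypothetical ThingStore interface with get, set, remove, and size, plus a snapshot() method that returns an immutable copy. Collections.unmodifiableList(new ArrayList<>(...)) is used so it works on Java 8; on Java 10 and later, List.copyOf is a similar option (it rejects null elements).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical interface exposing only the operations clients need.
interface ThingStore {
    String get(int index);
    void set(int index, String thing);
    String remove(int index);
    int size();
    // Whole-collection access, as an immutable copy.
    List<String> snapshot();
}

class ListThingStore implements ThingStore {
    private final List<String> things = new ArrayList<>();

    ListThingStore(List<String> initial) {
        things.addAll(initial);
    }

    public String get(int index) { return things.get(index); }

    public void set(int index, String thing) { things.set(index, thing); }

    public String remove(int index) { return things.remove(index); }

    public int size() { return things.size(); }

    // Copy first, then wrap: later changes to the store do not show up
    // in the snapshot, and the caller cannot modify it.
    public List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(things));
    }
}
```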