This question is going to be a little long. Please bear with me.
Something that happened in a project of mine made me think about how to safely copy objects. I’ll present the situation I had and then ask a question.
There was a class SomeClass:
class SomeClass {
    private Thing[] things;

    public SomeClass(Thing[] things) {
        this.things = things;
    }

    // irrelevant stuff omitted

    public SomeClass copy() {
        return new SomeClass(things);
    }
}
There was another class, Processor, that takes SomeClass objects, copies them (via someClassInstance.copy()), manipulates the copy's state, and returns the copy. Here it is:
class Processor {
    public SomeClass processObject(SomeClass object) {
        SomeClass copy = object.copy();
        manipulateTheCopy(copy);
        return copy;
    }

    // irrelevant stuff omitted
}
I ran this, and it had bugs. I looked into them, and it turned out that the manipulations Processor does on copy affect not only the copy but also the original SomeClass object that was passed into processObject.
I found out that this was because the original and the copy shared state: the original passed its field things into the copy when creating it.
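To make the shared-state problem concrete, here is a minimal runnable sketch. It assumes a hypothetical Thing with a single mutable int field called value, and a hypothetical things() accessor on SomeClass; neither is from the original code. The copy() method is the original shallow one, so both objects end up holding the same array.

```java
// Hypothetical minimal Thing with one mutable field, for illustration only.
class Thing {
    int value;

    Thing(int value) { this.value = value; }
}

class SomeClass {
    private Thing[] things;

    SomeClass(Thing[] things) { this.things = things; }

    // Shallow copy: the new object receives the very same array reference.
    SomeClass copy() { return new SomeClass(things); }

    // Hypothetical accessor so the demo can inspect and mutate the state.
    Thing[] things() { return things; }
}

class AliasingDemo {
    public static void main(String[] args) {
        SomeClass original = new SomeClass(new Thing[] { new Thing(1) });
        SomeClass copy = original.copy();

        // "Manipulate the copy", the way Processor does.
        copy.things()[0].value = 42;

        // Both objects point at the same array, so the original changed too.
        System.out.println(original.things() == copy.things()); // true
        System.out.println(original.things()[0].value);         // 42
    }
}
```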
This made me realize that copying objects is harder than simply instantiating them with the same fields as the original.
For the two objects to be completely disconnected, without any shared state, each of the fields passed to the copy also has to be copied. And if those objects contain other objects, they have to be copied too. And so on.
So basically, in order to actually copy an object, each class in the system must have a copy() method that also invokes copy() on all of its fields, and so on.
So, for example, for copy() in SomeClass to work, it needs to look like this:
public SomeClass copy() {
    Thing[] copyThings = new Thing[things.length];
    for (int i = 0; i < things.length; i++) {
        copyThings[i] = things[i].copy();
    }
    return new SomeClass(copyThings);
}
And if Thing has object fields of its own, then its own copy() method must handle them appropriately:
class Thing {
    private Apple apple;
    private Pencil pencil;
    private int number;

    public Thing(Apple apple, Pencil pencil, int number) {
        this.apple = apple;
        this.pencil = pencil;
        this.number = number;
    }

    public Thing copy() {
        // 'number' is a primitive, so it is copied by value;
        // the object fields need their own copy() calls.
        return new Thing(apple.copy(), pencil.copy(), number);
    }
}
And so on.
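Putting the pieces together, a self-contained sketch of the whole chain might look like the following. Apple and Pencil are hypothetical leaf classes (their fields are invented for the demo), and every class uses copy() at every level. The demo mutates the copy and checks that the original is untouched.

```java
// Hypothetical leaf classes; each holds only primitives or immutable Strings.
class Apple {
    String variety;
    Apple(String variety) { this.variety = variety; }
    // String is immutable, so sharing the same String instance is safe.
    Apple copy() { return new Apple(variety); }
}

class Pencil {
    int length;
    Pencil(int length) { this.length = length; }
    Pencil copy() { return new Pencil(length); }
}

class Thing {
    Apple apple;
    Pencil pencil;
    int number;

    Thing(Apple apple, Pencil pencil, int number) {
        this.apple = apple;
        this.pencil = pencil;
        this.number = number;
    }

    // Recurse into every object field; primitives are copied by value.
    Thing copy() { return new Thing(apple.copy(), pencil.copy(), number); }
}

class SomeClass {
    private final Thing[] things;

    SomeClass(Thing[] things) { this.things = things; }

    // Deep copy: a new array, and a copy of every element in it.
    SomeClass copy() {
        Thing[] copyThings = new Thing[things.length];
        for (int i = 0; i < things.length; i++) {
            copyThings[i] = things[i].copy();
        }
        return new SomeClass(copyThings);
    }

    Thing thing(int i) { return things[i]; }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        SomeClass original = new SomeClass(new Thing[] {
            new Thing(new Apple("Fuji"), new Pencil(10), 1)
        });
        SomeClass copy = original.copy();

        copy.thing(0).pencil.length = 3;
        copy.thing(0).apple.variety = "Gala";

        System.out.println(original.thing(0).pencil.length); // 10
        System.out.println(original.thing(0).apple.variety); // Fuji
    }
}
```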
Of course, instead of every class having a copy() method, the copying could happen in the getters and constructors of classes (except where it isn't suitable, for example when a field points to an external object rather than to an object that is 'part of' this object).
Still, that means that in order to safely copy an object, most classes would need copying mechanisms in their getters.
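Here is a rough sketch of the getter/constructor alternative, usually called defensive copying. It assumes a hypothetical mutable Thing that has its own copy() method. Because the constructor already copies whatever it receives, the original one-line copy() stops sharing state.

```java
// Hypothetical mutable element with its own copy(), for illustration only.
class Thing {
    int value;
    Thing(int value) { this.value = value; }
    Thing copy() { return new Thing(value); }
}

class SomeClass {
    private final Thing[] things;

    // Defensive copy on the way in: the caller's array and elements are not retained.
    SomeClass(Thing[] things) {
        this.things = deepCopy(things);
    }

    // Defensive copy on the way out: callers cannot reach the internal state.
    Thing[] getThings() {
        return deepCopy(things);
    }

    // The constructor copies, so passing our own field is now safe.
    SomeClass copy() {
        return new SomeClass(things);
    }

    private static Thing[] deepCopy(Thing[] source) {
        Thing[] result = new Thing[source.length];
        for (int i = 0; i < source.length; i++) {
            result[i] = source[i].copy();
        }
        return result;
    }
}
```

The trade-off is cost: every getter call allocates a fresh copy, which is one reason the answers below suggest reserving this for value-like objects.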
My question has two parts:
- How frequently do you need to get a copy of an object? Is this a regular issue?
- Is the technique described common and/or reasonable? Or is there a better way to make safe copies of objects?
What you are describing is a well-known and very basic issue in computer languages that have objects. It really isn't very surprising. The ending was fairly obvious by about halfway through the story. It's the difference between a shallow copy and a deep copy.
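A small Java illustration of the difference, using a list of mutable StringBuilder objects. The class and method names are made up for the example. The shallow copy is a new list holding the same elements, while the deep copy also duplicates each element.

```java
import java.util.ArrayList;
import java.util.List;

class CopyDepthDemo {
    // Shallow copy: a new list, but the same StringBuilder objects inside it.
    static List<StringBuilder> shallowCopy(List<StringBuilder> source) {
        return new ArrayList<>(source);
    }

    // Deep copy: a new list holding new StringBuilder objects with the same contents.
    static List<StringBuilder> deepCopy(List<StringBuilder> source) {
        List<StringBuilder> result = new ArrayList<>(source.size());
        for (StringBuilder sb : source) {
            result.add(new StringBuilder(sb));
        }
        return result;
    }

    public static void main(String[] args) {
        List<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));

        List<StringBuilder> shallow = shallowCopy(original);
        List<StringBuilder> deep = deepCopy(original);

        shallow.get(0).append("-shallow"); // mutates the element shared with original
        deep.get(0).append("-deep");       // mutates only the deep copy's element

        System.out.println(original.get(0)); // a-shallow
        System.out.println(deep.get(0));     // a-deep
    }
}
```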
Objects tend to fall into two categories: those that represent values and those that have identity. Value objects are things like points and vectors, and they really need to be copied accurately. If they contain references to mutable objects, you need a deep copy, but be wary: you may be digging a hole.
Identity objects are things like widgets, business objects and other complex entities. They usually don’t need to be copied. To be safe, it’s best to use language features to enforce that. [In C++ we use a private copy constructor.]
In answer to your questions:
How frequently do you need to get a copy of an object? Is this a regular issue?
Value objects commonly; identity objects rarely. It's an issue you often have to think about, but not one you often have to do anything special about. If you are copying identity objects, that could be a smell.
Is the technique described common and/or reasonable? Or is there a better way to make safe copies of objects?
Depends on the language, but in most languages it looks like overkill. Make sure your value objects copy safely, make sure your identity objects cannot be copied and handle any others on a case-by-case basis.
Value objects rarely contain references to other objects, and if they do it will only be a reference to either an immutable object (like a string) or to another value object. For example, a Rectangle object could contain two Point objects. This is one case where it makes sense to handle a copy of one object (Rectangle) by a nested call to the copy method of another object (Point). It is very much the exception.
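A minimal sketch of the Rectangle/Point case in Java. The field names topLeft and bottomRight are assumptions for illustration; Point is treated as a mutable value object, so Rectangle.copy() delegates to Point.copy() for each corner.

```java
// Mutable value object: copied field by field.
class Point {
    int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    Point copy() { return new Point(x, y); }
}

// Value object made of other value objects: its copy() delegates to Point.copy().
class Rectangle {
    Point topLeft, bottomRight;

    Rectangle(Point topLeft, Point bottomRight) {
        this.topLeft = topLeft;
        this.bottomRight = bottomRight;
    }

    Rectangle copy() {
        return new Rectangle(topLeft.copy(), bottomRight.copy());
    }
}
```

If Point were made immutable instead, Rectangle.copy() could simply reuse the same Point instances.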
Note that C# is quite helpful in that it has real value types. In Java and other languages where all objects are references, life is not so simple.
You may find yourself in a situation where you seem to need to copy an object that has identity, although it should not be common. You need to consider carefully the issues of identity for the copy (same or different), mutability (setters etc), whether the copy may replace the original, etc. It’s all case by case unfortunately.
Note also that value objects will normally calculate their hash value from their contents, whereas identity objects will normally calculate their hash value from a unique feature such as their address. That means copies of identity objects may not play well with hash-based collections like dictionaries.
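In Java terms, a value object typically overrides equals and hashCode based on its fields, while an identity object keeps the identity-based versions inherited from Object. The sketch below uses two hypothetical classes, Money and Widget, to show the consequence: a copy of a value object is found in a HashSet, but a copy of an identity object is not.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Value object: equality and hash code derived from its contents.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    Money copy() { return new Money(currency, cents); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money other = (Money) o;
        return cents == other.cents && currency.equals(other.currency);
    }

    @Override
    public int hashCode() { return Objects.hash(currency, cents); }
}

// Identity object: inherits Object.equals/hashCode, so only the same instance matches.
class Widget {
    private final String label;

    Widget(String label) { this.label = label; }

    Widget copy() { return new Widget(label); }
}

class HashDemo {
    public static void main(String[] args) {
        Money price = new Money("EUR", 500);
        Set<Money> prices = new HashSet<>();
        prices.add(price);
        System.out.println(prices.contains(price.copy()));   // true

        Widget button = new Widget("OK");
        Set<Widget> widgets = new HashSet<>();
        widgets.add(button);
        System.out.println(widgets.contains(button.copy())); // false
    }
}
```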
This is an interesting question because it touches on some very important concepts which are often confused by novice and even intermediate programmers. Java tends to exacerbate this confusion somewhat due to the design of the language.
The big problem here is that you initially confused copying a reference with copying a value. What you thought you were going to do was copy the values, what you ended up doing was copying the references.
To explain the difference, imagine you have a TV set. You are watching a DVD on that TV. Someone else wants to watch the same program as you. When you copy by reference, you are giving them a TV and hooking that TV up to your DVD player. Whenever you decide to watch a different DVD, they get to see what you're watching. If they change the DVD, you see what they're watching.
When you are copying by value, you are giving them a TV, DVD player and a copy of the DVD. They can change the DVD to whatever they like and you don’t see it. Similarly, if you change the DVD they won’t see what you’re watching.
There are many times when you want to copy by value and there are many times where you want to copy by reference.
To work out what type of copying you want to do, you need to ask yourself the following questions:
- Why am I copying this object?
- For each of the properties of the object, do I want my copies to refer to the original property or to a copy of the original property?
- How will my program behave if I am copying by reference/value?
The answers to these questions will tell you how deeply you need to copy your objects.
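As a sketch of answering the second question property by property, consider a hypothetical Report that owns its list of lines but merely uses a shared Printer. Both classes are invented for illustration. Its copy() copies the owned list by value and the external Printer by reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical external collaborator: shared by many objects, never copied.
class Printer {
    final String name;

    Printer(String name) { this.name = name; }
}

class Report {
    private final List<String> lines; // owned state: each copy gets its own list
    private final Printer printer;    // external collaborator: copies share it

    Report(List<String> lines, Printer printer) {
        this.lines = lines;
        this.printer = printer;
    }

    Report copy() {
        // By value for the owned list (Strings are immutable, so a new list is enough),
        // by reference for the shared printer.
        return new Report(new ArrayList<>(lines), printer);
    }

    void addLine(String line) { lines.add(line); }

    List<String> lines() { return lines; }

    Printer printer() { return printer; }
}
```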
1) It depends on the language, but yes, I copy things quite a bit. That said, it depends immensely on the type of object. Things I tend to copy a lot: plain-old-data structures, and things like date/time or other objects that, in my particular program, are building blocks for everything else. If you think of object relationships as a tree, I typically copy only leaf nodes. (Think structs or blittable objects, in the old meaning of “blit”.) Note that in many cases these leaf nodes are also good candidates for immutability (with a nod to @SJuan76). Things I don't copy: controllers or services, and anything with references to other objects that don't fall into the leaf-node category. Your mileage may vary.
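The date/time example illustrates the immutability point well in Java. java.time.LocalDate is immutable, so it can be shared freely, while the older java.util.Date is mutable and has to be copied defensively. Booking is a hypothetical class used only for this sketch.

```java
import java.time.LocalDate;
import java.util.Date;

class Booking {
    private final LocalDate day;  // immutable: safe to share without copying
    private final Date createdAt; // mutable legacy type: must be copied defensively

    Booking(LocalDate day, Date createdAt) {
        this.day = day;                                 // no copy needed
        this.createdAt = new Date(createdAt.getTime()); // defensive copy
    }

    LocalDate day() { return day; }

    // Hand out a copy so callers cannot call setTime() on our internal Date.
    Date createdAt() { return new Date(createdAt.getTime()); }

    // Copying a Booking only has to duplicate the mutable field (done by the constructor).
    Booking copy() { return new Booking(day, createdAt); }
}
```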
2) So then we get into why you copy. Unfortunately, the purpose of a copied object can be quite different from program to program, or even between two parts of the same program. The point is that it is quite hard to find a one-size-fits-all solution, because you wouldn't be programming if one size fit all. There are at least two main approaches to this: shallow vs. deep cloning. @gnat brought up a great point about Java clone. In the same vein, I would do some digging around .NET's memberwise clone and find out its pros and cons. The only real rule I've found in programming with copies is that if you choose to copy something other than a leaf node, you run a greater risk of confusion about what the copy does, just like you found from your assignment.
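For reference, this is roughly what the Java clone route looks like. Object.clone(), like .NET's MemberwiseClone, makes a shallow field-by-field copy, so mutable fields such as arrays must be cloned explicitly afterwards. Scores is a hypothetical class used only for this sketch.

```java
class Scores implements Cloneable {
    private int[] values; // not final, because clone() reassigns it on the copy

    Scores(int[] values) { this.values = values.clone(); }

    @Override
    public Scores clone() {
        try {
            Scores copy = (Scores) super.clone(); // shallow: copy.values == this.values here
            copy.values = values.clone();         // int[] holds primitives, so this makes it independent
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);          // cannot happen: the class implements Cloneable
        }
    }

    void set(int i, int v) { values[i] = v; }

    int get(int i) { return values[i]; }
}
```

Without the second line in clone(), the copy and the original would share one array, which is exactly the bug from the question.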