In Java specifically, but likely in other languages as well: when would it be useful to have two references to the same object?
Example:
Dog a = new Dog();
Dog b = a;
Is there a situation where this would be useful? Why would this be a preferred solution to using `a` whenever you want to interact with the object represented by `a`?
An example is when you want to have the same object in two separate lists:
Dog myDog = new Dog();
List<Dog> dogsWithRabies = new ArrayList<>();
List<Dog> dogsThatCanPlayPiano = new ArrayList<>();
dogsWithRabies.add(myDog);
dogsThatCanPlayPiano.add(myDog);
// Now each List has a reference to the same dog
Another use is when you have the same object playing several roles:
Person p = new Person("Bruce Wayne");
Person batman = p;
Person ceoOfWayneIndustries = p;
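A minimal runnable sketch of the two-lists example, assuming a hypothetical `Dog` class with a mutable `name` field (the question never defines `Dog`). It shows that both lists hold the same object, so a change made through one list is visible through the other:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedDogDemo {
    // Hypothetical minimal Dog class; the mutable "name" field is an assumption.
    static class Dog {
        String name;
        Dog(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Dog myDog = new Dog("Rex");
        List<Dog> dogsWithRabies = new ArrayList<>();
        List<Dog> dogsThatCanPlayPiano = new ArrayList<>();
        dogsWithRabies.add(myDog);
        dogsThatCanPlayPiano.add(myDog);

        // Both lists hold a reference to the same object, not two copies.
        System.out.println(dogsWithRabies.get(0) == dogsThatCanPlayPiano.get(0)); // true

        // A change made through one list is visible through the other.
        dogsWithRabies.get(0).name = "Maestro";
        System.out.println(dogsThatCanPlayPiano.get(0).name); // Maestro
    }
}
```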
That’s actually a surprisingly profound question! Experience from modern C++ (and from languages influenced by it, such as Rust) suggests that very often, you don’t want that! For most data, you want a single, unique (“owning”) reference. This idea is also one of the main motivations for linear type systems.
However, even then you typically want some short-lived “borrowed” references that are used to access the memory briefly but don’t last for a significant fraction of the time the data exists. The most common case is when you pass an object as an argument to a different function (parameters are variables too!):
void encounter(Dog a) {
hissAt(a);
}
void hissAt(Dog b) {
// ...
}
A less common case is when you use one of two objects depending on a condition, doing essentially the same thing regardless of which one you choose:
Dog a = new Dog(), b = new Dog();
Dog goodBoy = whoseAGoodBoy ? a : b;
feed(goodBoy);
walk(goodBoy);
pet(goodBoy);
Going back to more common uses, but leaving local variables behind, we turn to fields: for example, widgets in GUI frameworks often have references to their parent widgets, so your big frame containing ten buttons would have at least ten references pointing at it (plus some more from its parent and perhaps from event listeners and so on). Any kind of object graph, and some kinds of object trees (those with parent or sibling references), have multiple objects referring to the same object. And virtually every data set is actually a graph 😉
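To make the GUI point concrete, here is a small sketch of a widget tree with parent references. The `Widget` class and its fields are hypothetical and do not correspond to any real GUI toolkit; the point is only that every button stores a reference to the one frame object:

```java
import java.util.ArrayList;
import java.util.List;

public class WidgetTreeDemo {
    // Hypothetical widget class: not the API of any real GUI toolkit.
    static class Widget {
        final String name;
        final Widget parent;                        // reference up to the parent
        final List<Widget> children = new ArrayList<>();

        Widget(String name, Widget parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);          // parent keeps references down
            }
        }
    }

    // Builds a frame containing the given number of buttons.
    static Widget buildFrame(int buttonCount) {
        Widget frame = new Widget("frame", null);
        for (int i = 0; i < buttonCount; i++) {
            new Widget("button" + i, frame);
        }
        return frame;
    }

    // Counts the children whose parent field refers to exactly this frame object.
    static long childrenPointingAt(Widget frame) {
        return frame.children.stream().filter(c -> c.parent == frame).count();
    }

    public static void main(String[] args) {
        Widget frame = buildFrame(10);
        System.out.println(childrenPointingAt(frame)); // 10
    }
}
```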
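Temporary variables: consider the following method (written as Java, assuming the elements implement `Comparable`; it needs `import java.util.Collection;`).
static <T extends Comparable<? super T>> T getMaximum(Collection<? extends T> objects) {
    T max = null;
    for (T candidate : objects) {
        if (max == null || candidate.compareTo(max) > 0) {
            max = candidate; // max and candidate now refer to the same object
        }
    }
    return max;
}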
The variables `max` and `candidate` may point to the same object, but each is reassigned according to different rules and at different times.
To supplement the other answers, you may also want to traverse a data structure in different ways, starting from the same place. For example, if you had `BinaryTree a = new BinaryTree(...); BinaryTree b = a;`, you could traverse down the tree’s leftmost path with `a` and its rightmost path with `b`, using something like:
while (a != null && b != null) {
a = a.left();
b = b.right();
}
It’s been a while since I’ve written Java, so that code may not be correct or sensible. Take it more as pseudocode.
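Below is a self-contained version of that traversal, using a hypothetical `Node` class in place of the unspecified `BinaryTree` (its `left`/`right` fields stand in for `left()`/`right()`). The loop uses `!= null` checks, since calling `a.equals(null)` when `a` is already `null` would throw a `NullPointerException`:

```java
public class TreePathsDemo {
    // Hypothetical minimal binary tree node; the answer's BinaryTree class
    // and its left()/right() methods are not specified.
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Walks the leftmost and rightmost paths in lockstep, starting from one
    // root that both variables initially refer to. Returns the number of
    // levels on which both paths still had a node.
    static int walkBothEdges(Node root) {
        Node a = root;
        Node b = a;             // two references to the same root node
        int levels = 0;
        while (a != null && b != null) {
            levels++;
            a = a.left;         // a moves down the leftmost path
            b = b.right;        // b moves down the rightmost path
        }
        return levels;
    }

    public static void main(String[] args) {
        //        1
        //       / \
        //      2   3
        //     /
        //    4
        Node root = new Node(1,
                new Node(2, new Node(4, null, null), null),
                new Node(3, null, null));
        System.out.println(walkBothEdges(root)); // 2
    }
}
```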
Multiple references are useful when several objects all need to be reachable through one common variable that can be used without knowing the context.
For instance, if you have a tabbed interface, you may have tab1, tab2 and tab3. You may also want a common variable that works regardless of which tab the user is on, which simplifies your code and saves you from working out over and over again which tab the user is on.
Tab tab1 = new Tab();
Tab tab2 = new Tab();
Tab tab3 = new Tab();
Tab currentTab = tab1; // refers to an existing tab; no need to create a fourth Tab
Then, in each numbered tab’s onClick handler, you could point currentTab at that tab:
currentTab = tab3;
Now the rest of your code can use currentTab freely, without needing to know which tab is actually active. And because currentTab and tab3 refer to the same object, any property you change through currentTab is changed on that tab too.
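A runnable sketch of the tab idea, assuming a hypothetical `Tab` class with a single mutable `title` property and a hypothetical `onTabClicked` method standing in for each tab’s click handler. Editing through `currentTab` changes whichever tab it currently refers to:

```java
public class CurrentTabDemo {
    // Hypothetical Tab class with a single mutable property.
    static class Tab {
        String title;
        Tab(String title) { this.title = title; }
    }

    static final Tab tab1 = new Tab("One");
    static final Tab tab2 = new Tab("Two");
    static final Tab tab3 = new Tab("Three");

    // Starts out referring to an existing tab rather than a fresh, unused Tab.
    static Tab currentTab = tab1;

    // What each tab's click handler would do: re-point the shared variable.
    static void onTabClicked(Tab clicked) {
        currentTab = clicked;
    }

    public static void main(String[] args) {
        onTabClicked(tab3);
        // Code that only knows about currentTab ...
        currentTab.title = "Three (edited)";
        // ... has changed the very object that tab3 refers to.
        System.out.println(tab3.title);          // Three (edited)
        System.out.println(currentTab == tab3);  // true
    }
}
```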
There are plenty of scenarios in which `b` must be a reference to an unknown “`a`” in order to be useful. In particular:
- Any time you don’t know what `b` points to at compile time.
- Any time you need to iterate over a collection, whether known at compile time or not.
- Any time you have limited scope.
For example:
Parameters
public void DoSomething(Thing t) {
}
`t` is a reference to an object passed in from an outside scope.
Return values and other conditional values
Thing a = Thing.Get("a");
Thing b = Thing.Get("b");
Thing biggerThing = Thing.max(a, b);
Thing z = a.IsMoreZThan(b) ? a : b;
`biggerThing` and `z` are each references to either `a` or `b`. We don’t know which at compile time.
Lambdas and their return values
Thing t = someCollection.FirstOrDefault(x => x.whatever > 123);
`x` is a parameter (example 1 above), and `t` is a return value (example 2 above).
Collections
indexByName.add(t.name, t);
process(indexByName["some name"]);
`indexByName["some name"]` is, to a large extent, a more sophisticated-looking `b`. It’s an alias to an object that was created and stuffed into the collection.
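These examples use C#-style syntax. A rough Java equivalent of the collection case, using `java.util.HashMap` together with a hypothetical `Thing` class and a hypothetical `process` method, shows that the map lookup returns the very object that was put in, not a copy:

```java
import java.util.HashMap;
import java.util.Map;

public class IndexByNameDemo {
    // Hypothetical Thing class; the answer's Thing type is unspecified.
    static class Thing {
        final String name;
        int counter;
        Thing(String name) { this.name = name; }
    }

    // Hypothetical stand-in for process(): it mutates whatever it is given.
    static void process(Thing t) {
        t.counter++;
    }

    public static void main(String[] args) {
        Map<String, Thing> indexByName = new HashMap<>();
        Thing t = new Thing("some name");
        indexByName.put(t.name, t);

        // The lookup returns an alias of t, so process() changes t itself.
        process(indexByName.get("some name"));
        System.out.println(t.counter);                         // 1
        System.out.println(indexByName.get("some name") == t); // true
    }
}
```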
Loops
foreach (Thing t in things) {
/* `t` is a reference to a thing in a collection */
}
`t` is a reference to an item returned (example 2) by an iterator (previous example).
It is a crucial point, and IMHO it is worth understanding.
Most OO languages, Java included, make copies of references and never copy an object ‘invisibly’. It would be much harder to write programs if they worked any other way: for example, functions and methods could never update an object passed to them. Java, and most OO languages, would be almost impossible to use without significant added complexity.
An object in a program is supposed to have some meaning. For example it represents something specific in the real physical world. It usually makes sense to have many references to the same thing. For example, my home address can be given to many people and organisations, and that address always refers to the same physical location. So the first point is, objects often represent something which is specific, real, or concrete; and so being able to have many references to the same thing is extremely useful. Otherwise it would be harder to write programs.
Every time you pass `a` as an argument to another function, e.g. calling `foo(a)` where `foo` is declared as `void foo(Dog aDoggy)`, or call a method on `a`, the underlying program code makes a copy of the reference, producing a second reference to the same object.
Further, if code with a copied reference is in a different thread, then both can be used concurrently to access the same object.
So in most useful programs, there will be multiple references to the same object, because that is the semantics of most OO programming languages.
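The copying of references can be observed directly in Java. In this sketch (the `Dog` class and the `rename` and `replace` methods are hypothetical), changing the object through the parameter is visible to the caller, while re-pointing the parameter at a new object is not, because only the reference was copied:

```java
public class PassReferenceDemo {
    // Hypothetical Dog class with a mutable owner field.
    static class Dog {
        String owner;
        Dog(String owner) { this.owner = owner; }
    }

    // aDoggy receives a copy of the caller's reference: both point at one Dog.
    static void rename(Dog aDoggy) {
        aDoggy.owner = "Mr Been";       // visible to the caller
    }

    // Re-pointing the parameter changes only the local copy of the reference.
    static void replace(Dog aDoggy) {
        aDoggy = new Dog("Mr Mcgoo");   // the caller's variable is unaffected
    }

    public static void main(String[] args) {
        Dog a = new Dog("nobody");
        rename(a);
        System.out.println(a.owner);    // Mr Been
        replace(a);
        System.out.println(a.owner);    // Mr Been
    }
}
```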
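Now, if we think about it, because passing a reference is the only mechanism available for objects in many OO languages (strictly, Java passes the reference itself by value; C++ supports passing objects both by value and by reference), we might expect it to be the ‘right’ default behaviour.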
IMHO, using references is the right default, for a couple of reasons:
- It guarantees that the value of an object used in two different places is the same. Imagine putting an object into two different data structures (arrays, lists etc.) and then changing it: if each data structure held its own copy, the copies would silently diverge, which could be a nightmare to debug. More importantly, it should be the same object in both data structures; if it isn’t, the program has a bug.
- You can happily refactor code into several functions, or merge the code from several functions into one, and the semantics do not change. If the language did not provide reference semantics, it would be even more complex to modify code.
There is also an efficiency argument; making copies of entire objects is less efficient than copying a reference. However, I think that misses the point. Multiple references to the same object make more sense, and are easier to use, because they match the semantics of the real physical world.
So, IMHO, it usually makes sense to have multiple references to the same object. In the unusual cases where that doesn’t make sense in the context of an algorithm, most languages provide the ability to make a ‘clone’ or deep copy. However that is not the default.
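When an independent copy really is needed, it has to be requested explicitly. This sketch uses a copy constructor on a hypothetical `Dog` class (a copy constructor is a common Java alternative to `Object.clone()`) and shows the shared reference and the copy diverging after a change:

```java
import java.util.ArrayList;
import java.util.List;

public class ShareVersusCopyDemo {
    // Hypothetical Dog class with a copy constructor.
    static class Dog {
        String owner;
        Dog(String owner) { this.owner = owner; }
        Dog(Dog other) { this.owner = other.owner; }   // explicit copy
    }

    public static void main(String[] args) {
        Dog rover = new Dog("Mr Been");
        List<Dog> kennel = new ArrayList<>();
        List<Dog> vetList = new ArrayList<>();

        kennel.add(rover);            // shared: the default reference semantics
        vetList.add(new Dog(rover));  // copied: an independent object

        rover.owner = "Mr Mcgoo";
        System.out.println(kennel.get(0).owner);  // Mr Mcgoo (same object)
        System.out.println(vetList.get(0).owner); // Mr Been (the copy diverged)
    }
}
```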
I think people who argue that this should not be the default are using a language which does not provide automatic garbage collection, for example old-fashioned C++. The issue there is that they need to find a way to collect ‘dead’ objects without reclaiming objects that may still be required; having multiple references to the same object makes that hard.
I think, if C++ had sufficiently low-cost garbage collection, so that all referenced objects are garbage collected, then much of the objection goes away. There will still be some cases where reference semantics is not what is needed. However, in my experience, the people who can identify those situations are also usually capable of choosing the appropriate semantics anyway.
I believe there is some evidence that a large amount of the code in a C++ program is there to manage memory by hand, i.e. to do the work a garbage collector would otherwise do. Writing and maintaining that sort of ‘infrastructural’ code adds cost, and garbage collection exists to make a language easier to use, and more robust. So, for example, the Go language was designed with a focus on remedying some of the weaknesses of C++, and it relies entirely on garbage collection.
This is of course irrelevant in the context of Java. It too was designed to be easy to use, and so has garbage collection. Hence having multiple references is the default semantics, and is relatively safe in the sense that objects aren’t reclaimed while there is a reference to them. Of course, objects might still be held onto unnecessarily by a data structure because the program doesn’t properly tidy up when it has really finished with them.
So, circling back around to your question (with a bit of generalisation), when would you want more than one reference to the same object? Pretty much in every situation I can think of. References are the default semantics of most languages’ parameter-passing mechanisms. I suggest that is because the default semantics of handling objects which exist in the real world pretty much has to be by reference (‘cos the actual objects are out there).
Any other semantics would be harder to handle.
Dog a = new Dog("rover"); // initialise with name
DogList dl = new DogList();
dl.add(a);
...
a.setOwner("Mr Been");
I suggest that the “rover” in `dl` should be the one affected by `setOwner`, or programs get hard to write, understand, debug or modify. I think most programmers would be puzzled or dismayed otherwise.
Later, the dog is sold:
Dog soldDog = dl.lookupOwner("rover", "Mr Been");
soldDog.setOwner("Mr Mcgoo");
This sort of processing is common and normal. So reference semantics are the default because it usually makes most sense.
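A runnable version of the rover example follows. `Dog`, `DogList` and `lookupOwner` are not defined in the answer, so the versions here are hypothetical minimal implementations. The key design choice is that `lookupOwner` returns the stored object itself rather than a copy:

```java
import java.util.ArrayList;
import java.util.List;

public class DogSaleDemo {
    // Hypothetical Dog class; the answer does not define it.
    static class Dog {
        final String name;
        String owner;
        Dog(String name) { this.name = name; }
        void setOwner(String owner) { this.owner = owner; }
    }

    // Hypothetical DogList; stores references, never copies.
    static class DogList {
        private final List<Dog> dogs = new ArrayList<>();

        void add(Dog d) { dogs.add(d); }

        // Returns the stored Dog itself (not a copy), or null if absent.
        Dog lookupOwner(String name, String owner) {
            for (Dog d : dogs) {
                if (d.name.equals(name) && owner.equals(d.owner)) {
                    return d;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Dog a = new Dog("rover");
        DogList dl = new DogList();
        dl.add(a);
        a.setOwner("Mr Been");            // also updates the rover held by dl

        Dog soldDog = dl.lookupOwner("rover", "Mr Been");
        soldDog.setOwner("Mr Mcgoo");
        System.out.println(a.owner);      // Mr Mcgoo
        System.out.println(soldDog == a); // true
    }
}
```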
Summary: It always makes sense to have multiple references to the same object.
Of course, one other scenario where you might end up with:
Dog a = new Dog();
Dog b = a;
is when you’re maintaining code and `b` used to be a different dog, or a different class, but is now served by `a`.
Generally, in the medium term, you should rework all the code to refer to `a` directly, but that may not happen straight away.
You’d want this any time your program might pull the same entity into memory in more than one place, possibly because different components are using it.
Identity Maps provide a definitive local store of entities so that we can avoid having two or more separate representations of the same one. When we represent the same object twice, our client risks a concurrency issue if one instance persists its state changes before the other instance does. The idea is to ensure that our client is always dealing with the definitive reference to our entity/object.
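A minimal identity-map sketch, assuming a hypothetical `Dog` entity and a `loader` function that stands in for a real database query. `find` loads each id at most once and afterwards always returns the same in-memory object:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class IdentityMapDemo {
    // Hypothetical entity; a real system would load it from storage.
    static class Dog {
        final long id;
        String name;
        Dog(long id, String name) { this.id = id; this.name = name; }
    }

    // Minimal identity map: at most one in-memory Dog per id.
    static class DogIdentityMap {
        private final Map<Long, Dog> loaded = new HashMap<>();
        private final Function<Long, Dog> loader; // stands in for a DB query
        int loads = 0;                            // how often the loader ran

        DogIdentityMap(Function<Long, Dog> loader) { this.loader = loader; }

        Dog find(long id) {
            return loaded.computeIfAbsent(id, key -> {
                loads++;
                return loader.apply(key);
            });
        }
    }

    public static void main(String[] args) {
        DogIdentityMap map = new DogIdentityMap(id -> new Dog(id, "Bark Kent"));
        Dog dog = map.find(1);
        Dog superdog = map.find(1);
        superdog.name = "Superdog";
        System.out.println(dog == superdog); // true
        System.out.println(dog.name);        // Superdog
        System.out.println(map.loads);       // 1
    }
}
```

A real implementation would also need to scope the map (per request, session or thread) and decide when to evict entries; this sketch ignores both.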
I used this when writing a Sudoku solver. When I know the number of a cell when processing rows, I want the containing column to know that cell’s number too when processing columns. So both columns and rows are arrays of Cell objects that overlap. Exactly like the accepted answer showed.
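The overlapping-arrays idea can be sketched as follows, assuming a hypothetical `Cell` class in which `0` means “unknown”. Each `Cell` is created once and stored in both a row array and a column array, so a value set while processing a row is immediately visible while processing the matching column:

```java
public class SudokuCellsDemo {
    // Hypothetical Cell class: 0 means "not yet known".
    static class Cell {
        int value;
    }

    static final int N = 9;
    final Cell[][] rows = new Cell[N][N];
    final Cell[][] cols = new Cell[N][N];

    SudokuCellsDemo() {
        for (int r = 0; r < N; r++) {
            for (int c = 0; c < N; c++) {
                Cell cell = new Cell();
                rows[r][c] = cell; // the same Cell object is stored ...
                cols[c][r] = cell; // ... in one row and one column
            }
        }
    }

    public static void main(String[] args) {
        SudokuCellsDemo grid = new SudokuCellsDemo();
        grid.rows[2][5].value = 7;                 // solved while scanning row 2
        System.out.println(grid.cols[5][2].value); // 7, seen while scanning column 5
    }
}
```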
In web applications, Object Relational Mappers can use lazy loading so that all references to the same database object (at least within the same thread) point to the same thing.
For instance, if you had two tables:
Dogs:
- id | owner_id | name
- 1 | 1 | Bark Kent
Owners:
- id | name
- 1 | me
- 2 | you
There are several approaches your ORM can take if the following calls are made:
dog = Dog.find(1) // fetch1
owner = Owner.find(1) // fetch2
superdog = owner.dogs.first() // fetch3
superdog.name = "Superdog"
superdog.save! // write1
owner = dog.owner // fetch4
owner.name = "Mark"
owner.save! // write2
dog.owner = Owner.find(2)
dog.save! // write3
In the naive strategy, all calls to the models and the related references retrieve separate objects. `Dog.find()`, `Owner.find()`, `owner.dogs` and `dog.owner` each result in a database hit the first time around, after which the result is kept in memory. And so:
- the database is queried at least 4 times
- dog.owner is not the same as superdog.owner (fetched separately)
- dog.name is not the same as superdog.name
- dog and superdog both attempt to write to the same row and will overwrite each other’s results: write3 will undo the name change in write1.
Without shared references, you have more fetches, use more memory, and introduce the possibility of overwriting earlier updates.
Suppose that your ORM knows that all references to row 1 of the dogs table should point to the same thing. Then:
- fetch4 can be eliminated as there is an object in memory corresponding to Owner.find(1). fetch3 will still result in at least an index scan, as there might be other dogs owned by owner, but won’t trigger a row retrieval.
- dog and superdog point to the same object.
- dog.name and superdog.name point to the same object.
- dog.owner and superdog.owner point to the same object.
- write3 does not overwrite the change in write1.
In other words, using references helps codify the principle of a single point of truth (at least within that thread) being the row in the database.