I’ve been pondering for a while why Java and C# (and I’m sure other languages) default to reference equality for ==.

In the programming I do (which certainly is only a small subset of programming problems), I almost always want logical equality when comparing objects instead of reference equality. I was trying to think of why both of these languages went this route instead of inverting it and having == be logical equality and using .ReferenceEquals() for reference equality.
Obviously using reference equality is very simple to implement and it gives very consistent behavior, but it doesn’t seem like it fits well with most of the programming practices I see today.
I don’t wish to seem ignorant of the issues with trying to implement a logical comparison, or of the fact that it has to be implemented in every class. I also realize that these languages were designed a long time ago, but the general question stands.

Is there some major benefit of defaulting to this that I am simply missing, or does it seem reasonable that the default behavior should be logical equality, and defaulting back to reference equality if a logical equality doesn’t exist for the class?
C# does it because Java did. Java did because Java does not support operator overloading. Since value equality must be redefined for each class, it could not be an operator, but instead had to be a method. IMO this was a poor decision. It is much easier to both write and read a == b than a.equals(b), and much more natural for programmers with C or C++ experience, but a == b is almost always wrong. Bugs from the use of == where .equals was required have wasted countless thousands of programmer hours.
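To make the cost concrete, here is a minimal Java sketch of how subtle the bug can be with autoboxing (the specific values are arbitrary; on a default JVM, boxed values in -128..127 come from a cache):

Integer a = 127, b = 127;
System.out.println(a == b);      // true, but only because small values are cached
Integer c = 128, d = 128;
System.out.println(c == d);      // false: two distinct boxed objects
System.out.println(c.equals(d)); // true: the comparison that was meant

The first comparison working by accident is exactly what makes this class of bug so expensive to find.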
The short answer: Consistency
To answer your question properly, though, I suggest we take a step backwards and look to the issue of what equality means in a programming language. There are at least THREE different possibilities, which are used in various languages:
- Reference equality: means that a = b is true if a and b refer to the same object. It would not be true if a and b referred to different objects, even if all the attributes of a and b were the same.
- Shallow equality: means that a = b is true if all the attributes of the objects to which a and b refer are identical. Shallow equality can easily be implemented by a bitwise comparison of the memory space that represents the two objects. Please note that reference equality implies shallow equality.
- Deep equality: means that a = b is true if each attribute in a and b is either identical or deeply equal. Please note that deep equality is implied by both reference equality and shallow equality. In this sense, deep equality is the weakest form of equality and reference equality is the strongest (the sketch after this list makes the difference concrete).
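Here is that difference as a minimal Java sketch, with hypothetical Point and Line classes:

class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Line {
    Point start, end;
    Line(Point start, Point end) { this.start = start; this.end = end; }

    // Shallow: the attributes themselves (here, two references) must be identical.
    boolean shallowEquals(Line other) {
        return start == other.start && end == other.end;
    }

    // Deep: recurse into the attributes and compare their contents.
    boolean deepEquals(Line other) {
        return start.x == other.start.x && start.y == other.start.y
            && end.x == other.end.x && end.y == other.end.y;
    }
}

Two Lines sharing the same two Point objects are shallow-equal (and therefore deep-equal); two Lines built from different but identical Points are only deep-equal.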
These three types of equality are often used because they are convenient to implement: all three equality checks can easily be generated by a compiler (in the case of deep equality, the compiler might need to use tag bits to prevent infinite loops if a structure to be compared has circular references). But there is another problem: none of these might be appropriate.
In non-trivial systems, equality of objects is often defined as something between deep and reference equality. To check whether we want to regard two objects as equal in a certain context, we might require some attributes to be compared by reference (where they sit in memory) and others by deep equality, while some attributes may be allowed to be something different altogether. What we would really like is a “fourth type of equality”, a really nice one, often called in the literature semantic equality. Things are equal if they are equal, in our domain. =)
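In Java, this is exactly what overriding equals (and hashCode) is for. A hypothetical sketch (Java 16+ for the instanceof pattern), in which one attribute is deliberately ignored:

import java.util.Objects;

class Order {
    String number;           // compared by value
    String customerId;       // compared by value
    StringBuilder auditLog;  // irrelevant to "same order": deliberately ignored

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Order other)) return false;
        return Objects.equals(number, other.number)
            && Objects.equals(customerId, other.customerId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, customerId);
    }
}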
So we can come back to your question:
Is there some major benefit of defaulting to this that I am simply missing, or does it seem reasonable that the default behavior should be logical equality, and defaulting back to reference equality if a logical equality doesn’t exist for the class?
What do we mean when we write ‘a == b’ in any language? Ideally, it should always be the same: semantic equality. But that’s not possible.
One of the main considerations is that, at least for simple types like numbers, we expect that two variables are equal after assignment of the same value. See below:
var a = 1;
var b = a;
if (a == b) { // expected: true
    ...
}
a = 3;
b = 3;
if (a == b) { // expected: true
    ...
}
In this case, we expect that ‘a equals b’ in both statements. Anything else would be insane. Most (if not all) of the languages follow this convention. Therefore, with simple types (aka values) we know how to achieve semantic equality. With objects, that can be something completely different. See below:
var a = new Something(1);
var b = a;
if (a == b) { // expected: true, a and b refer to the same object
    ...
}
b = new Something(1);
a.DoSomething();
b.DoSomething();
if (a == b) { // and here?
    ...
}
We expect that the first ‘if’ will always be true. But what do you expect of the second ‘if’? It really depends. Can ‘DoSomething’ change the (semantic) equality of a and b?

The problem with semantic equality is that it cannot be automatically generated by the compiler for objects, nor is it obvious from the assignments. A mechanism must be provided for the user to define semantic equality. In object-oriented languages, that mechanism is an inherited method: equals. Reading a piece of OO code, we don’t expect a method to have the exact same implementation in all classes. We are used to inheritance and overriding.

With operators, though, we expect the same behavior. When you see ‘a == b’ you should expect the same type of equality (from the four above) in all situations. So, aiming for consistency, the language designers used reference equality for all object types. It should not depend on whether a programmer has overridden a method or not.

PS: The language Dee is slightly different from Java and C#: the equals operator means shallow equality for simple types and semantic equality for user-defined classes (the responsibility for implementing the = operation lies with the user; no default is provided). Since, for simple types, shallow equality is always semantic equality, the language is consistent. The price it pays, though, is that the equals operator is by default undefined for user-defined types. You have to implement it. And, sometimes, that’s just boring.
Some other answerers may be overthinking this.
== compares two values. References are values. Objects are not.
When you write
int a = 5;
int b = 5;
System.out.println(a == b);
the variable a contains the bits that make up the number 5, and so does the variable b. They are equal.
When you write
String a = new String("foo");
String b = new String("foo");
System.out.println(a == b);
the variable a contains a reference to one object, and the variable b contains a reference to a different object. Those are not equal.
They could have designed the language so that a == b would follow each reference and compare the objects instead of the references. In that case we would find different things to complain about. We’d be asking questions like “why does == sometimes dereference variables and sometimes not?” and “why does x == null throw a NullPointerException?” … unless they special-cased null so it didn’t get dereferenced, and then we’d definitely be asking why it sometimes dereferenced variables and sometimes not. And some objects can’t even be meaningfully compared: what does it mean to ask whether two sockets are equal?

Comparing the values is simple, and it is what they chose when they designed the language. Even though it’s not consistent when you use objects to emulate values, all the other options are inconsistent too.
It’s just historical. You use == to compare values. int and double are values in C. What about char*? C tends to see the pointer as the value, not the C string, especially since char* could point to a single char rather than a string. Therefore, with char*, the pointer is the value to be compared with ==.

C++ is close enough to C that pointers to an instance of a class are the values compared with ==. p == q compares the two pointers; *p == *q compares the instances.

In a newer language (Swift) you don’t use pointers, except for interfacing with other languages. We know that a and b could reference the same object, but we see a and b as the values, so == compares a and b (provided the type conforms to Equatable). At the same time we are usually aware that only references are actually stored, so there is another operator, ===, which compares not the values but the references to the values.
I think it’s time for an update, because Microsoft did learn from their mistake and implemented value equality as a default. First with F#:
type Person = {
    name: string
}
let p1 = { name = "Tom" }
let p2 = { name = "Tom2".Substring(0, 3) }
printfn "%A" (p1 = p2) // true
And more recently with C#:
record Person(string Name);
var p1 = new Person("Tom");
var p2 = new Person("Tom2".Substring(0, 3));
Console.WriteLine(p1 == p2); // true
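And Java has since followed suit: a record (Java 16+) generates a component-based equals and hashCode automatically, although == itself still compares references. A minimal sketch:

record Person(String name) {}

class Demo {
    public static void main(String[] args) {
        var p1 = new Person("Tom");
        var p2 = new Person("Tom2".substring(0, 3));
        System.out.println(p1.equals(p2)); // true: generated, component-based
        System.out.println(p1 == p2);      // false: == is still reference equality
    }
}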
I was trying to think of why both of these languages went this route instead of inverting it and having == be logical equality and using .ReferenceEquals() for reference equality.
Because the latter approach would be confusing. Consider:
if (null.ReferenceEquals(null)) System.out.println("ok");
Should this code print "ok", or should it throw a NullPointerException?
What does “logical equality” mean? Shallow equality? Probably not, because this suffers from the same problems as reference equality. Therefore, it must be deep equality. As a language designer, you have to make sure your implementation is as generic as possible. This would probably work as long as you are dealing with object graphs that are acyclic. How would you implement deep equality for a circular linked list (or simply object A referencing object B and vice versa)? How about a complex mesh of objects, each referencing other objects in the mesh, forward and backward? This cannot be solved generically. You need to understand the semantics of the object graph.
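For illustration, the standard trick to at least make such a traversal terminate is to track the pairs of objects currently being compared. A minimal Java sketch, with a hypothetical Node class:

import java.util.HashSet;
import java.util.Set;

// Hypothetical node type: 'next' may point back into the structure.
class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class DeepEquality {
    // A pair keyed by the identity of both nodes (Node does not override
    // equals, so the record's generated equals falls back to reference
    // comparison, which is what we want here).
    private record Pair(Node a, Node b) {}

    static boolean deepEquals(Node a, Node b) {
        return deepEquals(a, b, new HashSet<>());
    }

    private static boolean deepEquals(Node a, Node b, Set<Pair> inProgress) {
        if (a == b) return true;             // same reference, or both null
        if (a == null || b == null) return false;
        if (!inProgress.add(new Pair(a, b)))
            return true;                     // cycle: this pair is already being compared
        return a.value == b.value && deepEquals(a.next, b.next, inProgress);
    }
}

This terminates on two circular lists, but note the judgment call buried in it: a revisited pair counts as equal, which may or may not match what the object graph actually means in your domain. That is exactly the semantic question raised above.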
For Java and C#, the benefit lies in their being object-oriented.

From a performance point of view, the code that is easier to write should also be the quicker one: since OOP intends for logically distinct elements to be represented by distinct objects, checking reference equality is a single comparison, whereas objects can become quite large.

From a logical point of view, the equality of one object to another does not have to be as obvious as comparing the two objects’ properties for equality (e.g. how is null == null logically interpreted? This can differ from case to case).

I think what it boils down to is your observation that “you always want logical equality over reference equality”. The consensus amongst the language designers was probably the opposite. I personally find it hard to evaluate this, since I lack the broad spectrum of programming experience. Roughly, I use reference equality more in optimisation algorithms, and logical equality more in handling data-sets.
.equals() compares objects by their contents, whereas == compares object references. When comparing objects, it is more accurate to use .equals().