This issue is most apparent when you have different implementations of an interface, and for the purposes of a particular collection you only care about the interface-level view of the objects. For example, suppose you had an interface like this:
public interface Person {
int getId();
}
The usual way to implement hashCode() and equals() in implementing classes is to put code like this in the equals method:
if (getClass() != other.getClass()) {
return false;
}
This causes problems when you mix implementations of Person in a HashMap. If the HashMap only cares about the interface-level view of Person, then it can end up with duplicates that differ only in their implementing classes.
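To make the failure concrete, here is a small sketch (CachedPerson and DbPerson are made-up implementation names) where two classes follow the usual getClass()-based equals() pattern, and a HashSet ends up holding two entries for the same interface-level person:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateDemo {
    interface Person {
        int getId();
    }

    // Hypothetical implementation #1, with the typical getClass()-based equals().
    static class CachedPerson implements Person {
        private final int id;
        CachedPerson(int id) { this.id = id; }
        public int getId() { return id; }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return id == ((CachedPerson) other).id;
        }
        @Override public int hashCode() { return id; }
    }

    // Hypothetical implementation #2, same pattern, different class.
    static class DbPerson implements Person {
        private final int id;
        DbPerson(int id) { this.id = id; }
        public int getId() { return id; }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return id == ((DbPerson) other).id;
        }
        @Override public int hashCode() { return id; }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new CachedPerson(1));
        people.add(new DbPerson(1));
        // Both objects represent person #1 at the interface level,
        // yet the set holds two entries because equals() compares classes.
        System.out.println(people.size()); // prints 2
    }
}
```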
You could make this case work by using the same liberal equals() method for all the implementations, but then you run the risk of equals() doing the wrong thing in a different context (such as comparing two Persons that are backed by database records with version numbers).
My intuition tells me that equality should be defined per collection instead of per class. When using collections that rely on ordering, you can use a custom Comparator to pick the right ordering in each context. There is no analogue for hash-based collections. Why is this?
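For ordered collections the per-collection comparison really does work – and, as a side effect, a TreeSet built with a Comparator even decides membership by compare() == 0 rather than equals(), which is the closest standard Java gets to per-collection equality. A sketch (the string examples are purely illustrative):

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class PerCollectionComparison {
    public static void main(String[] args) {
        // Each TreeSet picks its own ordering via a Comparator...
        Set<String> byLength = new TreeSet<>(Comparator.comparingInt(String::length));
        Set<String> caseInsensitive = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

        byLength.add("apple");
        byLength.add("melon");        // same length as "apple" -> treated as a duplicate
        caseInsensitive.add("Apple");
        caseInsensitive.add("APPLE"); // equal ignoring case -> treated as a duplicate

        // ...and membership follows the Comparator, not equals():
        System.out.println(byLength.size());        // prints 1
        System.out.println(caseInsensitive.size()); // prints 1

        // HashSet/HashMap have no constructor taking anything analogous –
        // they always fall back to the elements' own hashCode()/equals().
    }
}
```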
Just to clarify, this question is distinct from “Why is .compareTo() in an interface while .equals() is in a class in Java?” because it deals with the implementation of collections. compareTo() and equals()/hashCode() both suffer from the problem of universality when used with collections: you can’t pick different comparison functions for different collections. So for the purposes of this question, the inheritance hierarchy of an object doesn’t matter at all; all that matters is whether the comparison function is defined per object or per collection.
This design is sometimes known as “universal equality”: the belief that whether two things are equal or not is a universal property.
What’s more, equality is a property of two objects, but in OO you always call a method on one single object, and that object gets to solely decide how to handle that method call. So, in a design like Java’s, where equality is a property of one of the two objects being compared, it isn’t even possible to guarantee basic properties of equality such as symmetry (a == b ⇔ b == a), because in the first case the method is being called on a and in the second case it is being called on b, and by the basic principles of OO it is solely a’s decision (in the first case) or b’s decision (in the second case) whether or not it considers itself equal to the other one. The only way to gain symmetry is to have the two objects cooperate, but if they don’t… tough luck.
One solution would be to make equality not a property of one object, but either a property of two objects, or a property of a third object. That latter option also solves the problem of universal equality, because if you make equality a property of a third “context” object, then you can imagine having different EqualityComparer objects for different contexts.
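Such a “context” object can even be retrofitted onto Java’s existing hash-based collections by wrapping the keys. A hypothetical sketch (the Equator interface below is made up for illustration, though Apache Commons Collections 4 ships a similarly named one):

```java
import java.util.HashMap;
import java.util.Map;

public class ContextEquality {
    // Hypothetical "context" object: equality and hashing defined per collection.
    interface Equator<T> {
        boolean equate(T a, T b);
        int hash(T t);
    }

    // Adapter that makes a plain HashMap honor the Equator instead of
    // the value's own equals()/hashCode().
    static final class Key<T> {
        final T value;
        final Equator<T> equator;
        Key(T value, Equator<T> equator) { this.value = value; this.equator = equator; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            @SuppressWarnings("unchecked")
            Key<T> other = (Key<T>) o;
            return equator.equate(value, other.value);
        }
        @Override public int hashCode() { return equator.hash(value); }
    }

    public static void main(String[] args) {
        // Context: compare strings case-insensitively in *this* map only.
        Equator<String> ignoreCase = new Equator<String>() {
            public boolean equate(String a, String b) { return a.equalsIgnoreCase(b); }
            public int hash(String s) { return s.toLowerCase().hashCode(); }
        };

        Map<Key<String>, Integer> counts = new HashMap<>();
        counts.put(new Key<>("Apple", ignoreCase), 1);
        counts.put(new Key<>("APPLE", ignoreCase), 2); // same key under this context

        System.out.println(counts.size()); // prints 1
        System.out.println(counts.get(new Key<>("apple", ignoreCase))); // prints 2
    }
}
```

The wrapper costs an allocation per key, which is exactly why first-class support in the collection itself (as with Comparator) would be preferable.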
This is the design chosen for Haskell, for example, with the Eq typeclass. It is also the design chosen by some third-party Scala libraries (ScalaZ, for example), but not by the Scala core or standard library, which uses universal equality for compatibility with the underlying host platform.
It is, interestingly, also the design chosen for Java’s Comparable/Comparator interfaces. The designers of Java clearly were aware of the problem, but for some reason solved it only for ordering, not for equality (or hashing).
So, as to the question

why is there a Comparator interface but no Hasher and Equator?

the answer is “I don’t know”. Clearly, the designers of Java were aware of the problem, as evidenced by the existence of Comparator, but they obviously didn’t consider it a problem for equality and hashing. Other languages and libraries make different choices.
The real answer to

why is there a Comparator interface but no Hasher and Equator?

is, quote courtesy of Josh Bloch:
The original Java APIs were done very quickly under a tight deadline to meet a closing market window. The original Java team did an incredible job, but not all of the APIs are perfect.
The problem lies solely in Java’s history, as with other similar matters, e.g. .clone() vs Cloneable.
tl;dr
it’s mainly for historical reasons; the current behaviour/abstraction was introduced in JDK 1.0 and wasn’t fixed later on because it was virtually impossible to do so while maintaining backward code compatibility.
First, let’s sum up a couple of well-known Java facts:
- Java, from the very beginning to the present day, has been proudly backwards-compatible, requiring legacy APIs to still be supported in newer versions,
- as such, almost each and every language construct introduced with JDK 1.0 lives on to the present day,
- Hashtable, .hashCode() & .equals() were implemented in JDK 1.0,
- Comparable/Comparator were introduced in JDK 1.2.
Now, it follows:
- it was virtually impossible & senseless to retrofit .hashCode() & .equals() into distinct interfaces while still maintaining backwards compatibility, even after people realized there are better abstractions than putting them in the superobject – e.g. each and every Java programmer by 1.2 knew that every Object has them, and they had to stay there physically to preserve compiled (JVM) code compatibility; and adding an explicit interface to every Object subclass that really implemented them would make this mess equal (sic!) to the Cloneable one (Bloch discusses why Cloneable sucks, as also discussed in e.g. EJ 2nd and many other places, including SO),
- so they just left them there, for future generations to have a constant source of WTFs.
Now, you may ask: what does Hashtable have to do with all this?
The answer is: the hashCode()/equals() contract, and the not-so-good language design skills of the core Java developers in 1995/1996.
Quote from the Java 1.0 Language Spec, dated 1996 – 4.3.2 The Class Object, p. 41:
The methods equals and hashCode are declared for the benefit of hashtables such as java.util.Hashtable (§21.7). The method equals defines a notion of object equality, which is based on value, not reference, comparison.
(note this exact statement has been changed in later versions to say, quote: “The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.”, making it impossible to make the direct Hashtable–hashCode–equals connection without reading the historical JLS!)
The Java team decided they wanted a good dictionary-style collection, and they created Hashtable (good idea so far), but they wanted the programmer to be able to use it with as little code/learning curve as possible (oops! trouble incoming!) – and, since there were no generics yet [it’s JDK 1.0, after all], that meant that either every Object put into a Hashtable would have to explicitly implement some interface (and interfaces were still just in their inception back then… no Comparable yet, even!), deterring many from using it – or Object would have to implicitly implement some hashing method.
Obviously, they went with solution 2, for the reasons outlined above. Yup, now we know they were wrong. … it’s easy to be smart in hindsight. chuckle
Now, hashCode() requires that every object having it also has a consistent equals() method – so it was quite obvious that equals() had to be put in Object as well.
Since the default implementations of those methods on arbitrary Objects a and b are essentially useless through being redundant (making a.equals(b) equivalent to a == b, and a.hashCode() == b.hashCode() roughly equivalent to a == b as well – unless hashCode and/or equals is overridden, or you GC hundreds of thousands of Objects during the lifecycle of your application¹), it’s safe to say they were provided mainly as a backup measure and for usage convenience. This is exactly how we get to the well-known rule: always override both .equals() and .hashCode() if you intend to actually compare objects or store them in hash-based collections. Overriding only one of them without the other is a good way to screw up your code (through wicked comparison results or insanely high bucket collision rates) – and getting your head around this is a constant source of confusion & errors for beginners (search SO to see for yourself) and a constant nuisance to more seasoned developers.
Also, note that although C# deals with equals & hashCode in a slightly better way, Eric Lippert himself states that they made almost the same mistake with C# that Sun did with Java years before C#’s inception:
But why should it be the case that every object should be able to hash itself for insertion into a hash table? Seems like an odd thing to require every object to be able to do. I think if we were redesigning the type system from scratch today, hashing might be done differently, perhaps with an IHashable interface. But when the CLR type system was designed there were no generic types and therefore a general-purpose hash table needed to be able to store any object.
¹ of course, Object#hashCode can still collide, but it takes a bit of effort to do that – see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6809470 and the linked bug reports for details; https://stackoverflow.com/questions/1381060/hashcode-uniqueness/1381114#1381114 covers this subject more in-depth.