This issue is most apparent when you have different implementations of an interface, and for the purposes of a particular collection you only care about the interface-level view of the objects. For example, suppose you had an interface like this:
public interface Person {
int getId();
}
The usual way to implement hashCode() and equals() in implementing classes is to put code like this in the equals method:
if (getClass() != other.getClass()) {
return false;
}
This causes problems when you mix implementations of Person in a HashMap. If the HashMap only cares about the interface-level view of Person, then it can end up with duplicates that differ only in their implementing classes.
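To make the failure concrete, here is a small sketch (CachedPerson and DbPerson are made-up implementation names) where two classes follow the usual getClass()-based equals() pattern, and a HashSet ends up holding two entries for the same interface-level person:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateDemo {
    interface Person {
        int getId();
    }

    // Hypothetical implementation #1, with the typical getClass()-based equals().
    static class CachedPerson implements Person {
        private final int id;
        CachedPerson(int id) { this.id = id; }
        public int getId() { return id; }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return id == ((CachedPerson) other).id;
        }
        @Override public int hashCode() { return id; }
    }

    // Hypothetical implementation #2, same pattern, different class.
    static class DbPerson implements Person {
        private final int id;
        DbPerson(int id) { this.id = id; }
        public int getId() { return id; }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return id == ((DbPerson) other).id;
        }
        @Override public int hashCode() { return id; }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new CachedPerson(1));
        people.add(new DbPerson(1));
        // Both objects represent person #1 at the interface level,
        // yet the set holds two entries because equals() compares classes.
        System.out.println(people.size()); // prints 2
    }
}
```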
You could make this case work by using the same liberal equals() method for all the implementations, but then you run the risk of equals() doing the wrong thing in a different context (such as comparing two Persons that are backed by database records with version numbers).
My intuition tells me that equality should be defined per collection instead of per class. When using collections that rely on ordering, you can use a custom Comparator to pick the right ordering in each context. There is no analogue for hash-based collections. Why is this?
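For ordered collections the per-collection comparison really does work – and, as a side effect, a TreeSet built with a Comparator even decides membership by compare() == 0 rather than equals(), which is the closest standard Java gets to per-collection equality. A sketch (the string examples are purely illustrative):

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class PerCollectionComparison {
    public static void main(String[] args) {
        // Each TreeSet picks its own ordering via a Comparator...
        Set<String> byLength = new TreeSet<>(Comparator.comparingInt(String::length));
        Set<String> caseInsensitive = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

        byLength.add("apple");
        byLength.add("melon");        // same length as "apple" -> treated as a duplicate
        caseInsensitive.add("Apple");
        caseInsensitive.add("APPLE"); // equal ignoring case -> treated as a duplicate

        // ...and membership follows the Comparator, not equals():
        System.out.println(byLength.size());        // prints 1
        System.out.println(caseInsensitive.size()); // prints 1

        // HashSet/HashMap have no constructor taking anything analogous –
        // they always fall back to the elements' own hashCode()/equals().
    }
}
```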
Just to clarify, this question is distinct from “Why is .compareTo() in an interface while .equals() is in a class in Java?” because it deals with the implementation of collections. compareTo() and equals()/hashCode() both suffer from the problem of universality when used with collections: you can’t pick different comparison functions for different collections. So for the purposes of this question, the inheritance hierarchy of an object doesn’t matter at all; all that matters is whether the comparison function is defined per object or per collection.
This design is sometimes known as “universal equality”: the belief that whether two things are equal or not is a universal property.
What’s more, equality is a property of two objects, but in OO you always call a method on one single object, and that object gets to solely decide how to handle that method call. So, in a design like Java’s, where equality is a property of one of the two objects being compared, it isn’t even possible to guarantee basic properties of equality such as symmetry (a == b ⇔ b == a), because in the first case the method is being called on a and in the second case it is being called on b, and by the basic principles of OO it is solely a’s decision (in the first case) or b’s decision (in the second case) whether or not it considers itself equal to the other one. The only way to gain symmetry is to have the two objects cooperate, but if they don’t… tough luck.
One solution would be to make equality not a property of one object, but either a property of two objects, or a property of a third object. That latter option also solves the problem of universal equality, because if you make equality a property of a third “context” object, then you can imagine having different EqualityComparer objects for different contexts.
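Such a “context” object can even be retrofitted onto Java’s existing hash-based collections by wrapping the keys. A hypothetical sketch (the Equator interface below is made up for illustration, though Apache Commons Collections 4 ships a similarly named one):

```java
import java.util.HashMap;
import java.util.Map;

public class ContextEquality {
    // Hypothetical "context" object: equality and hashing defined per collection.
    interface Equator<T> {
        boolean equate(T a, T b);
        int hash(T t);
    }

    // Adapter that makes a plain HashMap honor the Equator instead of
    // the value's own equals()/hashCode().
    static final class Key<T> {
        final T value;
        final Equator<T> equator;
        Key(T value, Equator<T> equator) { this.value = value; this.equator = equator; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            @SuppressWarnings("unchecked")
            Key<T> other = (Key<T>) o;
            return equator.equate(value, other.value);
        }
        @Override public int hashCode() { return equator.hash(value); }
    }

    public static void main(String[] args) {
        // Context: compare strings case-insensitively in *this* map only.
        Equator<String> ignoreCase = new Equator<String>() {
            public boolean equate(String a, String b) { return a.equalsIgnoreCase(b); }
            public int hash(String s) { return s.toLowerCase().hashCode(); }
        };

        Map<Key<String>, Integer> counts = new HashMap<>();
        counts.put(new Key<>("Apple", ignoreCase), 1);
        counts.put(new Key<>("APPLE", ignoreCase), 2); // same key under this context

        System.out.println(counts.size()); // prints 1
        System.out.println(counts.get(new Key<>("apple", ignoreCase))); // prints 2
    }
}
```

The wrapper costs an allocation per key, which is exactly why first-class support in the collection itself (as with Comparator) would be preferable.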
This is the design chosen for Haskell, for example, with the Eq typeclass. It is also the design chosen by some third-party Scala libraries (ScalaZ, for example), but not by the Scala core or standard library, which uses universal equality for compatibility with the underlying host platform.
It is, interestingly, also the design chosen for Java’s Comparable/Comparator interfaces. The designers of Java clearly were aware of the problem, but for some reason solved it only for ordering, not for equality (or hashing).
So, as to the question

why is there a Comparator interface but no Hasher and Equator?

the answer is “I don’t know”. Clearly, the designers of Java were aware of the problem, as evidenced by the existence of Comparator, but they obviously didn’t consider it a problem for equality and hashing. Other languages and libraries make different choices.
The real answer to

why is there a Comparator interface but no Hasher and Equator?

is, quote courtesy of Josh Bloch:
The original Java APIs were done very quickly under a tight deadline to meet a closing market window. The original Java team did an incredible job, but not all of the APIs are perfect.
The problem lies solely in Java’s history, as with other similar matters, e.g. .clone() vs Cloneable.
tl;dr
it’s mainly for historical reasons; the current behaviour/abstraction was introduced in JDK 1.0 and wasn’t fixed later on because it was virtually impossible to do so while maintaining backward code compatibility.
First, let’s sum up a couple of well-known Java facts:
- Java, from the very beginning to the present day, has been proudly backwards-compatible, requiring legacy APIs to still be supported in newer versions,
- as such, almost each and every language construct introduced with JDK 1.0 lives on to the present day,
- Hashtable, .hashCode() & .equals() were implemented in JDK 1.0,
- Comparable/Comparator were introduced in JDK 1.2.
Now, it follows:
- it was virtually impossible & senseless to retrofit .hashCode() & .equals() into distinct interfaces while still maintaining backwards compatibility, even after people realized there are better abstractions than putting them in the superobject – e.g. each and every Java programmer by 1.2 knew that every Object has them, and they had to stay there physically to preserve compiled (JVM) code compatibility; and adding an explicit interface to every Object subclass that really implemented them would make this mess equal (sic!) to the Cloneable one (Bloch discusses why Cloneable sucks, as also discussed in e.g. EJ 2nd and many other places, including SO),
- so they just left them there, for future generations to have a constant source of WTFs.
Now, you may ask: what does Hashtable have to do with all this?
The answer is: the hashCode()/equals() contract, and the not-so-good language design skills of the core Java developers in 1995/1996.
Quote from the Java 1.0 Language Spec, dated 1996 – 4.3.2 The Class Object, p. 41:
The methods equals and hashCode are declared for the benefit of hashtables such as java.util.Hashtable (§21.7). The method equals defines a notion of object equality, which is based on value, not reference, comparison.
(note this exact statement has been changed in later versions to say, quote: “The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.”, making it impossible to make the direct Hashtable–hashCode–equals connection without reading the historical JLS!)
The Java team decided they wanted a good dictionary-style collection, and they created Hashtable (good idea so far), but they wanted the programmer to be able to use it with as little code/learning curve as possible (oops! trouble incoming!) – and, since there were no generics yet [it’s JDK 1.0, after all], that meant that either every Object put into a Hashtable would have to explicitly implement some interface (and interfaces were still just in their inception back then… no Comparable yet, even!), deterring many from using it – or Object would have to implicitly implement some hashing method.
Obviously, they went with solution 2, for the reasons outlined above. Yup, now we know they were wrong. … it’s easy to be smart in hindsight. chuckle
Now, hashCode() requires that every object having it also has a consistent equals() method – so it was quite obvious that equals() had to be put in Object as well.
Since the default implementations of those methods on arbitrary Objects a and b are essentially useless through being redundant (making a.equals(b) equivalent to a == b, and a.hashCode() == b.hashCode() roughly equivalent to a == b as well – unless hashCode and/or equals is overridden, or you GC hundreds of thousands of Objects during the lifecycle of your application¹), it’s safe to say they were provided mainly as a backup measure and for usage convenience. This is exactly how we get to the well-known rule: always override both .equals() and .hashCode() if you intend to actually compare objects or store them in hash-based collections. Overriding only one of them without the other is a good way to screw up your code (through wicked comparison results or insanely high bucket collision rates) – and getting your head around this is a constant source of confusion & errors for beginners (search SO to see for yourself) and a constant nuisance to more seasoned developers.
Also, note that although C# deals with equals & hashCode in a slightly better way, Eric Lippert himself states that they made almost the same mistake with C# that Sun did with Java years before C#’s inception:
But why should it be the case that every object should be able to hash itself for insertion into a hash table? Seems like an odd thing to require every object to be able to do. I think if we were redesigning the type system from scratch today, hashing might be done differently, perhaps with an IHashable interface. But when the CLR type system was designed there were no generic types and therefore a general-purpose hash table needed to be able to store any object.
¹ of course, Object#hashCode can still collide, but it takes a bit of effort to do that – see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6809470 and the linked bug reports for details; https://stackoverflow.com/questions/1381060/hashcode-uniqueness/1381114#1381114 covers this subject more in-depth.