I was working with a HashSet the other day, which has this written in the spec:
[add()] adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2))
I was using char[] in the HashSet until I realized that, based on this contract, it was no better than an ArrayList! Since it’s using the non-overridden .equals(), my arrays will only be checked for reference equality, which is not particularly useful. I know that Arrays.equals() exists, but that doesn’t help when one is using collections such as HashSet.
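To make the problem concrete, here is a minimal sketch of what I was seeing. The class name CharArraySetDemo and the helper method are made up for illustration; only HashSet, Arrays.equals() and the inherited equals() are real API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CharArraySetDemo {
    // Adds two distinct char[] instances with identical contents
    // and returns the resulting set size.
    static int sizeAfterAddingEqualContents() {
        Set<char[]> set = new HashSet<>();
        set.add(new char[] {'a', 'b'});
        set.add(new char[] {'a', 'b'}); // same contents, different object
        return set.size();
    }

    public static void main(String[] args) {
        char[] first = {'a', 'b'};
        char[] second = {'a', 'b'};

        // false: arrays inherit Object.equals, which compares references
        System.out.println(first.equals(second));
        // true: element-by-element comparison
        System.out.println(Arrays.equals(first, second));
        // 2: the set treats the two arrays as unrelated objects
        System.out.println(sizeAfterAddingEqualContents());
    }
}
```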
So my question is, why would Java arrays not override equals?
There was a design decision to make early on in Java:
Are arrays primitives? or are they Objects?
The answer is, neither really… or both if you look at it another way. They work fairly closely with the system itself and the backend of the JVM.
One example of this is the java.lang.System.arraycopy() method, which needs to take an array of any type. Thus, the array needs to be able to inherit from something, and that’s Object. And arraycopy is a native method.
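As a rough illustration of that point, System.arraycopy declares both of its array parameters as Object, so one method serves int[] and String[] alike. The wrapper copyAll and the class name below are hypothetical and exist only to make the Object-typed parameters visible.

```java
import java.util.Arrays;

public class ArrayCopyDemo {
    // Copies the first `length` elements of src into dest. Both parameters are
    // typed Object, mirroring the signature of System.arraycopy itself.
    static void copyAll(Object src, Object dest, int length) {
        System.arraycopy(src, 0, dest, 0, length);
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3};
        int[] intsCopy = new int[3];
        copyAll(ints, intsCopy, ints.length);

        String[] words = {"a", "b"};
        String[] wordsCopy = new String[2];
        copyAll(words, wordsCopy, words.length);

        System.out.println(Arrays.toString(intsCopy));  // [1, 2, 3]
        System.out.println(Arrays.toString(wordsCopy)); // [a, b]
    }
}
```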
Arrays are also funny in that they can hold primitives (int, char, double, etc.) while the other collections can only hold Objects. Look, for example, at java.util.Arrays and the ugliness of the equals methods. This was put in as an afterthought. deepEquals(Object[], Object[]) wasn’t added until 1.5, while the Arrays class itself was added in 1.2.
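A small sketch of why deepEquals had to be bolted on: for nested arrays, Arrays.equals(Object[], Object[]) compares the inner arrays with their own equals(), which is reference equality again. The class and the two thin helper methods are illustrative names, not part of any library.

```java
import java.util.Arrays;

public class DeepEqualsDemo {
    // Shallow comparison: elements are compared with equals(), so nested
    // arrays are compared by reference.
    static boolean shallow(Object[] x, Object[] y) {
        return Arrays.equals(x, y);
    }

    // Deep comparison: recurses into nested arrays.
    static boolean deep(Object[] x, Object[] y) {
        return Arrays.deepEquals(x, y);
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3}};
        int[][] b = {{1, 2}, {3}};

        System.out.println(shallow(a, b)); // false
        System.out.println(deep(a, b));    // true
    }
}
```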
Because these objects are arrays, they let you do some things that are at the memory or near memory level – something that Java often hides from the coder. This allows certain things to be done faster at the expense of mostly breaking the object model.
There was a trade-off early in the system between flexibility and performance. Performance won out, and the lack of flexibility was wrapped in the various collections. Arrays in Java are a thinly implemented Object on top of a primitive type, (originally) intended for working with the system when you need it.
For the most part, raw arrays appear to be something the original designers tried to ignore and tuck away inside the system. And they wanted it to be fast (early Java had some issues with speed). It was a wart on the design that arrays aren’t nice Arrays, but it’s one that was needed when you wanted to expose something as close to the system as possible. For that matter, the contemporary languages of early Java also have this wart: one can’t do a .equals() on C++’s array.
Java and C++ both took the same path for arrays: an external library that does the operations as needed on arrays rather than Arrays, while encouraging coders to use better, native types unless they really know what they are doing and why they are doing it that way.
Thus, the approach to .equals on arrays is wrong, but it’s the same wrong that coders coming from C++ already knew. So the designers chose the least wrong thing in terms of performance: leave it as the implementation in Object, where two objects are equal if and only if they are the same object.
You need the array to be a primitive-like structure in order to communicate with native bindings, something as close to the classic C array as possible. But unlike the other primitives, you need the array to be able to be passed as a reference, and thus as an Object. So it’s more of a primitive with some Object hacks on the side and some bounds checking.
In Java, arrays are pseudo-objects. Object references can hold arrays, and they do have the standard Object methods, but they are very lightweight compared to a true collection. Arrays do just enough to meet the contract of an Object and use the default implementations of equals, hashCode, and toString quite deliberately.
Consider an Object[]. An element of this array can be anything that fits in an object, which includes another array. It could be a boxed primitive, a socket, anything. What does equality mean in that case? Well, it depends on what is actually in the array. That is not something known in the general case when the language was being designed. Equality is defined both by the array itself as well as its contents.
This is the reason why there is an Arrays helper class that has methods to compute equality (including deep equals), hash codes, etc. However, those methods are well-defined as far as what they do. If you need different functionality, write your own method to compare two arrays for equality based on the needs of your program.
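As one possible example of "write your own method", suppose a program considers two int arrays equal when they hold the same values in any order. That rule, the method name sameValuesAnyOrder and the class name are assumptions made up for this sketch; it also assumes neither argument is null.

```java
import java.util.Arrays;

public class CustomArrayEquality {
    // Hypothetical program-specific rule: equal if both arrays hold the same
    // values, regardless of order. Works on copies so the inputs are untouched.
    static boolean sameValuesAnyOrder(int[] x, int[] y) {
        if (x.length != y.length) {
            return false;
        }
        int[] sortedX = x.clone();
        int[] sortedY = y.clone();
        Arrays.sort(sortedX);
        Arrays.sort(sortedY);
        return Arrays.equals(sortedX, sortedY);
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        int[] b = {1, 2, 3};

        System.out.println(Arrays.equals(a, b));      // false: order matters here
        System.out.println(sameValuesAnyOrder(a, b)); // true
        System.out.println(Arrays.toString(a));       // [3, 1, 2]
    }
}
```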
While not strictly an answer to your question, I think it is relevant to say that you really should be using collections instead of arrays. Only convert to an array when interfacing with an API that requires arrays. Otherwise, collections offer better type safety, more well-defined contracts, and are generally easier to use than arrays.
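For the char[] case from the question, a sketch of that advice might look like the following: store a List<Character> or a String in the set instead of the raw array, since both define equals() and hashCode() by content. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollectionElementsDemo {
    // Two lists with equal contents collapse into a single set element.
    static int distinctLists() {
        Set<List<Character>> set = new HashSet<>();
        set.add(Arrays.asList('a', 'b'));
        set.add(Arrays.asList('a', 'b'));
        return set.size();
    }

    // The same holds when the char[] data is converted to String first.
    static int distinctStrings() {
        Set<String> set = new HashSet<>();
        set.add(new String(new char[] {'a', 'b'}));
        set.add(String.valueOf(new char[] {'a', 'b'}));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctLists());   // 1
        System.out.println(distinctStrings()); // 1
    }
}
```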
The fundamental difficulty with arrays overriding equals is that a variable of a type like int[] may be used in at least three fundamentally different ways, and the meaning of equals should vary depending upon usage. In particular, a field of type int[]…
- …may encapsulate a sequence of values in an array which will never be modified, but may be shared freely with code that won’t modify it.
- …may encapsulate exclusive ownership of an integer-holding container which may be mutated at will by its owner.
- …may identify an integer-holding container which some other entity is using to encapsulate its state, and thus serve as a connection to the state of that other entity.
If a class has an int[] field foo which is used for either of the first two purposes, then instances x and y should regard x.foo and y.foo as encapsulating the same state if they hold the same sequence of numbers; if the field is used for the third purpose, however, then x.foo and y.foo would only encapsulate the same state if they identify the same array [i.e. they’re reference equal]. If Java had included different types for the three usages above, and if equals took a parameter identifying how the reference was being used, then it would have been appropriate for int[] to use sequence equality for the first two usages and reference equality for the third. No such mechanism exists, however.
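Since the language has no such mechanism, the distinction ends up in whatever class owns the field. A rough sketch, with the hypothetical classes IntSequence (first usage, equality by content) and BufferHandle (third usage, equality by identity):

```java
import java.util.Arrays;

public class ArrayEqualityByUsage {
    // First usage: an unchanging sequence of values. Equality is by content.
    public static final class IntSequence {
        private final int[] values;

        public IntSequence(int... values) {
            this.values = values.clone(); // defensive copy, never modified afterwards
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof IntSequence
                    && Arrays.equals(values, ((IntSequence) o).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    // Third usage: a connection to a container owned by some other entity.
    // Equality is by identity of the underlying array.
    public static final class BufferHandle {
        private final int[] buffer;

        public BufferHandle(int[] buffer) {
            this.buffer = buffer; // deliberately not copied
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BufferHandle && buffer == ((BufferHandle) o).buffer;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(buffer);
        }
    }

    public static void main(String[] args) {
        int[] shared = {1, 2, 3};
        System.out.println(new IntSequence(1, 2, 3).equals(new IntSequence(1, 2, 3)));         // true
        System.out.println(new BufferHandle(shared).equals(new BufferHandle(shared)));         // true
        System.out.println(new BufferHandle(shared).equals(new BufferHandle(shared.clone()))); // false
    }
}
```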
Note also that the int[] case was the simplest kind of array. For arrays containing references to classes other than Object or array types, there would be additional possibilities.
1. A reference to a sharable unchanging array which encapsulates things that will never change.
2. A reference to a sharable unchanging array which identifies things owned by other entities.
3. A reference to an exclusively-owned array which encapsulates references to things that will never change.
4. A reference to an exclusively-owned array which encapsulates references to exclusively-owned items.
5. A reference to an exclusively-owned array which identifies things owned by other entities.
6. A reference that identifies an array owned by some other entity.
In cases 1, 3, and 4, two array references should be considered equal if corresponding items are “value-equal”. In cases 2 and 5, two array references should be considered equal if they identify the same sequence of objects. In case 6, two array references should be considered equal only if they identify the same array.
For equals to behave sensibly with aggregate types, they need to have some way of knowing how references will be used. Unfortunately, Java’s type system has no way of indicating that.
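The three notions of equality for reference arrays can be sketched side by side. Arrays.equals and == are real; sameObjectsInOrder is a hypothetical helper written for this example, standing in for the "same sequence of objects" comparison, and it assumes non-null arrays.

```java
import java.util.Arrays;

public class ReferenceArrayEquality {
    // "Same sequence of objects": corresponding elements must be the very
    // same object, compared by identity rather than equals().
    static boolean sameObjectsInOrder(Object[] x, Object[] y) {
        if (x.length != y.length) {
            return false;
        }
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s1 = new String("a"); // two distinct objects with equal values
        String s2 = new String("a");

        Object[] first = {s1};
        Object[] valueEqual = {s2};
        Object[] sameElements = {s1};

        System.out.println(Arrays.equals(first, valueEqual));        // true: value-equal items
        System.out.println(sameObjectsInOrder(first, valueEqual));   // false: different objects
        System.out.println(sameObjectsInOrder(first, sameElements)); // true: same objects
        System.out.println(first == sameElements);                   // false: different arrays
    }
}
```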
Overriding array equals() and hashCode() to depend on content would make them similar to collections: mutable types with a non-constant hashCode(). Types with a changing hashCode() behave poorly when stored in hash tables and in other applications that rely on hashCode() returning a fixed value.
Set<List<Integer>> data = new HashSet<List<Integer>>();
List<Integer> datum = new ArrayList<Integer>();
datum.add(1);
data.add(datum);
assert data.contains(datum); // true
datum.add(2);
assert !data.contains(datum); // contains() is now false, WAT???
Arrays, on the other hand, have a trivial hashCode(), can be used as hash-table keys, and are still mutable.
Set<int[]> data = new HashSet<int[]>(67);
int[] datum = new int[]{1, 2};
data.add(datum);
System.out.println(data.contains(datum)); //true
datum[0] = 78;
System.out.println(data.contains(datum)); //true
//PROFIT!!!