Recently I had a situation where it was logical to use varargs in Java. I then found out that varargs are just syntactic sugar for arrays, and was curious about the performance implications of that. I read up on it and found a couple of posts; as one would expect, a varargs call is slower than a call with ordinary arguments. I also came across methods like EnumSet.of(…), which have around five fixed-arity overloads so that common calls avoid varargs for performance reasons. I am not here to talk about that, though. What I am curious about is why varargs were implemented as arrays in the first place.
In C++ for example, you have parameter packs, which aren’t arrays. Is there some sort of historical reason for this?
There seems to be a bit of a discussion about performance going on. Again, I’d like to clarify: I haven’t benchmarked anything. I mentioned the performance problems because they seem to exist: even in the JDK, there is code that works around varargs in certain cases for performance. Performance also isn’t the only limitation that stems from implementing varargs as an array. Another thing you cannot do with Java varargs, for example, is forward the arguments to another function as individual arguments, or in other words, unpack them.
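To make the two observations above concrete, here is a rough sketch (not JDK code; `VarargsSugar`, `sum`, and `describe` are made-up names for illustration). It shows that a varargs call and an explicit array call reach the same method, and how fixed-arity overloads in the style of EnumSet.of let short calls skip the array:

```java
class VarargsSugar {
    // A varargs parameter is an int[] inside the method body.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Fixed-arity overloads: overload resolution tries these before
    // considering the variable-arity method, so no array is created.
    static String describe(String a) {
        return "fixed-1";
    }

    static String describe(String a, String b) {
        return "fixed-2";
    }

    // Fallback for three or more arguments; 'rest' is a String[].
    static String describe(String first, String... rest) {
        return "varargs-" + (1 + rest.length);
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3));              // compiler wraps the arguments in an int[]
        System.out.println(sum(new int[] {1, 2, 3}));  // explicit array, same method
        System.out.println(describe("x"));
        System.out.println(describe("x", "y"));
        System.out.println(describe("x", "y", "z"));
    }
}
```

Whether the overloads actually pay off in a given program is a separate question that only a benchmark could answer.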
It boils down to the following:
- It keeps the language specification simple.
- It keeps the language implementation simple.
- It keeps user code (accessing varargs within a function) simple.
- It does not cost much, because memory allocation (i.e. the allocation of an array to hold parameters) is extremely fast in Java.
But most importantly, as I will show:
- The minuscule performance penalty of using an array for varargs is suffered only in varargs function invocations, while all other function invocations become slightly smaller and faster.
First of all, parameter packs did not exist in C++ back at the time that Java was laid down, so let us not compare apples with oranges: we have to compare Java varargs with old C-style varargs.
Now, on x86, varargs are possible in C only under the default calling convention, which is cdecl. (If you try using some other calling convention, like stdcall, you cannot have varargs.)
Under the cdecl calling convention, any function can actually be invoked with any number of parameters, and this does not result in spectacular crashes all over the place, because it is the caller, who knows how many parameters they pushed onto the stack, who is responsible for popping them from the stack once the function returns, to keep the stack balanced.
A cdecl function invocation under x86 looks like this:
push arg2 ;4 bytes (cdecl pushes arguments right to left)
push arg1 ;4 bytes
call function
add esp, 8 ;balance the stack.
So, under the default calling convention in C, every single function invocation that is passed one or more parameters has to be followed with a stack-balancing instruction. This represents a performance penalty on all function invocations, regardless of whether they are varargs or not.
I have always thought that this is a rather poor design.
In Java, they decided that they are not going to have different calling conventions, and that the number of parameters to each function will be cast in stone. This way, they could make the function, rather than the caller, responsible for balancing the stack.
In x86, this is done like so:
ret 8 ;return, also popping 8 bytes.
This is a much more sensible language design choice. Unfortunately, it appears to mean that the language cannot have varargs.
Well, sure it can, by implementing them using an array.
This wheel has already been invented, so why not reuse it?
Thus, varargs invocations in Java might cost slightly more than the absolute bare minimum imaginable, but every single non-varargs invocation is slightly more efficient, so this is a clear winner.
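As a small sketch of what “implementing them using an array” looks like from the Java side (the class and method names here are hypothetical, chosen only for this example): the callee receives an ordinary array, an existing array is passed through unchanged, and each call with loose arguments gets a freshly created array.

```java
class VarargsCallSite {
    // Returns the array that the method actually received.
    static Object[] capture(Object... args) {
        return args;
    }

    // Forwarding: the received array is passed on as a whole, not unpacked.
    static Object[] forward(Object... args) {
        return capture(args);
    }

    public static void main(String[] unused) {
        Object[] existing = {"a", "b"};

        // An existing array is handed over as-is, with no copy.
        System.out.println(capture(existing) == existing);           // true
        System.out.println(forward(existing) == existing);           // true

        // A call with no arguments passes an empty array, not null.
        System.out.println(capture().length);                        // 0

        // Loose arguments are wrapped in a new array at every call site.
        System.out.println(capture("a", "b") == capture("a", "b"));  // false
    }
}
```

The last line is where the extra cost of a varargs call comes from: one array allocation per invocation, which non-varargs calls never pay.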
To answer the question, I’ll present an analysis of the choices, followed by my interpretation of why the Java designers chose the array-based solution.
Analysis
“varargs” means that a call can provide a variable number of arguments that the method being called receives as one parameter entity, with properties like
- have strongly typed elements,
- can iterate over all individual arguments,
- can retrieve the individual arguments by index,
- or similar.
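A minimal sketch of these properties as they appear in today’s Java (the class `VarargsProperties` and the method `longest` are invented for illustration):

```java
class VarargsProperties {
    // Returns the longest of the given words, or "" if none were passed.
    static String longest(String... words) {
        if (words.length == 0) {          // the argument count is available
            return "";
        }
        String best = words[0];           // access by index
        for (String w : words) {          // iteration over all arguments
            if (w.length() > best.length()) {  // elements are typed as String, no cast needed
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest("a", "abc", "ab"));  // prints "abc"
        System.out.println(longest().isEmpty());        // prints "true"
    }
}
```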
Whatever happens under the hood, the receiving parameter will look quite similar to an array or list. And of course, the compiler and the underlying JVM must have support for the mechanism chosen, either by reusing something that is already supported, or by extending it. So, what are the choices?
- In the core Java language and the underlying virtual machine, the array data type is supported and fits these requirements. Only the compiler needs special handling when a method with varargs matches the call’s argument list.
- Using something from the Collections framework, such as ArrayList, brings no relevant advantages (e.g. changing the length of the argument list is rarely useful), but forces boxing of primitive types like int into Integer.
- Having the caller place the individual arguments onto the JVM stack, just like all “normal” arguments, without packing them into a heap-allocated array or list, would need heavy extensions or modifications to the JVM. The method being called would no longer know how deep the stack is, making argument referencing and unwinding the stack at method return more complex. And the called method would still need to receive an object that combines the arguments and supports access to its elements, which would require a JVM-supported special class just for that case.
Interpretation
Given that varargs are not used very often, the Java designers chose a mechanism that could reuse the existing JVM without special modifications. This is in line with many other Java extensions, which tried to keep the JVM as stable as possible.
Of the two remaining choices (array vs. List), the array has less overhead and is superior in its element-type handling. E.g. there is a native int[], but no ArrayList<int>, only ArrayList<Integer>.
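The difference can be sketched as follows. The class and method names are made up for this example, and `sumList` only stands in for what a hypothetical List-based varargs design would have forced on callers:

```java
import java.util.Arrays;
import java.util.List;

class PrimitiveVsBoxed {
    // Array-based varargs: the values stay primitive ints in an int[].
    static int sumArray(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // List-based alternative: every element must be an Integer object.
    static int sumList(List<Integer> values) {
        int total = 0;
        for (int v : values) {   // each element is unboxed here
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumArray(1, 2, 3));                // one int[] holding raw ints
        System.out.println(sumList(Arrays.asList(1, 2, 3)));  // 1, 2, 3 are boxed to Integer
    }
}
```

Both calls compute the same sum; the second one additionally boxes each value on the way in and unboxes it on the way out.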
The varargs feature was introduced with Java 5, and I guess the Java team was already very busy with the other Java 5 features like generics, so investing into a major JVM modification just to support varargs was probably considered “not worth the effort”.
And there is still the HotSpot compiler, which translates JVM bytecode into machine language. In that process, it applies lots of clever transformations. I wouldn’t be surprised if there’s a special detection of varargs patterns that can be replaced by classic calls.