Is boxing of primitives required in OO languages to keep them consistent with the rest of the object system (generics etc.)?
Or is it avoidable – is it possible to avoid any additional performance cost of having both primitives and objects in a language?
One solution I can come up with on the spot is having references big enough to store values of every possible primitive type.
Are there other (better) solutions, and are they implemented in popular languages?
Because of the way processors are architected, you need boxing at some level in order to get both reasonable efficiency and a unified type model. However, the boxing doesn’t need to be manually specified by the programmer, and in some languages it is handled automatically behind the scenes for you.
Take Scala, for example. Int is derived from AnyVal, which is derived from Any, which is Scala’s top-level class. Syntactically, you can treat it like any other object, but the compiler will treat it like a primitive in appropriate contexts, internally doing boxing and unboxing as necessary. The point is, the programmer doesn’t have to care. Even nicer, in Scala this is implemented using implicits, so programmers can seamlessly implement their own custom automatic boxing and unboxing if the built-in ones aren’t sufficient.
If you’re not fortunate enough to be using a language like Scala, generics can obviate the need for manual boxing in many situations.
There are various approaches taken by other languages that avoid having separate “boxed” and “unboxed” kinds of values.
- In Python, all values (from integers to objects) act the same when used as references. It might feel like simple values act differently, but that’s because types such as int and str are immutable. A similar approach is taken by Ruby and most other popular “scripting” type languages.
- In Lisp, a value can be either a number or a reference to a cons cell. Some implementations combine these into a single machine word by reserving one or two of the high bits in the word to indicate the type of the value. For example, a 0 in the high bit might mean an integer value, while a 1 means the address of a cons cell. (Other adjustments can be applied, such as shifting the stored address left by a couple of bits to reach the whole address space, which is possible because the low-order bits of an address are always 0 due to alignment constraints. All of this is highly implementation dependent.)
- In C++, the generics mechanism (templates) allows you to write generic code that can handle primitive types such as int as well as polymorphic pointers. The underlying mechanism actually compiles the generic code more than once, once for each actual type with which the template is instantiated.
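The Lisp-style tagged word described in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular Lisp does it: the 64-bit word size, the 8-byte alignment, the restriction to non-negative integers, and the names `tag_int`, `tag_address` and `decode` are all hypothetical choices made for the example.

```python
WORD_BITS = 64                     # assumed machine word size
TAG_BIT = 1 << (WORD_BITS - 1)     # high bit: 0 = integer, 1 = cons-cell address
PAYLOAD_MASK = TAG_BIT - 1         # the remaining 63 bits carry the payload
ALIGN_SHIFT = 3                    # assumed 8-byte alignment: low 3 address bits are 0


def tag_int(n):
    """Store a non-negative integer directly in the word (high bit stays 0)."""
    if not 0 <= n <= PAYLOAD_MASK:
        raise ValueError("integer does not fit in the payload bits")
    return n


def tag_address(addr):
    """Store an aligned address, dropping the always-zero low bits."""
    if addr % (1 << ALIGN_SHIFT) != 0:
        raise ValueError("address is not aligned")
    return TAG_BIT | (addr >> ALIGN_SHIFT)


def decode(word):
    """Return ("int", value) or ("cons", address) depending on the tag bit."""
    if (word & TAG_BIT) == 0:
        return ("int", word)
    # Shift left again to recover the full address.
    return ("cons", (word & PAYLOAD_MASK) << ALIGN_SHIFT)


print(decode(tag_int(42)))
print(decode(tag_address(0x1000)))
```

The price of this scheme is that integers lose one bit of range, which is the trade-off such implementations accept to keep every value in a single word.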
No. Boxing is required only when dealing with an “object” i.e. where the type is unknown. Given generics without type erasure, there is little to no reason to do so.
The reason why Java and .NET have boxing is that generics were tacked on later. Java does more boxing than .NET partly because it boxed more than was necessary to begin with, and partly because of how its designers decided to implement generics.
Also, “primitive types” are themselves a performance/size optimization, so a system where that optimization was considered unnecessary is certainly conceivable.
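The size side of that optimization can be seen in a rough sketch. It assumes CPython (where `sys.getsizeof` is meaningful) and uses the standard `array` module as a stand-in for unboxed storage. The exact object size printed varies by interpreter version and platform.

```python
import sys
from array import array

values = list(range(1000, 2000))

# A list stores references; each element is a separate int object on the heap.
boxed = values
# An array of type code "q" stores raw 64-bit signed integers back to back.
unboxed = array("q", values)

per_object = sys.getsizeof(boxed[0])   # bytes for one int object, header included
per_slot = unboxed.itemsize            # bytes for one raw array element

print("one int object:", per_object, "bytes")
print("one array slot:", per_slot, "bytes")

# Reading an element back hands out an ordinary int object again,
# which is effectively the boxing step.
element = unboxed[0]
print(type(element).__name__, element)
```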
The only use case I can think of for boxing in .NET given generics is the Tag on controls, i.e. a class exposing a property for users of the class, and not for use in the class. And even there, boxing could be avoided by requiring that it be a non-primitive class: a bit of extra work, but not unreasonably so.
You’re asking about primitive types and objects, but I don’t think that’s a useful distinction here. Instead, you should be thinking about reference types and value types:
- Value types: always exactly of the specified type (can’t be a derived type instead); can’t use virtual function dispatch; lifetime tied to the scope of the variable (usually allocated on the stack)
- Reference types: can be the specified type or a derived type; can use virtual function dispatch; lifetime not tied to the scope of the variable (usually allocated on the heap)
If you look at these differences, you realize that both kinds of types have some merit: value types are more performant, while reference types are more flexible, especially if you want to use OOP features like inheritance or virtual functions.
This is why many languages (including C++, C# and Java) offer both of them in one form or another (though the form varies widely).
Now we have two kinds of types, but we would also like to have a unified type system. That means having a type whose variables can contain values of any type. This requirement means that the type (called Object in C# and Java) has to be a reference type. And to convert a value of a value type to this Object type, you have to “box” it: create a copy of the value that acts like a reference type.
To sum up: a language has to support boxing if you want it to have reference types, value types and a unified type system.
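For contrast, here is a small sketch of a language that drops one of those three ingredients. Python has no separate value types: every value is a reference to an object, so a unified type system comes without a visible boxing step. The variable names are only for illustration.

```python
# Integers, floats, strings, lists and None all share the root type `object`.
values = [42, 3.5, "text", [1, 2], None]
every_value_is_an_object = all(isinstance(v, object) for v in values)
print(every_value_is_an_object)
print(int.__mro__)

a = 1000
b = a              # b refers to the same int object as a; nothing is copied
same_object = b is a
b += 1             # int is immutable, so this rebinds b to a different object
print(same_object, a, b)
```

Because integers are immutable, the shared reference is never observable as aliasing, which is why simple values can feel as if they were copied.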
Depends on the language.
In C# 1.0/1.1, boxing was unavoidable quite often: when a method had an object parameter and one tried to pass in a value type (think primitive type, though not exactly), it had to be boxed in order to be passed in as a reference type.
With C# 2 and generics support, most such boxing went away, as a generic type could be used.
VB6 included “variant” types which could store primitives or object references, so it’s certainly possible. The semantics of VB6 variant types were absolutely horrible, and I would not suggest any language try to emulate them.
Otherwise, while it might in some rare cases be useful to define an aggregate type which holds an object reference and something like an int64 whose bits could be interpreted as needed to represent any primitive type, one generally can’t do anything useful with a variant type unless one knows what it’s supposed to be, and if one knows what a type is supposed to be one doesn’t really need a variant. In a framework like .NET which has “real” generic types, boxing is seldom needed, and in those contexts where it is needed, it would probably be more helpful to have all types receive a layer of boxing (so that a reference of type Animal which identifies a Cat would be boxed as “Boxed reference of type Animal which identifies a Cat”).
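The aggregate described above, an object reference plus 64 bits that are interpreted as needed, might look roughly like the sketch below. The `Variant` class and its helper functions are hypothetical names invented for this example. It uses the standard `struct` module to reinterpret the same bits as either an integer or a double.

```python
import struct
from dataclasses import dataclass
from typing import Any


@dataclass
class Variant:
    ref: Any = None    # used when the variant holds an object reference
    bits: int = 0      # raw 64-bit payload for primitive values

    def as_int64(self):
        """Interpret the payload as a signed 64-bit integer."""
        return self.bits

    def as_float64(self):
        """Reinterpret the same 64 bits as an IEEE 754 double."""
        return struct.unpack("<d", struct.pack("<q", self.bits))[0]


def from_int(n):
    return Variant(bits=n)


def from_float(x):
    # Store the double's bit pattern, not its numeric value.
    return Variant(bits=struct.unpack("<q", struct.pack("<d", x))[0])


v = from_float(1.5)
print(v.as_float64())       # the caller knew it was a double
print(hex(v.as_int64()))    # same bits read as an integer: not 1 or 2

w = from_int(7)
print(w.as_int64(), w.as_float64())
```

Nothing in the payload says which reading is correct, which is the point made above: the variant is only usable by code that already knows the type.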