While programming in C#, I stumbled upon a strange language design decision that I just can’t understand.
So, C# (and the CLR) has two aggregate data types: struct (a value type, stored on the stack, with no inheritance) and class (a reference type, stored on the heap, with inheritance).
This setup sounds nice at first, but then you stumble upon a method taking a parameter of an aggregate type, and to figure out whether it is actually a value type or a reference type, you have to go find that type’s declaration. It can get really confusing at times.
The generally accepted solution to the problem seems to be declaring all structs as “immutable” (setting their fields to readonly) to prevent possible mistakes, which limits structs’ usefulness.
C++, for example, employs a much more flexible model: it allows you to create an object instance either on the stack or on the heap and pass it by value or by reference (or by pointer). I keep hearing that C# was inspired by C++, and I just can’t understand why it didn’t adopt this technique. Combining class and struct into one construct with two different allocation options (heap and stack), and passing instances around as values or (explicitly) as references via the ref and out keywords, seems like a nice design.
The question is, why did class and struct become separate concepts in C# and the CLR instead of one aggregate type with two allocation options?
The reason C# (and Java and essentially every other OO language developed after C++) did not copy C++’s model in this aspect is because the way C++ does it is a horrendous mess.
You correctly identified the relevant points above: struct is a value type with no inheritance; class is a reference type with inheritance. Inheritance and value types (or more specifically, polymorphism and pass-by-value) don’t mix; if you pass an object of type Derived to a method argument of type Base, and then call a virtual method on it, the only way to get proper behavior is to ensure that what got passed was a reference.
Between that and all the other messes you run into in C++ by having inheritable objects as value types (copy constructors and object slicing come to mind!), the best solution is to Just Say No.
Good language design isn’t just implementing features, it’s also knowing what features not to implement, and one of the best ways to do this is by learning from the mistakes of those who came before you.
By analogy, C# is basically like a set of mechanic’s tools where somebody has read that you should generally avoid pliers and adjustable wrenches, so it doesn’t include adjustable wrenches at all, and the pliers are locked in a special drawer marked “unsafe”, and can only be used with approval from a supervisor, after signing a disclaimer absolving your employer of any responsibility for your health.
C++, by comparison, not only includes adjustable wrenches and pliers, but also some rather odd-ball special-purpose tools whose purposes aren’t immediately apparent, and which, if you don’t know the right way to hold them, might easily cut off your thumb (but which, once you understand how to use them, can do things that are essentially impossible with the basic tools in the C# toolbox). In addition, it has a lathe, milling machine, surface grinder, metal-cutting band-saw, etc., to let you design and create entirely new tools any time you feel the need (but yes, those machinist’s tools can and will cause serious injuries if you don’t know what you’re doing with them, or even if you just get careless).
That reflects the basic difference in philosophy: C++ attempts to give you all the tools you might need for essentially any design you might want. It makes almost no attempt at controlling how you use those tools, so it’s also easy to use them to produce designs that only work well in rare situations, as well as designs that are probably just a lousy idea and nobody knows of a situation in which they’re likely to work at all well. In particular, a great deal of this is done by decoupling design decisions–even those that in practice really are nearly always coupled. As a result, there’s a huge difference between just writing C++, and writing C++ well. To write C++ well, you need to know a lot of idioms and rules of thumb (including rules of thumb about how seriously to reconsider before breaking other rules of thumb). As a result, C++ is oriented much more toward ease of use (by experts) than ease of learning. There are also (all too many) circumstances in which it’s not really terribly easy to use either.
C# does a lot more to try to force (or at least extremely strongly suggest) what the language designers considered good design practices. Quite a few things that are decoupled in C++ (but usually go together in practice) are directly coupled in C#. It does allow for “unsafe” code to push the boundaries a little, but honestly, not a whole lot.
The result is that on one hand there are quite a few designs that can be expressed fairly directly in C++ that are substantially clumsier to express in C#. On the other hand, it’s a whole lot easier to learn C#, and the chances of producing a really horrible design that won’t work for your situation (or probably any other) are drastically reduced. In many (probably even most) cases, you can get a solid, workable design by simply “going with the flow”, so to speak. Or, as one of my friends (at least I like to think of him as a friend–not sure if he really agrees) likes to put it, C# makes it easy to fall into the pit of success.
So, looking more specifically at the question of how class and struct came to be what they are in the two languages: with objects created in an inheritance hierarchy, where you might use an object of a derived class in the guise of its base class/interface, you’re pretty much stuck with the fact that you normally need to do so via some sort of pointer or reference. At a concrete level, the object of the derived class contains memory that can be treated as an instance of the base class/interface, and the derived object is manipulated via the address of that part of memory.
In C++, it’s up to the programmer to do that correctly–when he’s using inheritance, it’s up to him to ensure that (for example) a function that works with polymorphic classes in a hierarchy does so via a pointer or reference to the base class.
In C#, what is fundamentally the same separation between the types is much more explicit, and enforced by the language itself. The programmer doesn’t need to take any steps to pass an instance of a class by reference, because that’ll happen by default.
This is from “C#: Why Do We Need Another Language?” by Eric Gunnerson:
Simplicity was an important design goal for C#. It’s possible to go overboard on simplicity and language purity, but purity for purity’s sake is of little use to the professional programmer. We therefore tried to balance our desire to have a simple and concise language with solving the real-world problems that programmers face. […] Value types, operator overloading and user-defined conversions all add complexity to the language, but allow an important user scenario to be tremendously simplified.
Reference semantics for objects is a way to avoid a lot of troubles (not only object slicing, of course), but real-world problems can sometimes require objects with value semantics (e.g., take a look at “Sounds like I should never use reference semantics, right?” for a different point of view).
What better approach to take, therefore, than to segregate those dirty, ugly and bad objects-with-value-semantics under the tag of struct?
Rather than thinking of value types as deriving from Object, it would be more helpful to think of storage-location types as existing in an entirely separate universe from class-instance types, with every value type having a corresponding heap-object type. A storage location of structure type simply holds a concatenation of the type’s public and private fields, and the heap type is auto-generated according to a pattern like:
// Defined structure
struct Point : IEquatable<Point>
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
    public bool Equals(Point other) { return X == other.X && Y == other.Y; }
    public override bool Equals(Object other)
        { return other != null && other.GetType() == typeof(Point) && Equals((Point)other); }
    public override String ToString() { return String.Format("[{0},{1}]", X, Y); }
    public override Int32 GetHashCode() { return unchecked(X + Y * 65531); }
}
// Auto-generated class
class boxed_Point : IEquatable<Point>
{
    public Point value; // Fake name; C++/CLI, though not C#, allows full access
    public boxed_Point(Point v) { value = v; }
    // Members chain to each member of the original
    public bool Equals(Point other) { return value.Equals(other); }
    public override bool Equals(Object other) { return value.Equals(other); }
    public override String ToString() { return value.ToString(); }
    public override Int32 GetHashCode() { return value.GetHashCode(); }
}
and for a statement like:
Console.WriteLine("The value is {0}", somePoint);
to be translated as:
boxed_Point box1 = new boxed_Point(somePoint);
Console.WriteLine("The value is {0}", box1);
In practice, because storage-location types and heap-instance types exist in separate universes, it’s not necessary to call the heap-instance types things like boxed_Int32, since the system would know which contexts require the heap-object instance and which require a storage location.
Some people think that any value types which don’t behave like objects should be considered “evil”. I take the opposite view: since storage locations of value types are neither objects nor references to objects, the expectation that they should behave like objects should be considered unhelpful. In cases where a struct can usefully behave like an object, there’s nothing wrong with having one do so, but each struct is at its heart nothing more than an aggregation of public and private fields stuck together with duct tape.