I’m developing a language like Vala and OOC that compiles back to C.
This means that, eventually, every feature needs to be adoptable to C code in some way or another. Generics is one of the features I’d like to implement in my language.
As you probably know, C is a strictly typed language. Except for the opaque void pointer, there is no way to pass an argument of a unknown type. This is because the compiler needs to know the exact size of the argument passed or returned.
The following solutions come to my mind:
- Boxing: Create a union that can store every possible type
- Pass a pointer to the parameter, rather than the parameter itself.
Both of these have their downsides:
-
- The size of the generic type
T
will be equal to the largest possible type - The sizes are predetermined, a struct with a different size cannot be passed by value
- Slower, because every argument has to be boxed/unboxed
- The size of the generic type
-
- The parameter is not copied onto the new stack scope
- Memory management issues because of the above
- Not the desired behaviour because of the above
I’m interested in knowing how I could adopt this high-level principle in a lower level language, and also how other high-level languages have conquered this problem.
EDIT
As @delnan has pointed out, another possibility is Monomorphization, creating a new function for every data type. This has a few of the same downsides:
- The type
T
needs to be defined beforehand (not very generic) - The binary size gets larger (which, granted, isn’t very relevant nowadays)
1
Monomorphization. For every generic (polymorphic) type/function, generate a non-generic (monomorphic) version for every set of type parameters. Given these declarations:
struct G<T> {
a: T,
b: T
}
fn get_a<T>(g: G<T>) -> T {
return g.a;
}
and this code:
x = G<int>(1, 2);
y = G<float>(1.0, 2.0);
get_a(x);
get_a(y);
Generate code equivalent to:
struct G_int {
a: int,
b: int
}
struct G_float {
a: float,
b: float
}
fn get_a_int(g: G_int) -> int {
return g.a;
}
fn get_a_float(g: G_float) -> float {
return g.a;
}
x = G_int(1, 2);
y = G_float(1.0, 2.0);
get_a_int(x);
get_a_float(y);
This also works across library barriers! It requires the code (at least in AST/IR form) of generics to be included with library binaries, but aside from that the compiler can simply generate the code in the same way, just taking the “template” from a different source. Duplicate definitions (when two libraries instantiate the same generic with the same type parameters) can be merged by the linker and at worst increase compile time.
5
currently there are 2 ways;
-
the type erasure method of java; essentially everything becomes a void* (or a custom type like a
struct base{void** functptrs;}
and gets casted to and fro as needed, this requires the type to follow a certain interface to let the template figure out if casts are valid at runtime. -
instantiation creates a new definition of the template like delnan explained; this is what is used in C++. It does cause some code bloat as you need to let the client recompile the template each time it is used.
4