I am developing a dynamically typed, interpreted programming language, which is interpreted by a runtime written in Java. As Java is statically typed, I need to define how the numbers used in the language are stored in the interpreter.
My initial idea was to use BigDecimal for all calculations, but this makes the code more complex and, I assume, adds overhead at runtime. Another possibility is to parse the number and, if it is a natural number, store it in a long, and if it is a real number, store it in a double.
So which solution is optimal, and why?
All the numeric types can represent different numbers and have different semantics. Stop thinking about implementation concerns for a second and do language design. What kind of numeric types should your language offer its users? (Note that most languages offer multiple types, and for good reasons.) Find the answer to that, then implement it.
Why not both?
In an interpreter for a dynamically typed language, you will need a common representation for all values. For example, we might have a function that can return either a number or a dictionary. In JavaScript:
function foo(want_dict) {
if (want_dict) return { value: 42 };
else return 42;
}
var result = foo(someVariable);
So, how could a dynamic language represent the contents of the result variable? Since the variable could be either a number or a dictionary, the variable does not have a specific type. Instead, the type information lives in the value itself, often in the form of a type ID. In C, a value might be represented as
typedef struct {
unsigned type_id;
void *value;
} Value;
In Java, we could use a similar class:
class Value {
private TypeId type;
private Object value;
public Value(TypeId type, Object value) {
this.type = type;
this.value = value;
}
public TypeId type() { return type; }
public Object value() { return value; }
}
We could then have various values such as
return new Value(TypeId.Int, Integer.valueOf(42));
...
HashMap<String, Value> rawValue = new HashMap<>();
rawValue.put("value", new Value(TypeId.Int, Integer.valueOf(42)));
return new Value(TypeId.Dict, rawValue);
When we try to use such a value, we first have to dynamically check that all values have acceptable types. E.g. the interpreter’s implementation of addition might look like this:
Value doAddition(Value a, Value b) {
if (a.type() != TypeId.Int)
throw ...;
if (b.type() != TypeId.Int)
throw ...;
return new Value(TypeId.Int, (Integer) a.value() + (Integer) b.value());
}
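The tagged encoding above can be put together into something runnable. This is only a minimal sketch: the TypeId enum is not defined in the text, so its two constants are an assumption, the class is renamed TaggedValue to keep the example self-contained, and IllegalArgumentException stands in for whatever error type the interpreter would really throw.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tag enum; the text uses TypeId without defining it.
enum TypeId { Int, Dict }

// Same shape as the Value class above: a type tag plus an untyped payload.
final class TaggedValue {
    private final TypeId type;
    private final Object value;

    TaggedValue(TypeId type, Object value) {
        this.type = type;
        this.value = value;
    }

    TypeId type() { return type; }
    Object value() { return value; }
}

final class TaggedInterpreter {
    // Addition is only defined for two Int-tagged values in this sketch.
    static TaggedValue doAddition(TaggedValue a, TaggedValue b) {
        if (a.type() != TypeId.Int || b.type() != TypeId.Int) {
            // Assumed error type; a real interpreter would raise its own.
            throw new IllegalArgumentException(
                "cannot add " + a.type() + " and " + b.type());
        }
        // The tag check above is what makes these casts safe.
        return new TaggedValue(TypeId.Int, (Integer) a.value() + (Integer) b.value());
    }

    public static void main(String[] args) {
        TaggedValue sum = doAddition(
            new TaggedValue(TypeId.Int, 40), new TaggedValue(TypeId.Int, 2));
        System.out.println(sum.value());

        // A dictionary holding the number, as in the JavaScript example.
        Map<String, TaggedValue> raw = new HashMap<>();
        raw.put("value", sum);
        TaggedValue dict = new TaggedValue(TypeId.Dict, raw);
        try {
            doAddition(dict, sum);
        } catch (IllegalArgumentException e) {
            System.out.println("type error: " + e.getMessage());
        }
    }
}
```

Every operator in the interpreter ends up with the same pattern: inspect the tags first, then cast the payload.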
Alternatively, we could leverage the dynamic typing features of Java itself, i.e. type information is encoded in an inheritance hierarchy.
interface Value<T> {
public T value();
}
class MyInt implements Value<Integer> {
private int value;
public MyInt(int value) { this.value = value; }
@Override public Integer value() { return value; }
}
class MyDict implements Value<Map<String, Value>> {
private Map<String, Value> value;
public MyDict(Map<String, Value> value) { this.value = value; }
@Override public Map<String, Value> value() { return value; }
}
Now, we can create various values like
return new MyInt(42);
...
HashMap<String, Value> rawValue = new HashMap<>();
rawValue.put("value", new MyInt(42));
return new MyDict(rawValue);
With this encoding, an implementation of an addition operator would look like this:
Value doAddition(Value a, Value b) {
if (!(a instanceof MyInt))
throw ...;
if (!(b instanceof MyInt))
throw ...;
return new MyInt(((MyInt) a).value() + ((MyInt) b).value());
}
So, both encodings look very much the same, but I would prefer an explicit tag since it’s a bit more flexible and is not limited by Java’s type erasure (on the downside, everything is an Object and needs to be cast).
With dynamic typing we necessarily have some kind of runtime type support available, so we can trivially support multiple numeric types. In one encoding, we just have to add a new type ID, in the other another subclass:
class MyRat implements Value<Double> {
private double value;
public MyRat(double value) { this.value = value; }
@Override public Double value() { return value; }
}
We can also make our implementation of addition polymorphic, so that the same operator can handle both integers and doubles:
Value doAddition(Value a, Value b) {
if ((a instanceof MyInt) && (b instanceof MyInt))
return new MyInt(((MyInt) a).value() + ((MyInt) b).value());
if ((a instanceof MyRat) && (b instanceof MyRat))
return new MyRat(((MyRat) a).value() + ((MyRat) b).value());
throw ...;
}
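The polymorphic addition above rejects mixed operands such as an integer plus a double. One possible way to handle them is sketched below, assuming the language wants integers promoted to doubles in mixed arithmetic; that promotion rule is a design assumption, not something the text prescribes. The names NumValue, IntValue, FloatValue and Arithmetic are hypothetical, and the integer payload is a long here rather than an int.

```java
// Hypothetical marker interface for numeric runtime values.
interface NumValue {}

final class IntValue implements NumValue {
    final long value;
    IntValue(long value) { this.value = value; }
}

final class FloatValue implements NumValue {
    final double value;
    FloatValue(double value) { this.value = value; }
}

final class Arithmetic {
    // Int + Int stays integral; any other numeric mix is computed as double.
    static NumValue add(NumValue a, NumValue b) {
        if (a instanceof IntValue && b instanceof IntValue) {
            return new IntValue(((IntValue) a).value + ((IntValue) b).value);
        }
        return new FloatValue(toDouble(a) + toDouble(b));
    }

    // Widens either numeric kind to double; rejects anything else.
    private static double toDouble(NumValue v) {
        if (v instanceof IntValue) return ((IntValue) v).value;
        if (v instanceof FloatValue) return ((FloatValue) v).value;
        throw new IllegalArgumentException("operand is not a number");
    }

    public static void main(String[] args) {
        NumValue mixed = add(new IntValue(1), new FloatValue(0.5));
        System.out.println(((FloatValue) mixed).value);
    }
}
```

Whether silent promotion is desirable is itself a language design choice; a stricter language could keep the throw from the version above.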
That wasn’t so hard!
The difficult part is deciding:
- Which numeric types do we want to support in our language? Only floating point numbers? (JavaScript) Only integral types? (B) Both? (most languages) Both, but as a single type and we switch to doubles if the number would otherwise overflow? (Perl)
- Which sizes do we want to support? Native sizes for efficiency? Arbitrary-precision numbers? A collection of different width types? Do we want to support unsigned types?
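As an illustration of the "single type that switches to doubles on overflow" option from the list above, here is a rough sketch in Java. It only approximates the idea and is not a description of how Perl implements it; the class name OverflowPromotion and the choice of returning a boxed Number are assumptions made for the example.

```java
final class OverflowPromotion {
    // Returns a Long when the sum fits in 64 bits, otherwise a Double.
    static Number add(long a, long b) {
        try {
            // Math.addExact throws ArithmeticException on long overflow.
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            // Fall back to floating point, trading precision for range.
            return (double) a + (double) b;
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2).getClass().getSimpleName());
        System.out.println(add(Long.MAX_VALUE, 1).getClass().getSimpleName());
    }
}
```

The fallback result is no longer exact, which is the trade-off this design accepts in exchange for never wrapping around.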
Most of these problems can be answered when you think about what numbers will be used for in the language. E.g. doubles are useless when indexing an array, since not every natural number has a representation as a double. Integers are useless for non-discrete domains such as physical measurements. Doubles are useless for currency calculations (which are discrete). Integer overflow is useless, but efficient. And so on. As the other answer points out, this ends up being a question of language design, but it’s not too difficult to implement any choice you arrive at.
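A few of the trade-offs mentioned above can be observed directly in Java. The following is just a small demonstration with hypothetical helper names, not a recommendation for any particular representation.

```java
import java.math.BigDecimal;

final class NumericPitfalls {
    // True if n survives a round trip through double unchanged.
    // Only meant for values well below Long.MAX_VALUE.
    static boolean survivesDoubleRoundTrip(long n) {
        return (long) (double) n == n;
    }

    // Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // BigDecimal built from strings keeps decimal fractions exact.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    // Plain int arithmetic silently wraps around on overflow.
    static int wrappedIncrement(int n) {
        return n + 1;
    }

    public static void main(String[] args) {
        System.out.println(survivesDoubleRoundTrip(1L << 53));
        System.out.println(survivesDoubleRoundTrip((1L << 53) + 1));
        System.out.println(doubleSumIsExact());
        System.out.println(decimalSumIsExact());
        System.out.println(wrappedIncrement(Integer.MAX_VALUE) == Integer.MIN_VALUE);
    }
}
```

Integers above 2^53 start to fall through the gaps of a double, decimal fractions are inexact in binary floating point, and fixed-width integers wrap, so each representation suits some uses and not others.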