There are some problems which are easily solved by Algebraic Data Types, for example a List type can be very succinctly expressed as:
data ConsList a = Empty | ConsCell a (ConsList a)
consmap f Empty = Empty
consmap f (ConsCell a b) = ConsCell (f a) (consmap f b)
l = ConsCell 1 (ConsCell 2 (ConsCell 3 Empty))
consmap (+1) l
This particular example is in Haskell, but it would be similar in other languages with native support for Algebraic Data Types.
It turns out that there is an obvious mapping to OO-style subtyping: the datatype becomes an abstract base class and every data constructor becomes a concrete subclass. Here’s an example in Scala:
sealed abstract class ConsList[+T] {
def map[U](f: T => U): ConsList[U]
}
object Empty extends ConsList[Nothing] {
override def map[U](f: Nothing => U) = this
}
final class ConsCell[T](first: T, rest: ConsList[T]) extends ConsList[T] {
override def map[U](f: T => U) = new ConsCell(f(first), rest.map(f))
}
val l = new ConsCell(1, new ConsCell(2, new ConsCell(3, Empty)))
l.map(1+)
The only thing needed beyond naive subclassing is a way to seal classes, i.e. a way to make it impossible to add subclasses to a hierarchy.
How would you approach this problem in a language like C# or Java? The two stumbling blocks I found when trying to use Algebraic Data Types in C# were:
- I couldn’t figure out what the bottom type is called in C# (i.e. I couldn’t figure out what to put into
class Empty : ConsList< ??? >
) - I couldn’t figure out a way to seal
ConsList
so that no subclasses can be added to the hierarchy
What would be the most idiomatic way to implement Algebraic Data Types in C# and/or Java? Or, if it isn’t possible, what would be the idiomatic replacement?
16
All of the other answers here are outdated (as of me writing this answer).
Java has first class support for Abstract Data Types (ADT’s) now. The easiest way to create an ADT is by using Records
and Sealed Types
. Both of these are features that released in Java 16 and 17, respectively.
Here is an example that uses them, plus a little bit of pattern-matching, switch expressions, var, and lambdas, to be able to emulate the original example shown in the question.
import java.util.function.UnaryOperator;
public class abc
{
sealed interface ConsList<A> permits Empty, ConsCell {}
record Empty<A>() implements ConsList<A> {}
record ConsCell<A>(A head, ConsList<A> tail) implements ConsList<A> {}
public static <A> ConsList<A> applyFunctionToConsList(final UnaryOperator<A> function, final ConsList<A> list)
{
return
switch (list)
{
case Empty<A> e -> e;
case ConsCell<A>(var head, var tail) -> new ConsCell<>(function.apply(head), applyFunctionToConsList(function, tail));
};
}
public static void main(String[] args)
{
var list = new ConsCell<>(1, new ConsCell<>(2, new ConsCell<>(3, new Empty<>())));
System.out.println(applyFunctionToConsList(i -> i + 1, list));
}
}
If I run the above code, I get the following output.
ConsCell[head=2, tail=ConsCell[head=3, tail=ConsCell[head=4, tail=Empty[]]]]
Here is another example.
var list = new ConsCell<>(1, new ConsCell<>(2, new ConsCell<>(3, new Empty<>())));
System.out.println(applyFunctionToConsList(i -> i * 3, list));
//ConsCell[head=3, tail=ConsCell[head=6, tail=ConsCell[head=9, tail=Empty[]]]]
2
There is an easy, but boilerplate heavy way to seal classes in Java. You put a private constructor in the base class then make subclasses inner classes of it.
public abstract class List<A> {
// private constructor is uncallable by any sublclasses except inner classes
private List() {
}
public static final class Nil<A> extends List<A> {
}
public static final class Cons<A> extends List<A> {
public final A head;
public final List<A> tail;
public Cons(A head, List<A> tail) {
this.head = head;
this.tail = tail;
}
}
}
Tack on a visitor pattern for dispatch.
My project jADT: Java Algebraic DataTypes generates all that boilerplate for you.
4
You can achieve this by using the visitor pattern, which will supplement pattern matching. For example
data List a = Nil | Cons { value :: a, sublist :: List a }
can be written in Java as
interface List<T> {
public <R> R accept(Visitor<T,R> visitor);
public static interface Visitor<T,R> {
public R visitNil();
public R visitCons(T value, List<T> sublist);
}
}
final class Nil<T> implements List<T> {
public Nil() { }
public <R> R accept(Visitor<T,R> visitor) {
return visitor.visitNil();
}
}
final class Cons<T> implements List<T> {
public final T value;
public final List<T> sublist;
public Cons(T value, List<T> sublist) {
this.value = value;
this.sublist = sublist;
}
public <R> R accept(Visitor<T,R> visitor) {
return visitor.visitCons(value, sublist);
}
}
Sealing is achieved by the Visitor
class. Each of its methods declares how to deconstruct one of the subclasses. You could add more subclasses, but it would have to implement accept
and by calling one of the visit...
methods, so it would either have to behave like Cons
or like Nil
.
1
If you abuse C# named parameters (introduced in C# 4.0), you can make algebraic data types that are easy to match on:
Either<string, string> e = MonthName(2);
// Match with no return value.
e.Match
(
Left: err => { Console.WriteLine("Could not convert month: {0}", err); },
Right: name => { Console.WriteLine("The month is {0}", name); }
);
// Match with a return value.
string monthName =
e.Match
(
Left: err => null,
Right: name => name
);
Console.WriteLine("monthName: {0}", monthName);
Here is the implementation of the Either
class:
public abstract class Either<L, R>
{
// Subclass implementation calls the appropriate continuation.
public abstract T Match<T>(Func<L, T> Left, Func<R, T> Right);
// Convenience wrapper for when the caller doesn't want to return a value
// from the match expression.
public void Match(Action<L> Left, Action<R> Right)
{
this.Match<int>(
Left: x => { Left(x); return 0; },
Right: x => { Right(x); return 0; }
);
}
}
public class Left<L, R> : Either<L, R>
{
L Value {get; set;}
public Left(L Value)
{
this.Value = Value;
}
public override T Match<T>(Func<L, T> Left, Func<R, T> Right)
{
return Left(Value);
}
}
public class Right<L, R> : Either<L, R>
{
R Value { get; set; }
public Right(R Value)
{
this.Value = Value;
}
public override T Match<T>(Func<L, T> Left, Func<R, T> Right)
{
return Right(Value);
}
}
2
In C#, you can’t have that Empty
type, because, due to reification, the base types are different for different member types. You can only have Empty<T>
; not that useful.
In Java, you can have Empty : ConsList
due to type erasure, but I am not sure whether the type checker wouldn’t scream somewhere.
However since both languages have null
, you can think of all their reference types as being “Whatever|Null”. So you’d just use the null
as the “Empty” to avoid having to specify what it derives.
4
The data type ConsList<A>
can be represented as an interface. The interface exposes a single deconstruct
method which allows you to “deconstruct” a value of that type – that is, to handle each of the possible constructors. Calls to a deconstruct
method are analogous to a case of
form in Haskell or ML.
interface ConsList<A> {
<R> R deconstruct(
Function<Unit, R> emptyCase,
Function<Pair<A,ConsList<A>>, R> consCase
);
}
The deconstruct
method takes a “callback” function for each constructor in the ADT. In our case, it takes a function for the empty list case, and another function for the “cons cell” case.
Each callback function accepts as arguments the values which are accepted by the constructor. So the “empty list” case takes no arguments, but the “cons cell” case takes two arguments: the head and the tail of the list.
We can encode these “multiple arguments” using Tuple
classes, or using currying. In this example, I chose to use a simple Pair
class.
The interface is implemented once for each constructor. First, we have the implementation for the “empty list”. The deconstruct
implementation simply calls the emptyCase
callback function.
class ConsListEmpty<A> implements ConsList<A> {
public ConsListEmpty() {}
public <R> R deconstruct(
Function<Unit, R> emptyCase,
Function<Pair<A,ConsList<A>>, R> consCase
) {
return emptyCase.apply(new Unit());
}
}
Then we implement the “cons cell” case similarly. This time the class has properties: the head and tail of the non-empty list. In the deconstruct
implementation, those properties are passed to the consCase
callback function.
class ConsListConsCell<A> implements ConsList<A> {
private A head;
private ConsList<A> tail;
public ConsListCons(A head, ConsList<A> tail) {
this.head = head;
this.tail = tail;
}
public <R> R deconstruct(
Function<Unit, R> emptyCase,
Function<Pair<A,ConsList<A>>, R> consCase
) {
return consCase.apply(new Pair<A,ConsList<A>>(this.head, this.tail));
}
}
Here is an example of using this encoding of ADTs: we can write a reduce
function which is the usual fold over lists.
<T> T reduce(Function<Pair<T,A>,T> reducer, T initial, ConsList<T> l) {
return l.deconstruct(
((unit) -> initial),
((t) -> reduce(reducer, reducer.apply(initial, t.v1), t.v2))
);
}
This is analogous to this implementation in Haskell:
reduce reducer initial l = case l of
Empty -> initial
Cons t_v1 t_v2 -> reduce reducer (reducer initial t_v1) t_v2
1
The only thing needed beyond naive subclassing is a way to seal classes, i.e. a way to make it impossible to add subclasses to a hierarchy.
In Java you can’t. But you can declare the base class as package private, which means that all direct subclasses have to belong to the same package as the base class. If you then declare the subclasses as final, they can’t be subclassed any further.
I don’t know if this would address your real problem though …
2
The only thing needed beyond naive subclassing is a way to seal classes, i.e. a way to make it impossible to add subclasses to a hierarchy.
How would you approach this problem in a language like C# or Java?
There isn’t a good way to do this, but if you’re willing to live with a hideous hack then you can add some explicit type checking to the abstract base class’ constructor. In Java, this would be something like
protected ConsList() {
Class<?> clazz = getClass();
if (clazz != Empty.class && clazz != ConsCell.class) throw new Exception();
}
In C# it’s more complicated because of the reified generics – the simplest approach might be to convert the type to a string and mangle that.
Note that in Java even this mechanism can theoretically be bypassed by someone who really wants to via the serialisation model or sun.misc.Unsafe
.
3