The Problem
Imagine you have a function in a public API that needs to return some kind of collection. Its type is something like
foo :: a -> b
where b
is some kind of collection.
As far as I can see, there a three major kinds of collections — at least those are the ones used most often:
- Sequences (order matters, there might be duplicates)
- Sets (order doesn’t matter, no duplicates)
- Maps (elements are key-value pairs, order doesn’t matter, no duplicate keys)
What should you use for b
in these three cases?. I have considered three alternatives, but there might be more.
A. Just return a list
In this alternative, you just return lists for everything:
foo :: a -> [c]
foo :: a -> [c]
(and document that there are no duplicates)foo :: a -> [(k, c)]
(and document that there are no duplicate keys)
The advantage of this is that the client can construct any collection they want from the returned list.
The disadvantage for 2 and 3 is that
- the type does not have the “no duplicates” guarantee and that the client has to do more work to get a set/map out of it
- construction of the list in 2 and 3 might be less efficient, because something like a
union
for sets is obviously faster than for lists - you might construct an intermediate list (depends on the problem at hand).
B. Return a concrete type matching the guarantees
In this alternative, you return a type that precisely matches the guarantees you are giving:
foo :: a -> [c]
(or maybefoo :: a -> Data.Sequence.Seq c
)foo :: a -> Data.Set.Set c
(lazy or strict?)foo :: a -> Data.Map.Map k c
(lazy or strict?)
The advantage is that the type accurately describes the guarantees you want to express and that the client doesn’t have to do any extra work.
The disadvantage is that you’re locked into a particular implementation. If you just want to switch to a different set implementation, you break the public API. That might be a good thing in case of a switch between lazy and strict, because those can matter for correctness of the calling code. But in general you want to be able to swap implementation without your clients noticing.
C. Let the client decide
In this alternative, you just constrain the return type with a type class and let the client decide on the concrete type:
1. `foo :: Sequence b c => a -> b`
2. `foo :: Set b c => a -> b`
3. `foo :: Map b c k => a -> b`
(The names I used are from collections-api, but it doesn’t really matter.)
The advantage of this is that the client can choose which collection type to use.
The disadvantage is that you have no control over the collection you construct anymore. This means, e.g., that you cannot assume laziness or complexity of inserts.
Another problem is that the client can get ambiguous types, e.g., when they immediately fold the returned collection:
Data.Foldable.fold (foo whatever)
Now b
is ambiguous.
Are there any other alternatives? What would you recommend?
I don’t want foo
to be able to return a different kind of collection on every call.
The three cases considered are completely separate.
In each case the question is “if I have a function that needs to return an X, what type should it have?” where X can be a sequence, set, or map (or any other kind of collection you can think of).
4
My suggestion would be to declare an opaque newtype
that hides the actual implementation and implement type classes that could be useful to your clients and that you can guarantee to be stable in the future. And you can also export functions that allow conversion of your data type to some standard ones.
This forces clients to use only functions or instances you provide and document, and also allows clients to use standard API such as Foldable
:
module MyMod
( MyResult()
, resultToSet -- by hiding the internals and exporting the accessor
-- explicitly, you can always change the internal
-- representation and replace the accessor a function
-- with the same signature
) where
import Data.Monoid
import qualified Data.Foldable as F
import qualified Data.Set as S
newtype MyResult a = MyResult { resultToSet :: S.Set a }
-- For example, your clients could use `Foldable` and `Monoid` instances:
instance F.Foldable MyResult where
foldMap f = F.foldMap f . resultToSet
instance (Ord a) => Monoid (MyResult a) where
mempty = MyResult mempty
mappend (MyResult x) (MyResult y) = MyResult (x `mappend` y)
Update: If you later decide to change the internal representation to HashSet
, the module would look like this:
module MyMod
( MyResult()
, resultToSet -- by hiding the internals and exporting the accessor
-- explicitly, you can always change the internal
-- representation and replace the accessor a function
-- with the same signature
) where
import Data.Monoid
import qualified Data.Foldable as F
import Data.Hashable
import qualified Data.HashSet as HS
import qualified Data.Set as S
newtype MyResult a = MyResult { resultToHashSet :: HS.HashSet a }
-- For example, your clients could use `Foldable` and `Monoid` instances:
instance F.Foldable MyResult where
foldMap f = F.foldMap f . resultToHashSet
instance (Eq a, Hashable a) => Monoid (MyResult a) where
mempty = MyResult mempty
mappend (MyResult x) (MyResult y) = MyResult (x `mappend` y)
-- Reimplementation of functions from the previous version
resultToSet :: (Ord a) => MyResult a -> S.Set a
resultToSet = S.fromList . HS.toList . resultToHashSet
The external interface is almost the same (the only change is that the Monoid
instance now has slightly different constraints).
Another way that would give clients much flexibility is to let the collection be fully abstract. This can be nicely expressed using monoids. The output of your function will be a polymorphic monoid, we just need a way how to create singletons. For example, if we want to generalize enumFromTo
, we could write
enumCol :: (Enum a, Monoid c) => a -> a -> ((a -> c) -> c)
enumCol f t singleton = mconcat . map singleton $ enumFromTo f t
It’s then the client responsibility to use a monoid with effective mappend
operation.
With RankNTypes
the return type could be also encapsulated using type
as
type OutCol e c = (Monoid c) => (e -> c) -> c
The additional benefit is that in some cases this can be very effective thanks to laziness. For example, the following returns immediately:
enumCol 0 (10^20) (First . Just)
5
Why not just have three (or N) separate functions, one for each of the three (or N) separate types of collections?
2