I am currently studying about pattern matching in Haskell from here. The author gives an example of the implementation of “head” function (which returns the first element of a list) as following:
head' (x:_) = x
He kind of explains the use of the operator but it’s not clear to me. I understand that we can use it to append elements at front of a list:
5:[6,7] will give [5,6,7]
I can’t see though how we can use the ‘:’ operator as a part of the argument (x:_) or (x:xs) in general.
How does that binding mechanism that binds the first element to x work?
1
It’s a little confusing, because :
does two different things depending on the context. In a regular context, :
does as you describe. It prepends the element on the left with the list on the right.
In a pattern matching context, it does the exact opposite. It’s a completely different operation, even though it uses the same syntax. Behind the scenes, the compiler generates completely different code. In this context, it takes the first element of the list and assigns it to the variable before the :
, and it takes the remainder of the list and assigns it to the variable after the :
. If a constant was there instead of a variable, it checks to make sure the argument matches that constant, instead of doing an assignment.
If the :
appears on the left side of an =
or ->
, you use the pattern matching definition. Otherwise, you use the prepending definition. The syntax is the same so that constructing a list looks the same as deconstructing one.
That’s a lot of overloading happening on that one operator, but when you get used to it, it makes for some very concise code.
1
You’re thinking about this in a very imperative mindset.
(:) does not do anything.
It means that whatever comes before it is the first element of the list.
2:[3, 4]
is [2,3,4]
This is important to consider when looking at the pattern matching.
fun (x:xs) = ...
This is a way of giving the first element and the tail of a list a name.
There’s nothing more to it.
(:)
is not a regular operator; it’s a data constructor. Lists are constructed using (:)
at the base level. This means that patterns can be matched to the input depending on the input. This works with all datatypes with constructors.
Here are some examples, illustrating that all constructors can be used in this way for patterns:
removeJust (Just n) = n
removeJust Nothing = error "Can't remove from Nothing"
head (x:xs) = x
head [] = error "Oh, no!"
tail (x:xs) = xs
tail [] = error "Oh, no!"
In each case here, a constructor is used to extract, or match, values from the data structure in some way. Only the constructor can be used in this pattern match, thus (:)
, Just
and Nothing
are all constructors that can be matched.
In fact, we can make our own List
datatype to illustrate how that works:
data List t = Empty | Link t (List t)
-- Imagine Empty = [], Link 6 Empty = [6] = 6 : []
headList :: List t -> t
headList (Link x xs) = x
headList Empty = error "Oh, no!"
This is essentially the list datatype. Of course, it’s a builtin type, so it has it’s syntactic sugar, but it’s all the same in the end.
1
As others have pointed out, the operator :
is a data constructor that allows to create a list by adding an element in front of an existing list.
The expression (or term) 1 : 2 : []
is actually a tree 1 : (2 : [])
, where
- The root node is labeled with
:
- The root has two children: a leaf labeled with
1
and the subtree2 : []
A term can be see as a data structure that denotes a value: the value is obtained by evaluating / reducing the term:
(1 + 2) + 5 => 3 + 5 => 8
A term can also be used as a pattern against which we want to match values. For example: the term 1 : 2 : []
is a pattern that matches the list [1, 2].
Patterns can contain variables, e.g. x : y
. This is a tree with a root node labeled with :
and two leaves labeled with the variables x
and y
. You can define a substitution for variables, e.g. x --> 1, y --> 2 : []
. With this substitution, the term x : y
can be unified with the term 1 : 2 : []
.
As you probably know already, the special variable _
is used when we are interested to find a substitution, but we are not interested in the actual substitution, i.e. we only want to know that there is one.
When defining a function in Haskell, each function left-hand side contains patterns that need to be matched against the actual arguments. If a substitution is found, then the right-hand side is evaluated, with the formal arguments replaced by the terms in the substitution.
So, a term can be used for pattern matching (e.g. in left-hand side of a function equation) or for denoting a value (e.g. in the right-hand side of an equation). In order for a term in the right-hand side to represent a value, it must
- either contain no free-variable, as in
1 : 2 : []
- or contain only free variables that have been bound by matching the left-hand side with the arguments of the function call.
Summarizing: you use the same syntax for patterns and expressions because in both cases you are building terms, even though these are used in different ways.