I’m currently learning Clojure (my first functional programming language), and I’m curious as to its order of evaluation.
Here is an example:
(take 10 (cycle [1 2 3]))
If the cycle
expression was not wrapped in a take
expression, this would cause a memory error, as it would iterate infinitely through the vector. So, how does Clojure know that it has been wrapped with a take
expression?
(If I’m using any incorrect terminology here, I’d appreciate a nudge too!)
1
So, how does Clojure know that it has been wrapped with a take expression?
It doesn’t. Lazyness is better understood like this: nothing is ever evaluated unless it’s absolutely necessary. The cycle
function has the semantics of “repeat things endlessly”, but this doesn’t actually happen until the interpreter has to prove to you, the user, that it does. If the sequence only had to produce a value, even an infinite one, it would do nothing at all, and there would be no way for you to notice. What does start the actual computation is the need to deliver something to the caller, in this case take 10
. It has to know what elements to operate on, so the cycle
function reluctantly produces 10 values, but no more.
The take
function, in turn, only produces a value. If it were alone in the world, it might just as well not bother to do anything at all, but it’s called by the REPL, which must print values onto the terminal per its contract. Ultimately, the need to print the first character of the result is what drives the entire computation. This is exactly the opposite of the traditional approach where an inner expression is evaluated to completion and then passed on to the next level.
4
Your issue here seems to be specifically around Clojure’s sequences which can be infinite.
Clojure’s sequences can either be null or implement an interface called ISeq, which among other things has 2 particularly important functions, first
and next
.
Consider the following java class
public class LongsFrom implements ISeq{
private Long _currentNumber;
public LongsFrom(Long start){
_currentNumber = start
}
public Object first(){
return _currentNumber;
}
public ISeq next(){
return new LongsFrom(_currentNumber + 1);
}
}
It represents an infinite sequence of increasing numbers starting wherever you want. As you keep calling next() on it, you keep realizing more and more of the sequence, however all you really need to keep in memory is whatever the latest object in the sequence is.
What cycle does is create an instance of a class which implements the ISeq interface (it knows how to calculate the sequence but doesn’t actually do it until you start calling next). Then your take
command actually starts calling first
and rest
on this class to get however much of the sequence it needs.
A class which actually performs cycle
would only be slightly more complicated than my LongsFrom example. The cycle class could have 2 private variables, one which holds the list of things it needs to cycle through and then an integer to remember which index it’s currently on. first would return the item at the current index. next would return a new cycle object with the exact same internal list, but an index of 1 + currentIndex (unless we’re at the end of the list in which case we start the index back at 0)
The reason why you have memory issues if you don’t have the take
command is that if you don’t have the take
command, the repl tries to print
the sequence and the default implementation for print
is to keep calling first
and rest
until the sequence ends (which is never in the case of cycle
)