I’m learning Scala and am a little bewildered by all the methods (higher-order functions) available on the collections. Which ones produce more results than the original collection, which ones produce less, and which are most appropriate for a given problem? Though I’m studying Scala, I think this would pertain to most modern functional languages (Clojure, Haskell) and also to Java 8 which introduces these methods on Java collections.
Specifically, right now I’m wondering about map with filter vs. fold/reduce. I was delighted that using foldRight() can yield the same result as a map(…).filter(…) with only one traversal of the underlying collection. But a friend pointed out that foldRight() may force sequential processing while map() is friendlier to being processed by multiple processors in parallel. Maybe this is why mapReduce() is so popular?
More generally, I’m still sometimes surprised when I chain several of these methods together to get back a List(List()) or to pass a List(List()) and get back just a List(). For instance, when would I use:
collection.map(a => a.map(b => ...))
vs.
collection.map(a => ...).map(b => ...)
The for/yield construct does nothing to clear up this confusion. Am I asking about the difference between a “fold” and “unfold” operation?
Am I trying to jam too many questions into one? I think there may be an underlying concept that, if I understood it, might answer all these questions, or at least tie the answers together.
You can roughly emulate map/filter by using foldRight (or foldLeft) with cons as the given reduction function. For example, foldRight(f, L, initial), where L = [x,y,z], might be expanded to
f(x, f(y, f(z, initial)))
This means that you can’t finish processing x until you’ve processed f(z, initial) and then f(y, ...). You’re creating a dependency that doesn’t exist with map/filter.
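To make that concrete, here is a minimal Scala sketch of the emulation (the helper names mapViaFoldRight, filterViaFoldRight and mapFilterViaFoldRight are mine, not standard library methods):

def mapViaFoldRight[A, B](xs: List[A])(f: A => B): List[B] =
  xs.foldRight(List.empty[B])((x, acc) => f(x) :: acc)

def filterViaFoldRight[A](xs: List[A])(p: A => Boolean): List[A] =
  xs.foldRight(List.empty[A])((x, acc) => if (p(x)) x :: acc else acc)

// Both steps fused into one pass, mirroring map(...).filter(...):
def mapFilterViaFoldRight[A, B](xs: List[A])(f: A => B)(p: B => Boolean): List[B] =
  xs.foldRight(List.empty[B]) { (x, acc) =>
    val y = f(x)
    if (p(y)) y :: acc else acc
  }

mapFilterViaFoldRight(List(1, 2, 3, 4))(_ * 2)(_ > 4)  // List(6, 8)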
As for map ordering…
(A) collection.map(a => a.map(b => ...))
(A) takes a collection and applies the map function to each element, which implies that each element is itself a “mappable” collection. The inner map returns a processed list, so each element of collection remains a collection. This is how you would map a function onto each element of a list of lists, and the result would again be a list of lists.
(B) collection.map(a => ...).map(b => ...)
(B) processes each element of a list and forms those results into a new list. This list is then processed again with a second map function giving yet another list.
(A) is for processing the inner elements of a list (“sub-elements” if you like). (B) is for processing a list multiple times. If we write (B) concretely as
collection.map(a => f(a)).map(b => g(b))
we can see it is equivalent to
collection.map(a => g(f(a)))
It might help to write them out as for loops: (A) uses nested loops, whereas (B) uses two sequential loops.
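For example, a quick Scala sketch of (B), with f and g as arbitrary functions of my own choosing:

val xs = List(1, 2, 3)
val f: Int => Int = _ + 1
val g: Int => Int = _ * 10

xs.map(f).map(g)     // List(20, 30, 40): two traversals, one intermediate list
xs.map(f andThen g)  // List(20, 30, 40): the fused form, a single traversal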
This is not the difference between fold and unfold. Neither (A) nor (B) is a fold, since the list structure, however deeply nested, is preserved. A fold collapses a list into a single value, while an unfold (I’m not too familiar with it) takes a single seed value and produces a list according to some rule.
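A small sketch of the contrast, assuming Scala 2.13+ where the standard library provides List.unfold:

// fold: many values collapse to one value
List(1, 2, 3, 4).foldLeft(0)(_ + _)  // 10

// unfold: one seed value grows into many values
List.unfold(4)(n => if (n > 0) Some((n, n - 1)) else None)  // List(4, 3, 2, 1)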
EDIT: @Giorgio was right to suggest flatMap. It’s an interesting variation on map. Say we’re mapping a list of Xs to a list of Ys, so we pass map a function f:X->Y. Suppose we have another calculation g that takes an X but returns multiple Ys, g:X->[Y]. In this case we would use flatMap. map takes the results of f and puts them into a list; flatMap takes the results of g and concatenates them together.
Ex. Say we have a list of lists L = [[1,2,3],[4,5,6]]. Then L.map(a => a.map(b => b * 2)) gives [[2,4,6],[8,10,12]]. But say we want each number doubled in one flat list, with no sub-lists. Then we would call L.flatMap(a => a.map(b => b * 2)), which gives [2,4,6,8,10,12]. Note that our inner function a => a.map(b => b * 2) takes and returns a list.
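Written out as a runnable Scala snippet with the same values:

val L = List(List(1, 2, 3), List(4, 5, 6))

L.map(a => a.map(b => b * 2))      // List(List(2, 4, 6), List(8, 10, 12))
L.flatMap(a => a.map(b => b * 2))  // List(2, 4, 6, 8, 10, 12)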
Fold is more general than map, reduce, and filter. If you take a minute to think about it, their implementations in terms of a fold are simple. Still, consider using them first, because they convey more intention and limit what can happen: a map always produces a list of the same length, filter won’t change values but may remove some, and reduce turns a list into a single value. With a fold, all three could happen at the same time.
If you go from a List to a List of Lists, then a map (or a fold doing map-like things) is returning a list somewhere. Going from a List of Lists back to a List means a reduce is coming into play somewhere.
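A small sketch of both directions in Scala (the nested value is just my own example):

val nested = List(List(1, 2), List(3), List(4, 5))

nested.map(_.map(_ * 10))                 // still nested: List(List(10, 20), List(30), List(40, 50))
nested.reduce(_ ++ _)                     // List(1, 2, 3, 4, 5)
nested.foldLeft(List.empty[Int])(_ ++ _)  // List(1, 2, 3, 4, 5)
nested.flatten                            // List(1, 2, 3, 4, 5), the usual shortcut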
I would also look at the discussions about them on Stack Overflow; they should help with use cases for each.
Concerning the operations that produce an output smaller than their input, a common mistake concerns map itself. Intuitively, map takes each input element, computes a transformation of that element, and places the result in the output, so that input and output have the same size.
In fact, in Scala, a map of a set (say, of integers) is a set. So if you apply a predicate (that is, a boolean function) that checks whether each integer is odd or even, the result will be a set of booleans. Therefore, the only possible output sizes are 0, 1, and 2, whatever the input size is.
val a = Set(1,2,3,4,5)
a.map(_ % 2 == 0)
res0: scala.collection.immutable.Set[Boolean] = Set(false, true)
Note that this is a Scala convention. For instance, in Python, mapping over a set produces as many elements as the original set.
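For contrast within Scala itself, mapping the same predicate over a List keeps one result per input element, because lists do not deduplicate:

List(1, 2, 3, 4, 5).map(_ % 2 == 0)
// List(false, true, false, true, false)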