It is stated that dask bags do not preserve order. However, the example given for dask.bag.map
implies that order is preserved, or at least predictable: https://docs.dask.org/en/stable/generated/dask.bag.map.html
Specifically, the example I’m referring to is this:
import dask.bag as db
b = db.from_sequence(range(5), npartitions=2)
b2 = db.from_sequence(range(5, 10), npartitions=2)
from operator import add
db.map(add, b, b2).compute()
[5, 7, 9, 11, 13]
This implies that the two bags are kept aligned, i.e. the second element of b is added to the second element of b2. My question: is this alignment guaranteed? Or could the second element of b ever be added to the third element of b2?
For completeness, I’ll ask a related question. Suppose your graph splits and then recombines. Is the alignment guaranteed to be maintained? As a concrete example, suppose I want to compute x**2 + x**3 for every element in a bag. Would I be guaranteed to maintain alignment and get the correct answer for each element of the bag x with the following code?
import dask.bag as db
x = db.from_sequence(range(5), npartitions=2)
from operator import add, pow
x2 = db.map(pow, x, 2)
x3 = db.map(pow, x, 3)
x2_plus_x3 = db.map(add, x2, x3).compute()
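For comparison, here is a minimal sketch of how I could sidestep the alignment question for this particular case: compute both terms inside a single map call, so each element only ever meets itself. The helper name square_plus_cube is one I made up for illustration, and I pass scheduler="synchronous" only to keep the run single-process and easy to reproduce. The final comparison shows what happens on one run. It does not prove that alignment is guaranteed in general, which is what I am asking about.

```python
import dask.bag as db
from operator import add, pow


def square_plus_cube(v):
    # Hypothetical helper: both terms come from the same element,
    # so no cross-bag alignment is involved.
    return v**2 + v**3


x = db.from_sequence(range(5), npartitions=2)

# Single-pass version: one map over one bag.
single_pass = x.map(square_plus_cube).compute(scheduler="synchronous")

# Split-and-recombine version from the question above.
x2 = db.map(pow, x, 2)
x3 = db.map(pow, x, 3)
split = db.map(add, x2, x3).compute(scheduler="synchronous")

# Reference values computed in plain Python.
expected = [square_plus_cube(v) for v in range(5)]

# Order-independent check of the single-pass result.
print(sorted(single_pass) == sorted(expected))
# Empirical check only: do the two versions agree on this run?
print(split == single_pass)
```

If the two versions ever disagreed, that would answer my question in the negative. Agreement on a single run only tells me that alignment held this time.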