When map-reduce divides a task and sends the data to each worker process, does it also transmit the instructions for how to operate on the data?
For example, let’s say Google has some huge array of computers that use map-reduce to index webpages. If they wanted to temporarily use that same array of systems to do something else, would they have to reconfigure each system manually, or does map-reduce handle transmitting the new executable instructions to the workers?
You can set everything up manually if you want to. At a place like Google, though, you would expect a highly automated infrastructure that distributes the new job's code to the workers for you.
Generally, for portability and long-term maintenance, it is best to keep the description of your map-reduce as independent as possible from these implementation details. This is, in fact, the main virtue of map-reduce: it lets you describe a parallelizable task independently of the details of the system it runs on.
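To make that separation concrete, here is a minimal sketch in Python. It is not how Google's or any particular framework's map-reduce is implemented; the names `map_words`, `reduce_counts`, `run_sequential`, and `run_threaded` are all made up for illustration. The point is only that the job is described by two functions, and the same description can be handed to different runners without changes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


# --- Job description: what to compute, nothing about where it runs ---

def map_words(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    """Combine all the counts emitted for one word."""
    return sum(counts)


# --- Hypothetical runners: how and where the job is executed ---

def shuffle(pairs):
    """Group emitted values by key, as a map-reduce system does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_sequential(map_fn, reduce_fn, inputs):
    """Run the whole job in a single thread."""
    pairs = [pair for item in inputs for pair in map_fn(item)]
    return {key: reduce_fn(key, values) for key, values in shuffle(pairs).items()}


def run_threaded(map_fn, reduce_fn, inputs, workers=4):
    """Run the map phase on a thread pool; the job functions are unchanged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, inputs))
    pairs = [pair for chunk in mapped for pair in chunk]
    return {key: reduce_fn(key, values) for key, values in shuffle(pairs).items()}


docs = ["the quick brown fox", "the lazy dog", "The fox"]
sequential_result = run_sequential(map_words, reduce_counts, docs)
threaded_result = run_threaded(map_words, reduce_counts, docs)
```

Switching the cluster to a different task then means passing a different pair of functions to the runner. In a real distributed system the runner would also have to get that code onto the worker machines, which is the part the infrastructure automates.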