Say I have an array of runners with which I need to find the tallest runner, the fastest runner, and the lightest runner. It seems like the most readable solution would be:
runners = getRunners();
tallestRunner = getTallestRunner(runners);
fastestRunner = getFastestRunner(runners);
lightestRunner = getLightestRunner(runners);
..where each function iterates over the runners and keeps track of the largest height, greatest speed, and lowest weight. Iterating over the array three times, however, doesn’t seem like a very good idea. It would instead be better to do:
int greatestHeght, greatestSpeed, leastWeight;
Runner tallestRunner, fastestRunner, lightestRunner;
for(runner in runners){
if(runner.height > greatestHeight) { greatestHeight = runner.height; tallestRunner = runner; }
if(runner.speed > ...
}
While this isn’t too unreadable, it can get messy when there is more logic for each piece of information being extracted in the iteration.
What’s the middle ground here? How can I use only a single iteration while still keeping the code divided into logical units?
1
Taskinoor has the right idea, but there’s a better way to implement it… provided your language supports passing a function reference around.
For example, here’s how I’d do it in C#-ish style:
// Define three anonymous functions which take two Runners, compare them, and return one.
Func<Runner, Runner, Runner> tallest = (x,y) => x.height > y.height ? x : y;
Func<Runner, Runner, Runner> fastest = (x,y) => x.speed > y.speed ? x : y;
Func<Runner, Runner, Runner> lightest = (x,y) => x.weight < y.weight ? x : y;
// Default everything to the first runner, to keep things simple
Runner tallestRunner = fastestRunner = lightestRunner = runners.First();
// Loop
foreach(runner in runners){
tallestRunner = tallest(tallestRunner, runner);
fastestRunner = fastest(fastestRunner, runner);
lightestRunner = lightest(lightestRunner, runner);
}
This is trivial to expand – rather than defining three functions, you can define an array of Func<Runner, Runner, Runner>
anonymous functions, and just run them all. You can even do it with regular functions like Runner pickTallest(Runner x, Runner y)
, although then you need to explicitly define them. The key, though, is that you don’t have to actually track the value for each stat – you just need to know how to compare two Runners
and choose the one with the better value.
This is one of those things where you quite often want to operate on a chunk of data, rather than the OO principle of “a single piece of data”.
So I’d wrap the entire list in a class which on creation, parses the list and calculates out what you want. You’d also use this class to insert into, and remove from the list, so that the wrapped information is always up to date.
class Runners
{
public Runners( RunnerList inputRunners)
{
runners = inputRunners;
Recalculate();
}
private Recalculate()
{
foreach( Runner runner in runners )
{
// comparisons here!
}
}
public Insert(Runner newRunner)
{
int index = runners.Add(newRunner);
if( newRunner.height > runners[highestIndex].height)
{
highestIndex = index;
}
// and so on.
}
public Remove(Runner delRunner)
{
runners.Remove(delRunner);
Recalculate();
}
// accessors
Runner GetHighest() { return runners[highestIndex]; }
Runner GetFastest() { return runners[fastestIndex]; }
Runner GetLightest() { return runners[lightestIndex]; }
RunnerList runners; // list of runners we manage
int highestIndex; // index of element in list which is highest.
int fastestIndex; // index of element in list which is fastest
int lightestIndex; // you get the idea right?
}
Thats it. you’ve now got a self contained block of logic which answers your questions for you with only a single iterate on creation, and when you remove objects.
You can return all three values in one go. In pseudo code close to Scala:
val (fastest, tallest, lightest) = runners
.foldLeft(...)(((tallestSoFar, fastestSoFar, lightestSoFar),runner) =>
(tallest(runner, tallestSoFar),
fastest(runner, fastestSoFar),
lightest(runner, lightestSoFar))
That will give you a tuple of the runners you’re looking for. You can do the same in other languages. You might have to grab a library like Guava or Underscore to do it, and you might have to wrap the tuple in an object.
The problem you are describing is very common: You have a loop that produces certain data, and you want to aggregate multiple “statistics” from the overall data. The straightforward implementation is, as you said, to have multiple local variables which are each updated in every iteration:
int greatestHeight, greatestSpeed, leastWeight;
Runner tallestRunner, fastestRunner, lightestRunner;
for(...){
runner = ... // the runner comes from somewhere, not necessarily a list
if(runner.height > greatestHeight) { greatestHeight = runner.height; tallestRunner = runner; }
if(runner.speed > ...
}
This doesn’t have a good separation of concerns, because you have the aggregation logic inline with the data production. Extracting the aggregation logic into a method (as proposed in this answer) is an improvement, but the isolation is still not good: the intermediate results and (if needed) helper variables like greatestHeight
still have to be local variables.
So IMHO the only good solution is to extract both the aggregation logic and the assignments into a method.
How can this be done? By first refactoring the local variables to fields. Then, you can e.g. extract a method updateStats(runner)
which updates the tallestRunner
/fastestRunner
/… fields and corresponding greatestHeight
/greatestSpeed
/… fields.
But doesn’t this make isolation worse? Yes, at first, but this can be fixed by an extract class refactoring: Move the the statistics fields and the updateStats
method to a new (e.g. nested) class. So in the end you’ll have the following readable, single-pass solution:
RunnerStats stats = new RunnerStats();
for(...){
runner = ...
stats.updateStats(runner);
}
... = stats.tallestRunner;
}
static class RunnerStats{
int greatestHeight, greatestSpeed, leastWeight = Integer.MAX_VALUE;
Runner tallestRunner, fastestRunner, lightestRunner;
void updateStats(Runner runner){
updateTallest(runner);
update...
}
void updateTallest(Runner runner){
if(runner.height > greatestHeight) { greatestHeight = runner.height; tallestRunner = runner; }
}
}
Create a RunnerStats class that calculates the stats in a single for loop
, just as you do in your second code snippet.
Then you read the stats into the variables via getters
. The getters only return the already calculated values, they don’t calculate anything.
That way you get the best of both solutions: efficiency and legibility.
runners = getRunners();
RunnerStats rStats = new RunnerStats(runners);
tallest = rStats.getTallets();
fastest = rStats.getFastest();
lightest = rStats.getTallest();