Lately I’ve been trying to split long methods into several short ones.
For example: I have a process_url() function which splits URLs into components and then assigns them to some objects via their methods. Instead of implementing all this in one function, I only prepare the URL for splitting in process_url(), then pass it to a process_components() function, which in turn passes the components to an assign_components() function.
At first, this seemed to improve readability, because instead of huge ‘God’ methods and functions I had smaller ones with more descriptive names.
However, looking through some code I’ve written that way, I’ve found that I now have no idea whether these smaller functions are called by any other functions or methods.
Continuing the previous example: someone looking at the code might think that the process_components() functionality was abstracted into a function because it is called by various methods and functions, when in fact it is only called by process_url().
This seems somewhat wrong. The alternative is to still write long methods and functions, but indicate their sections with comments.
Is the function-splitting technique I described wrong? What is the preferred way of managing large functions and methods?
UPDATE: My main concern is that abstracting code into a function might imply that it could be called by multiple other functions.
SEE ALSO: discussions on reddit at /r/programming (which provides a different perspective from most of the answers here) and /r/readablecode.
Testing code that does lots of things is difficult.
Debugging code that does lots of things is difficult.
The solution to both of these problems is to write code that doesn’t do lots of things. Write each function so that it does one thing and only one thing. This makes them easy to test with a unit test (one doesn’t need umpteen dozen unit tests).
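As a small sketch of why that matters for testing (the function and assertions are hypothetical, not from the question): a function that does exactly one thing can be covered by a couple of focused assertions.

```python
def split_query(url):
    # Does one thing only: extract the query string from a URL.
    _, _, query = url.partition("?")
    return query

# Because it does one thing, two assertions cover it:
assert split_query("http://example.com/p?a=1") == "a=1"
assert split_query("http://example.com/p") == ""
```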
A co-worker of mine has a phrase he uses when judging whether a given method needs to be broken up into smaller ones:
If, when describing the activity of the code to another programmer you use the word ‘and’, the method needs to be split into at least one more part.
You wrote:
I have a process_url() function which splits URLs into components and then assigns them to some objects via their methods.
This should be at least two methods. It is fine to wrap them in one public-facing method, but the internal workings should be two different methods.
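A sketch of that split in Python (the class and method names are illustrative, not from the question); the leading underscores mark the workings as internal while process_url() stays the single public entry point:

```python
from urllib.parse import urlsplit

class UrlProcessor:
    def process_url(self, url):
        # Public-facing method that composes the two internal steps.
        parts = self._split_url(url)
        return self._assign_components(parts)

    def _split_url(self, url):
        # First job: split the URL into components.
        return urlsplit(url)

    def _assign_components(self, parts):
        # Second job: assign the components to this object.
        self.fields = {"scheme": parts.scheme, "host": parts.netloc}
        return self.fields
```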
Yes, splitting long functions is normal. This is a way of doing things that’s encouraged by Robert C. Martin in his book Clean Code. Particularly, you should be choosing very descriptive names for your functions, as a form of self-documenting code.
As people have pointed out, this improves readability. A person reading process_url() can see more clearly what the general process for dealing with URLs is, just by reading a few method names.
The problem is that other people may think those functions are used by some other piece of the code, and if some of them need to be changed they may choose to preserve those functions and define new ones. This means some code becomes unreachable.
There are several ways to deal with this. The first is documentation and comments in the code. The second is tooling, such as test-coverage reports, which can reveal code that is never exercised. Beyond that, the solution depends to a great extent on the programming language; here are some options:
- object-oriented languages allow you to define private methods, ensuring they are not used elsewhere
- modules in other languages can specify which functions are visible from the outside
- very high-level languages like Python may eliminate the need to define several functions, because they would be simple one-liners anyway
- other languages like Prolog may require (or strongly suggest) the definition of a new predicate for every conditional jump
- in some cases it is common to define auxiliary functions inside the function that uses them (local functions); sometimes these are anonymous functions (closures), as is common with JavaScript callbacks
So, in short, splitting code into several functions is usually a good idea in terms of readability. It may not be if the functions are very short and this creates a ‘goto effect’, or if the names are not really descriptive; in that case, reading the code requires jumping among functions, which can be messy. As for your concern about the scope and usage of these functions, there are several ways to deal with it, and they are generally language-dependent.
In general the best advice is to use common sense. Any strict rule is very likely to be wrong in some case and in the end it depends on the person. I would consider this readable:
import re
process_url = lambda url: dict(re.findall('([^?=&]*)=([^?=&]*)', url))
Personally, I prefer a one-liner, even if it is slightly complex, to jumping and searching across several files of code; if it takes me more than three seconds to find some other part of the code, I may not even remember what I was checking in the first place. People who do not suffer from ADHD may prefer more explanatory names that they can remember, but in the end what you are always doing is balancing the complexity across the different levels of the code: lines, paragraphs, functions, files, modules, etc.
So the keyword is balance. A function with one thousand lines is a hell for anyone reading it, because there is no encapsulation and the context becomes just too huge. A function split into one thousand functions each one of them with one line in it may be worse:
- you have some names (that you could have provided as comments in the lines)
- you are (hopefully) eliminating global variables and do not need to worry about the state (having referential transparency)
- but you force readers to jump back and forth.
So there are no silver bullets here, only experience and balance. IMHO the best way to learn how to do this is to read a lot of code written by other people and analyze why it is hard for you to read and how it could be made more readable. That provides valuable experience.
I’d say it depends.
If you’re just splitting it for the sake of splitting and calling the pieces names like process_url_partN and so on, then NO, please don’t. It just makes the code harder to follow later, when you or someone else needs to figure out what is going on.
If you’re pulling out methods with clear purposes that can be tested by themselves and make sense on their own (even if nobody else is using them), then YES.
For your particular purpose it seems you have two goals.
- Parse a URL and return a list of its components.
- Do something with those components.
I’d write the first part separate and have it return a fairly general result that could be easily tested and potentially be reused later. Even better, I’d look for a built-in function that already does this in your language/framework and use that instead unless your parsing is super special. If it’s super special I’d still write it as a separate method, but probably “bundle” it as a private/protected method in the class that handles the second (if you’re writing object oriented code).
The second part I’d write as its own component which uses the first for the URL parsing.
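In Python, for example, the standard library already covers the first goal, so the parsing step can simply delegate to urllib.parse and return a general, testable result (the dict shape below is my own choice, not a prescribed one):

```python
from urllib.parse import urlsplit, parse_qs

def split_url(url):
    # Return a general, reusable result rather than something
    # tailored to one caller's needs.
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }
```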
I’ve never taken issue with other developers splitting larger methods into smaller methods as it’s a pattern that I follow myself. The “God” method is a terrible trap to fall into and others who are less experienced or simply don’t care tend to get caught more often than not. That being said…
It’s incredibly important to use appropriate access modifiers on the smaller methods. It’s frustrating to find a class littered with small public methods, because then I totally lose confidence in finding where/how a method is used throughout the application.
I live in C#-land, so we have public, private, protected, and internal, and seeing those keywords tells me beyond a shadow of a doubt the scope of the method and where I must look for calls. If it’s private, I know the method is used in only one class, and I have full confidence when refactoring.
In the Visual Studio world, having multiple solutions (.sln) exacerbates this anti-pattern, because the IDE/ReSharper “Find Usages” helpers will not find usages outside of the open solution.
If your programming language supports it, you can define your “helper” functions within the scope of your process_url() function to get the readability benefits of separate functions, e.g.
function process_url(url) {
    function foo(a) {
        // ...
    }
    function bar(a) {
        // ...
    }
    return [foo(url), bar(url)];
}
If your programming language doesn’t support this, you can move foo() and bar() out of the scope of process_url() (so that they are visible to other functions/methods), but consider this a “hack” you’ve put in place because your programming language lacks the feature.
Whether to break a function into sub-functions will probably depend on whether there are meaningful/useful names for the parts and how large each of the functions are, among other considerations.
If anyone is interested in some literature on this question: This is exactly what Joshua Kerievsky refers to as “Compose Method” in his “Refactoring to Patterns” (Addison-Wesley):
Transform the logic into a small number of intention-revealing steps at the same level of detail.
I believe the correct nesting of methods according to their “detail level” is important here.
See an excerpt on the publisher’s site:
Much of the code we write doesn’t start out being simple. To make it simple, we must reflect on what isn’t simple about it and continually ask, “How could it be simpler?” We can often simplify code by considering a completely different solution. The refactorings in this chapter present different solutions for simplifying methods, state transitions, and tree structures.
Compose Method (123) is about producing methods that efficiently communicate what they do and how they do what they do. A Composed Method [Beck, SBPP] consists of calls to well-named methods that are all at the same level of detail. If you want to keep your system simple, endeavor to apply Compose Method (123) everywhere…
Addendum: Kent Beck (“Implementation Patterns”) refers to it as “Composed Method”. He advises you to:
[c]ompose methods out of calls to other methods, each of which is at roughly the same level of abstraction.
One of the signs of a poorly composed method is a mixture of abstraction levels[.]
There, again, the warning not to mix different abstraction levels (emphasis mine).
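A minimal sketch of a Composed Method in Python (the domain is made up for illustration): every call in the top-level function sits at the same level of detail, with no low-level string fiddling mixed in among the steps.

```python
def serve_coffee():
    # Each call is one intention-revealing step, all at the
    # same level of abstraction.
    beans = grind_beans()
    coffee = brew(beans)
    return pour(coffee)

def grind_beans():
    return "grounds"

def brew(beans):
    return "coffee from " + beans

def pour(coffee):
    return "cup of " + coffee
```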
I’m sure this won’t be the popular opinion, but it’s perfectly ok. Locality of Reference can be a huge aid in making sure you and others understand the function (in this case I’m referring to the code and not to memory specifically).
As with everything, it’s a balance. You should be more concerned with anyone who tells you ‘always’ or ‘never’.
A good rule is to keep nearby abstractions on similar levels (better formulated by sebastian in his answer above).
That is, if you have a (big) function that deals with low-level stuff but also makes some higher-level choices, try to factor out the low-level stuff:
void foo() {
    if (x) {
        y = doXformanysmallthings();
    }
    z = doYforthings(y);
    if (z != y && isFullmoon()) {
        launchSpacerocket();
    }
}
Moving stuff out into smaller functions is usually better than having a lot of loops and such inside a function that consists of a few conceptual “big” steps. (Unless you can combine them into relatively small LINQ/foreach/lambda expressions…)
If you can design a class that is appropriate for these functions, make them private. Put another way: with a suitable class definition, you can expose only what you need to expose.
Consider this simple function (I’m using Scala-like syntax but I hope the idea will be clear without any knowledge of Scala):
def myFun ... {
    ...
    if (condition1) {
        ...
    } else {
        ...
    }
    if (condition2) {
        ...
    } else {
        ...
    }
    if (condition3) {
        ...
    } else {
        ...
    }
    // rest
    ...
}
This function has up to 8 possible execution paths, depending on how those conditions evaluate.
- This means that you’ll need 8 different tests just to cover this part. Moreover, most likely some combinations will not be possible, and then you’ll have to carefully analyze which ones they are (and be sure not to miss any that are possible) – a lot of work.
- It is very hard to reason about the code and its correctness. Since each of the if blocks and its condition can depend on some shared local variables, anybody who wants to know what’s happening after them has to analyze all those code blocks and the 8 possible execution paths. It’s very easy to make a mistake here, and most likely somebody updating the code will miss something and introduce a bug.
On the other hand, if you structure the computation as
def myFun ... {
    ...
    val result1 = myHelper1(...);
    val result2 = myHelper2(...);
    val result3 = myHelper3(...);
    // rest
    ...
}

private def myHelper1(/* arguments */): SomeResult = {
    if (condition1) {
        ...
    } else {
        ...
    }
}
// Similarly myHelper2 and myHelper3
you can:
- Easily test each of the helper functions, each of them has only two possible paths of execution.
- When examining myFun, it is immediately obvious whether result2 depends on result1: just check whether the call to myHelper2(...) uses it to compute one of the arguments (assuming the helpers don’t use some global state). It is also obvious how they’re dependent, something that is much harder to see in the previous case. Moreover, after the three calls, it’s also clear what the state of the computation is – it’s captured entirely in result1, result2 and result3 – no need to check if/what other local variables have been modified.
The more concrete a responsibility a method has, the easier the code will be to test, read, and maintain – even if nothing else calls it.
If, in the future, you need to use this functionality from other places, you can then easily extract those methods.
It is perfectly acceptable to name your methods like this:
MyProcedure()
    MyProcedure_part1()
    MyProcedure_part2()
end;
I don’t see why anyone would think that these would be called by some other process, and it is perfectly clear what they are for.