I’m currently refactoring part of a large codebase that has no unit tests whatsoever. I tried to refactor the code the brute-force way, i.e. by guessing what the code is doing and which changes wouldn’t alter its meaning, but without success: it randomly breaks features all around the codebase.
Note that the refactoring includes moving legacy C# code to a more functional style (the legacy code doesn’t use any features of .NET Framework 3 and later, including LINQ), adding generics where the code might benefit from them, etc.
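As a hypothetical illustration of the kind of change I mean (the `Customer` type and the method names are made up for the example):

```csharp
using System.Collections.Generic;
using System.Linq;

class Customer
{
    public string Name;
    public bool IsActive;
}

class Example
{
    // Before: the imperative, pre-.NET 3.5 style the codebase uses today.
    static List<string> ActiveNamesOld(List<Customer> customers)
    {
        List<string> names = new List<string>();
        foreach (Customer c in customers)
        {
            if (c.IsActive)
            {
                names.Add(c.Name);
            }
        }
        return names;
    }

    // After: the same behaviour expressed with LINQ.
    static List<string> ActiveNamesNew(List<Customer> customers)
    {
        return customers.Where(c => c.IsActive).Select(c => c.Name).ToList();
    }
}
```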
I can’t use formal methods, given how much they would cost.
On the other hand, I presume that at least the rule “Any refactored legacy code shall come with unit tests” should be strictly followed, no matter the cost. The problem is that when I refactor a tiny part of a 500 LOC private method, adding unit tests appears to be a difficult task.
What can help me know which unit tests are relevant for a given piece of code? I’m guessing that static analysis of the code would somehow be helpful, but what are the tools and techniques I can use to:
- Know exactly which unit tests I should create,
- And/or know whether a change I’ve made causes the code to execute differently than the original did?
I have had similar challenges. The book Working Effectively with Legacy Code is a great resource, but it assumes you can shoehorn in unit tests to support your work. Sometimes that’s just not possible.
In my archeology work (my term for maintenance on legacy code like this), I follow an approach similar to the one you outlined.
- Start with a solid understanding of what the routine is currently doing.
- At the same time, identify what the routine was supposed to be doing. Many think this bullet and the previous one are the same, but there is a subtle difference: often, if the routine were doing what it was supposed to do, you wouldn’t be applying maintenance changes.
- Run some samples through the routine, making sure you hit the boundary cases and relevant error paths along with the mainline path (a characterization-test sketch follows this list). My experience is that the collateral damage (feature breakage) comes from boundary conditions not being reimplemented in exactly the same way.
- After those sample cases, identify what’s being persisted that doesn’t necessarily need to be persisted. Again, I have found that it’s side effects like this that lead to collateral damage elsewhere.
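Where you can, pin those boundary observations down as characterization tests. A minimal sketch, assuming NUnit; `LegacyModule.Routine` and all the input/output values are hypothetical placeholders, and the expected values are whatever the routine returns today, not what it “should” return:

```csharp
using NUnit.Framework;

[TestFixture]
public class RoutineCharacterizationTests
{
    // Pins the routine's *current* behaviour at the boundaries. The inputs and
    // expected outputs below are placeholders; replace them with the values
    // actually observed when running samples through the real routine.
    [TestCase(0, 0)]       // lower boundary
    [TestCase(-1, -1)]     // just below it (error path)
    [TestCase(999, 42)]    // upper boundary
    [TestCase(500, 17)]    // mainline path
    public void Routine_KeepsItsCurrentBehaviour(int input, int expected)
    {
        Assert.That(LegacyModule.Routine(input), Is.EqualTo(expected));
    }
}
```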
At this point, you should have a candidate list of what’s being exposed and/or manipulated by that routine. Some of those manipulations are likely to be inadvertent. Now I use findstr and the IDE to understand what other areas might reference the items in the candidate list. I’ll spend some time understanding how those references work and what their nature is.
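For example, a recursive, case-insensitive search with line numbers over the source tree (the field name `orderStatus` is a made-up placeholder):

```
findstr /s /i /n "orderStatus" *.cs
```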
Finally, once I’ve deluded myself into thinking I understand the impacts of the original routine, I make my changes one at a time and rerun the analysis steps outlined above to verify that each change works as I expect. I specifically avoid changing multiple things at once, as I have found that this blows up on me when I try to verify the impact. Sometimes you can get away with multiple changes, but if I can follow a one-at-a-time route, that’s my preference.
In short, my approach is similar to what you laid out. It’s a lot of prep work; then make circumspect, individual changes; and then verify, verify, verify.
What would help when refactoring a large method to ensure that I don’t break anything?
Short answer: small steps.
The problem is that when I refactor a tiny part of a 500 LOC private method, adding unit tests appears to be a difficult task.
Consider these steps:
- Move the implementation into a different (private) function and delegate the call:

```csharp
// Old:
private int ugly500loc(int parameters)
{
    // 500 LOC here
}

// New:
private int ugly500loc_old(int parameters)
{
    // 500 LOC here
}

private int ugly500loc(int parameters)
{
    return ugly500loc_old(parameters);
}
```
- Add logging code in your original function, for all inputs and outputs (and make sure the logging itself can never fail):

```csharp
// C# has no static locals, so the call counter becomes a field:
private static int call_count = 0;

private int ugly500loc(int parameters)
{
    int current = ++call_count;
    save_to_file(current, parameters);        // log the inputs
    int result = ugly500loc_old(parameters);
    save_to_file(current, result);            // log the result, any exceptions, etc.
    return result;
}
```
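The `save_to_file` helper is left unspecified above; a minimal sketch, assuming `using System;` and `using System.IO;` at the top of the file, one semicolon-separated record per line, and that the helper lives in the same class. It swallows its own errors so the instrumentation can never change the behaviour it records:

```csharp
private void save_to_file(int call, object value)
{
    try
    {
        // One record per line: the call number, then the observed value.
        File.AppendAllText("recorded_io.csv", call + ";" + value + Environment.NewLine);
    }
    catch
    {
        // Deliberately swallow logging failures: recording must never alter behaviour.
    }
}
```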
Then run your application and do whatever you can with it (valid use, invalid use, typical use, atypical use, etc.).
- You now have `max(call_count)` sets of inputs and outputs to write your tests with. You could write a single test that iterates over all your parameter/result sets and executes them in a loop (sketched just below). You can also write an additional test that runs one particular combination, to quickly check a single input/output set.
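A minimal sketch of such a replay test, assuming NUnit, the paired record format from the logging sketch above, and that `ugly500loc` has been made reachable from the test assembly (e.g. via `InternalsVisibleTo`, or by exercising it through a public caller); `LegacyClass` is a hypothetical name:

```csharp
using System.IO;
using NUnit.Framework;

[TestFixture]
public class Ugly500LocReplayTests
{
    [Test]
    public void ReplaysAllRecordedInputsAndOutputs()
    {
        string[] lines = File.ReadAllLines("recorded_io.csv");

        // Records were written in pairs: "call;input" followed by "call;result".
        for (int i = 0; i + 1 < lines.Length; i += 2)
        {
            int input = int.Parse(lines[i].Split(';')[1]);
            int expected = int.Parse(lines[i + 1].Split(';')[1]);

            Assert.That(new LegacyClass().ugly500loc(input), Is.EqualTo(expected),
                "Behaviour changed for recorded input " + input);
        }
    }
}
```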
- Move the `// 500 LOC here` body back into your `ugly500loc` function (and remove the logging functionality).
- Start extracting functions from the big function (do nothing else, just extract functions) and run the tests after each extraction; a sketch of one such extraction follows this list. After this you should have many smaller functions to refactor instead of the 500 LOC one.
- Live happily ever after.
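To make the extraction step concrete, a hypothetical illustration (the range check is invented for the example): behaviour stays identical, the code only moves.

```csharp
// Before, inside ugly500loc:
//     if (parameters < 0 || parameters > 1000)
//         return -1;

// After: the condition gets a name of its own...
private static bool IsOutOfRange(int parameters)
{
    return parameters < 0 || parameters > 1000;
}

// ...and the call site inside ugly500loc becomes:
//     if (IsOutOfRange(parameters))
//         return -1;
```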
Usually, unit tests are the way to go.
Write the tests needed to prove that the current code works as expected.
Take your time; the last test should leave you confident in the output.
What can help me in knowing which unit tests are relevant for a given piece of code?
You’re in the process of refactoring a piece of code, so you must know exactly what it does and what it impacts. Basically, you need to test all the affected areas.
This will take you a lot of time… but that’s an expected outcome of any refactoring process.
Then you can tear it all apart without any problems.
AFAIK, there’s no bulletproof technique for this… you just need to be methodical (using whatever method you feel comfortable with), plus a lot of time and a lot of patience! 🙂
Cheers and good luck!
Alex