In our software environment, we often run A/B tests, as is probably good practice. However, our environment is set up such that, in very short order, the code becomes crufty with dead tests. The testing registry is little more than a collection of internal wiki pages.
I thought of a “dead man’s switch” style of defunct code management. If you’re not familiar with the term, it refers to a switch that must be reset periodically in order to keep something from triggering. In essence, if you don’t respond, the switch triggers, and whatever you wanted the switch to trigger is performed.
For example, I would write some code, register it with this system in some way, and when a date of my predetermined choosing rolled around, I would get a notification that this code would be removed (automatically cleaned up) unless I intervened (manually clean up, or snooze).
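To make the idea concrete, here is a minimal sketch of the registration, notification and snooze steps. It deliberately stops short of removing anything. The names `Registration`, `due_for_notice` and `notice_days` are hypothetical, made up for this illustration, and the list of registrations stands in for whatever the real registry would be.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Registration:
    """One piece of test code registered with the (hypothetical) switch."""
    name: str
    owner: str
    expires: date

    def snooze(self, days: int) -> None:
        # Resetting the switch: push the expiry date further out.
        self.expires = self.expires + timedelta(days=days)


def due_for_notice(registrations, today, notice_days=14):
    """Return registrations that expire within `notice_days` of `today`.

    Anything already past its expiry date is included as well, since the
    owner still has to either clean it up or snooze it.
    """
    cutoff = today + timedelta(days=notice_days)
    return [r for r in registrations if r.expires <= cutoff]


# Example data, invented for illustration only.
registrations = [
    Registration("ab_test_blue_button", "alice", date(2024, 3, 1)),
    Registration("ab_test_banner", "bob", date(2024, 6, 1)),
]
```

Running `due_for_notice(registrations, date(2024, 2, 20))` would flag only `ab_test_blue_button`; calling `snooze(30)` on it moves its expiry out and drops it from the next notice.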
What are the pros, cons and viability of incorporating such a system? Is it possible or wise? What might be some alternative ways to manage code against rotting?
1
I’d be worried about a system that automatically removes code. Unless your team is very well disciplined, I can only see this as a road to tears and pain. Things happen: people go on vacation, get sick, leave the company, forget what the code that’s about to expire does, … and having code be automatically cleaned up sounds like it will invite all kinds of trouble. You’d have to ensure that there are no compile-time dependencies between these modules, or they could suddenly fail to compile when something they depend on gets expired out of existence. And don’t forget what would happen if someone in production support is working on an older trouble ticket and the stack trace in the error message references code that was deleted. And this would not make the people who do auditing very happy either, I imagine…
Something that might be safer is a framework where your little modules of code can query for an activation date and a deactivation date. When a piece of code is about to run, it looks up its dates in a database, and if the current date falls between the activation and deactivation dates, your framework allows the code to run. And if you store these dates in a database, you could easily write a script that generates a report of all modules expiring in the next n days and emails it to you, so it’s the first thing in your inbox on Monday morning.
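A rough sketch of that date-window check and the expiry report might look like the following. The dictionary `WINDOWS` is a stand-in for the database table, and `Window`, `is_active` and `expiring_within` are names I made up for illustration; the emailing step is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Window:
    activate: date
    deactivate: date


# Hypothetical stand-in for the database table of module dates.
WINDOWS = {
    "new_checkout_button": Window(date(2024, 1, 1), date(2024, 3, 1)),
    "green_banner": Window(date(2024, 2, 1), date(2024, 6, 1)),
}


def is_active(name, today, windows=WINDOWS):
    """True if `today` falls inside the module's activation window.

    Unregistered modules are treated as inactive.
    """
    w = windows.get(name)
    return w is not None and w.activate <= today <= w.deactivate


def expiring_within(days, today, windows=WINDOWS):
    """Names of modules whose deactivation date is in the next `days` days."""
    cutoff = today + timedelta(days=days)
    return sorted(
        name for name, w in windows.items()
        if today <= w.deactivate <= cutoff
    )
```

The module code would wrap itself in something like `if is_active("green_banner", date.today()): ...`, and a weekly job could email the result of `expiring_within(7, date.today())`. Nothing is ever deleted; expired code simply stops running until someone cleans it up.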
2
This system creates a problem of orphaned tests: if someone who wrote a suite of tests and a set of production code associated with it departs the company, the tests are in danger of being removed prematurely by omission of their new owner.
I do not think that “too many tests” is a problem: as long as tests are automated, your waste is limited mostly to CPU hours, which is small change compared to man hours. A test removed automatically through the dead man’s switch could catch bugs that would otherwise end up in production, causing severe maintenance problems down the road.
I think you can build a process-based alternative by creating a registry that pairs test suites with modules of your code. Every time maintenance is performed on a module, the process would require a decision on whether to keep, remove, or update the corresponding test suite. Since this is pure bookkeeping, there is no danger of code being removed automatically.
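As an illustration of how small such a registry can be, here is a sketch. The module names, file paths and the functions `suites_needing_decision` and `record_decision` are all hypothetical; in practice the registry could just as well live in a wiki table or a checked-in file.

```python
# Hypothetical registry: module name -> test suites paired with it.
REGISTRY = {
    "billing": ["tests/test_billing.py", "tests/test_invoices.py"],
    "search": ["tests/test_search.py"],
}

ALLOWED_DECISIONS = {"keep", "remove", "update"}


def suites_needing_decision(changed_modules, registry=REGISTRY):
    """Map each changed module to the suites that need a human decision."""
    return {m: list(registry.get(m, [])) for m in changed_modules}


def record_decision(decisions, suite, decision):
    """Record a keep/remove/update decision; nothing is deleted here."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    decisions[suite] = decision
    return decisions
```

The point of the design is that the code only lists and records; acting on a "remove" decision stays a manual step.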
3
I have long used simple calendar alerts (whatever calendar software your company uses should suffice). I set the alert for whatever time is appropriate and put in all the information I’ll need, presuming I will have forgotten everything by the time the alert goes off.
Install an SSL cert? Look at the expiration date and set a calendar event 2 months before it, for myself plus a manager and another engineer, in case one or two of the three of us are no longer with the company by then (which has happened). The event starts sounding the alarm that the cert needs to be replaced. In the event, put the contact information for the people we received the cert from, the specific places the cert is used, and details of the systemic ramifications of letting it expire (maybe it’s a dead system and the cert should be allowed to expire when that time comes around).
Automatic code modification is a downright terrible, dangerous idea. All you need to do is judiciously use your calendar to make sure you are notified of time-sensitive things when their time becomes relevant. This is specifically what calendars are for. Now, if you have a problem of not heeding time-sensitive events in a responsible manner, I think you have an altogether different problem, one that you might want to ask about on The Workplace.
Sounds like a very bad idea, because later code will depend on those changes. Automatic rollback will cause widespread damage.
Instead, write your tests as automated (unit) tests, and run them nightly.
4