I tried to explain to a coworker the gravity of having duplicate code in a project, on this piece of code:
+ (void)createIapInParse:(SKPaymentTransaction *)transaction {
Reachability *reach = [Reachability reachabilityWithHostname:@"www.facebook.com"];
if ([Social getFBUser]) {
NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
PFObject *iap = [PFObject objectWithClassName:@"Iap"];
iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
iap[@"userId"] = [Social getFBUser].objectID == nil ? [NSNull null] : [Social getFBUser].objectID;
iap[@"email"] = [Social getFBUser][@"email"] == nil ? [NSNull null] : [Social getFBUser][@"email"];
iap[@"country"] = [Util getDeviceCountry];
iap[@"installationId"] = [Util getInstallationId];
iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];
iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
iap[@"retries"] = @0;
iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
iap[@"transactionDate"] = transaction.transactionDate;
iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];
[iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
[ZeeSQLiteHelper executeQuery:query];
}];
NSLog(@"Save in parse: %@", iap);
} else {
NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
PFObject *iap = [PFObject objectWithClassName:@"Iap"];
iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
iap[@"userId"] = @"-";
iap[@"email"] = @"-";
iap[@"country"] = [Util getDeviceCountry];
iap[@"installationId"] = [Util getInstallationId];
iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];
iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
iap[@"retries"] = @0;
iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
iap[@"transactionDate"] = transaction.transactionDate;
iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];
[iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
[ZeeSQLiteHelper executeQuery:query];
}];
NSLog(@"Save in parse: %@", iap);
}
}
I refactored it to:
+ (void)createIapInParse:(SKPaymentTransaction *)transaction {
Reachability *reach = [Reachability reachabilityWithHostname:@"www.facebook.com"];
NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
PFObject *iap = [PFObject objectWithClassName:@"Iap"];
NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];
iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
iap[@"country"] = [Util getDeviceCountry];
iap[@"installationId"] = [Util getInstallationId];
iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
iap[@"retries"] = @0;
iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
iap[@"transactionDate"] = transaction.transactionDate;
iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];
if ([Social getFBUser]) {
iap[@"userId"] = [Social getFBUser].objectID == nil ? [NSNull null] : [Social getFBUser].objectID;
iap[@"email"] = [Social getFBUser][@"email"] == nil ? [NSNull null] : [Social getFBUser][@"email"];
} else {
iap[@"userId"] = @"-";
iap[@"email"] = @"-";
}
[iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
[ZeeSQLiteHelper executeQuery:query];
}];
NSLog(@"Saved in parse: %@", iap);
}
and he kept arguing with me that it’s the same thing.
His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.
The question is, am I really wrong? Is this not so important for everyone? It’s a subjective matter or he doesn’t understand that’s not the same thing?
7
You’re not wrong. It’s just psychologically very difficult to convince people of their own limitations.
The reason we have invented maxims, guidelines etc. that restrict what we should do is that we have found, over time, that behaving in a particular way leads to more success. Importantly, it will lead to more success even if it doesn’t seem so to us at the moment.
If you think about it for a while, you’ll realize that this is so for most if not all maxims. Correct things that are obviously correct are easy to get right. There are no maxims about not using 500-letter variable names because virtually everybody can see immediately that they would be really hard to use. But duplicated code doesn’t seem to have any negative effect at first; it’s easy to paste the thing, and saves time over factoring it away. What’s not to like?
Once you’ve encountered this kind of situation, it becomes pretty obvious that refactoring will save time overall, because you’ve been down this road and know what will happen eventually. Someone who hasn’t is not so easily convinced. It is extremely easy to fool yourself into thinking “This is so obvious, I’m not going to forget to attend to it, ever”. (In fact, attributing perfect ability to itself may be the main function of the mind, according to some theories.)
Therefore the only way of convincing someone else to observe good practices is either to earn their trust, or to accompany them through the same roads that convinced you. The former is a matter of interpersonal skill, and is unfortunately too complex to answer in this format. The latter involves stepping through code where exactly this problem happens; in my experience this is a lot more convincing when done with existing code where cut-and-paste actually did lead to errors (the more serious the better) than with purely hypothetical future scenarios.
Above all, avoid attacking someone’s sense of self-worth directly. Almost every young coder thinks they can do what no one else can do, and don’t need the safety devices that others need. Telling them “yes, you do!” rarely works; it is marginally better to show them proof of how many people have screwed up for precisely this reason and ask them to show regard for whoever will be required to maintain their code in the future. (You may be surprised that human-factor issues play such a big role in productive coding, but I’ve come to realize more and more how decisive they are, and that this is one reason why building complex systems is so deceptively hard.)
3
His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.
If that is his primary argument it may be a sign that he is primarily afraid with being not called a good programmer, or even a bad programmer. IMHO the crucial point here is to make clear to him that it is not about a personal affront.
Next, good programmers stay good programmers by learning new tricks. I would try to relate to that, maybe by telling a story from your own experiences showing him, that good programmers can become even better by learning something or getting some specific insight. We all had such moments, I’m pretty sure.
Re the “reading” argument: It surely takes less time to read only half of the code, does it? So if it only were for this one, one easily can increase efficiency by halving the amount of code to read! Is that cool or what?
Last not least, it’s not really about reading, but maintaining code duplicates, which is by definition prone to error. Code duplicates tend to “drift” (or wither, if you want) away from each other over time. It’s not by intention, but people tend to overlook things, change only one of the duplicates, etc.
The question is, am I really wrong? Is this not so important for everyone? It’s a subjective matter or he doesn’t understand that’s not the same thing?
You’re right. The one who is subjective is he, not you. His primary argument primarily only refers to himself, you are (hopefully) do have the project success in mind.
1
I refactored it to […] and he kept arguing with me that it’s the same thing.
Well … as far as the compiler is concerned, it probably is. As far as maintenance effort is concerned, it is not.
His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.
This is true (understanding the code is not that hard). It also makes two assumptions: that the most effort required in the future on this code is to understand it (it’s not), and that the written code is correct and will not need major changes (it will).
The most effort in the future, will probably be required to implement changes accross the code base.
Currently, I am working on a medium-sized conversion project (adapt ~100 Visual Studio projects to Win64). When I find an error in code, I usually find it in 20+ to 40+ places (so fixing one error takes hours of searching through the code base, reading code, applying the same fix over and over and recompiling). This makes commits bigger
The code is not only duplicated, it is proliferated (because this one time excuse of “it’s easier to paste the code than refactor it” was judiciously applied for the last 20 years).
This makes it very difficult to:
- cover all code paths for a fix
- estimate the effort to fix something
- determine what to test to verify a code change
It also increases drastically the time/complexity of all operations performed on the code (due to the high degree of duplication, source files are bigger, compilation takes longer, search/replace takes longer, reading the code takes longer and so on).
Another disadvantage is that it is difficult (to the point of being impractical at this point) to implement unit tests, and document or formalize interfaces and responsibilities (because instead of having dependencies between five distinct, testable APIs, we have twenty five APIs that perform roughly the same operations with copied code from one to another and slight variations).
This point (on difficulty of increasing the code quality now) keeps the entire team playing catch-up on deadlines, working reactively and in an impossibility to set long term goals in the project.
1
You’re right.
“So, what happens if somebody other than you needs to add a field to iap
? Will they remember to add it to both branches?”
Essentially, this is a question of the bus factor – if there’s only one person who can effectively maintain the code, then you have a problem because they won’t necessarily be around for ever, whether that be because they do actually get hit by a bus, because they decide to move on for pastures new, because management decides that one employee should no longer be employed by the company, or just because there’s a critical bug that needs to be fixed while the employee is on holiday. Ideally, code should be maintainable by any developer reasonably familiar with the code, not just by the one particular expert on that bit of code.
2
A very important, but subtle, issue when trying to write and maintain correctness is recognizing the difference between things which happen to match, versus things which are “the same”. This frequently arises in object oriented programming (99% of questions relating to deep cloning and deep equality stem from a failure to distinguish when it’s appropriate to have separate things that match, versus having two references to the same thing), but the concept extends to source code as well.
Code duplication is good in cases where it should be possible for one piece of code to change without affecting the other. It’s bad in cases where the two pieces of code might change, but should always do the same thing. Note that it doesn’t really matter whether the code in question is long or short. Unfortunately, there are many cases where language constructs do not permit common pieces of code to be factored out efficiently, since the only way to define a method which has access to a scope’s variables is to create a closure; even if no reference to the closure should never escape the context where it’s created, many languages have no way to specify that.
Thus, it’s necessary to invoke another principle which is applicable to source code just as to run-time objects: the distinctions between identity and matching is relevant only when things are mutable or encapsulate mutable state. The decision of whether to duplicate “immutable” code segments–unlike those which might change–is dependent upon length. Of course, code segments which are long and might change are prime candidates for factorization for two reasons.
1
- Add another field to
PFObject
. - Set value to that field only in the then clause of the original version.
- Run the application in a way that’ll make
createIapInParse
enter the else clause. - Fill a bug report that your new field is left unset.
- Assign that bug report to your coworker.
8