The importance of duplicate code removal [duplicate]

I tried to explain to a coworker the gravity of having duplicate code in a project, on this piece of code:

+ (void)createIapInParse:(SKPaymentTransaction *)transaction {
    Reachability *reach = [Reachability reachabilityWithHostname:@"www.facebook.com"];
    if ([Social getFBUser]) {
        NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
        PFObject *iap = [PFObject objectWithClassName:@"Iap"];
        iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
        iap[@"userId"] = [Social getFBUser].objectID == nil ? [NSNull null] : [Social getFBUser].objectID;
        iap[@"email"] = [Social getFBUser][@"email"] == nil ? [NSNull null] : [Social getFBUser][@"email"];
        iap[@"country"] = [Util getDeviceCountry];
        iap[@"installationId"] = [Util getInstallationId];
        iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
        NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];
        iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
        iap[@"retries"] = @0;
        iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
        iap[@"transactionDate"] = transaction.transactionDate;
        iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];
        [iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
            NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
            [ZeeSQLiteHelper executeQuery:query];
        }];
        NSLog(@"Save in parse: %@", iap);
    } else {
        NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
        PFObject *iap = [PFObject objectWithClassName:@"Iap"];
        iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
        iap[@"userId"] = @"-";
        iap[@"email"] = @"-";
        iap[@"country"] = [Util getDeviceCountry];
        iap[@"installationId"] = [Util getInstallationId];
        iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
        NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];
        iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
        iap[@"retries"] = @0;
        iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
        iap[@"transactionDate"] = transaction.transactionDate;
        iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];
        [iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
            NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
            [ZeeSQLiteHelper executeQuery:query];
        }];
        NSLog(@"Save in parse: %@", iap);
    }
}

I refactored it to:

 + (void)createIapInParse:(SKPaymentTransaction *)transaction {

    Reachability *reach = [Reachability reachabilityWithHostname:@"www.facebook.com"];
    NSString *iapId = [Util getBundleNameFromIdentifier:transaction.payment.productIdentifier];
    PFObject *iap = [PFObject objectWithClassName:@"Iap"];
    NSTimeInterval interval = [[NSUserDefaults standardUserDefaults] doubleForKey:@"timeSpentInApp"];

    iap[@"iapId"] = iapId == nil ? [NSNull null] : iapId;
    iap[@"country"] = [Util getDeviceCountry];
    iap[@"installationId"] = [Util getInstallationId];
    iap[@"score"] = [NSNumber numberWithLong:[GameScore getTotalScore]];
    iap[@"timeSpentInApp"] = [NSNumber numberWithDouble:interval];
    iap[@"retries"] = @0;
    iap[@"transactionIdentifier"] = transaction.transactionIdentifier;
    iap[@"transactionDate"] = transaction.transactionDate;
    iap[@"transactionSource3G"] = [NSNumber numberWithBool:[reach isReachableViaWWAN]];

    if ([Social getFBUser]) {
        iap[@"userId"] = [Social getFBUser].objectID == nil ? [NSNull null] : [Social getFBUser].objectID;
        iap[@"email"] = [Social getFBUser][@"email"] == nil ? [NSNull null] : [Social getFBUser][@"email"];
    } else {
        iap[@"userId"] = @"-";
        iap[@"email"] = @"-";
    }

    [iap saveInBackgroundWithBlock:^(BOOL succeded, NSError *error) {
        NSString *query =[NSString stringWithFormat: @"...", iap.objectId, transaction.payment.productIdentifier];
        [ZeeSQLiteHelper executeQuery:query];
    }];

    NSLog(@"Saved in parse: %@", iap);
}

and he kept arguing with me that it’s the same thing.

His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.

The question is, am I really wrong? Is this not so important for everyone? It’s a subjective matter or he doesn’t understand that’s not the same thing?

You’re not wrong. It’s just psychologically very difficult to convince people of their own limitations.

The reason we have invented maxims, guidelines etc. that restrict what we should do is that we have found, over time, that behaving in a particular way leads to more success. Importantly, it will lead to more success even if it doesn’t seem so to us at the moment.

If you think about it for a while, you’ll realize that this is so for most if not all maxims. Correct things that are obviously correct are easy to get right. There are no maxims about not using 500-letter variable names because virtually everybody can see immediately that they would be really hard to use. But duplicated code doesn’t seem to have any negative effect at first; it’s easy to paste the thing, and saves time over factoring it away. What’s not to like?

Once you’ve encountered this kind of situation, it becomes pretty obvious that refactoring will save time overall, because you’ve been down this road and know what will happen eventually. Someone who hasn’t is not so easily convinced. It is extremely easy to fool yourself into thinking “This is so obvious, I’m not going to forget to attend to it, ever”. (In fact, attributing perfect ability to itself may be the main function of the mind, according to some theories.)

Therefore the only way of convincing someone else to observe good practices is either to earn their trust, or to accompany them through the same roads that convinced you. The former is a matter of interpersonal skill, and is unfortunately too complex to answer in this format. The latter involves stepping through code where exactly this problem happens; in my experience this is a lot more convincing when done with existing code where cut-and-paste actually did lead to errors (the more serious the better) than with purely hypothetical future scenarios.

Above all, avoid attacking someone’s sense of self-worth directly. Almost every young coder thinks they can do what no one else can do, and don’t need the safety devices that others need. Telling them “yes, you do!” rarely works; it is marginally better to show them proof of how many people have screwed up for precisely this reason and ask them to show regard for whoever will be required to maintain their code in the future. (You may be surprised that human-factor issues play such a big role in productive coding, but I’ve come to realize more and more how decisive they are, and that this is one reason why building complex systems is so deceptively hard.)

His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.

If that is his primary argument it may be a sign that he is primarily afraid with being not called a good programmer, or even a bad programmer. IMHO the crucial point here is to make clear to him that it is not about a personal affront.

Next, good programmers stay good programmers by learning new tricks. I would try to relate to that, maybe by telling a story from your own experiences showing him, that good programmers can become even better by learning something or getting some specific insight. We all had such moments, I’m pretty sure.

Re the “reading” argument: It surely takes less time to read only half of the code, does it? So if it only were for this one, one easily can increase efficiency by halving the amount of code to read! Is that cool or what?

Last not least, it’s not really about reading, but maintaining code duplicates, which is by definition prone to error. Code duplicates tend to “drift” (or wither, if you want) away from each other over time. It’s not by intention, but people tend to overlook things, change only one of the duplicates, etc.

The question is, am I really wrong? Is this not so important for everyone? It’s a subjective matter or he doesn’t understand that’s not the same thing?

You’re right. The one who is subjective is he, not you. His primary argument primarily only refers to himself, you are (hopefully) do have the project success in mind.

I refactored it to […] and he kept arguing with me that it’s the same thing.

Well … as far as the compiler is concerned, it probably is. As far as maintenance effort is concerned, it is not.

His primary argument was that “he’s a good programmer and he can understand and read even the first version fast enough” so he doesn’t care if it’s written like that or not.

This is true (understanding the code is not that hard). It also makes two assumptions: that the most effort required in the future on this code is to understand it (it’s not), and that the written code is correct and will not need major changes (it will).

The most effort in the future, will probably be required to implement changes accross the code base.

Currently, I am working on a medium-sized conversion project (adapt ~100 Visual Studio projects to Win64). When I find an error in code, I usually find it in 20+ to 40+ places (so fixing one error takes hours of searching through the code base, reading code, applying the same fix over and over and recompiling). This makes commits bigger

The code is not only duplicated, it is proliferated (because this one time excuse of “it’s easier to paste the code than refactor it” was judiciously applied for the last 20 years).

This makes it very difficult to:

cover all code paths for a fix
estimate the effort to fix something
determine what to test to verify a code change

It also increases drastically the time/complexity of all operations performed on the code (due to the high degree of duplication, source files are bigger, compilation takes longer, search/replace takes longer, reading the code takes longer and so on).

Another disadvantage is that it is difficult (to the point of being impractical at this point) to implement unit tests, and document or formalize interfaces and responsibilities (because instead of having dependencies between five distinct, testable APIs, we have twenty five APIs that perform roughly the same operations with copied code from one to another and slight variations).

This point (on difficulty of increasing the code quality now) keeps the entire team playing catch-up on deadlines, working reactively and in an impossibility to set long term goals in the project.

You’re right.

“So, what happens if somebody other than you needs to add a field to iap? Will they remember to add it to both branches?”

Essentially, this is a question of the bus factor – if there’s only one person who can effectively maintain the code, then you have a problem because they won’t necessarily be around for ever, whether that be because they do actually get hit by a bus, because they decide to move on for pastures new, because management decides that one employee should no longer be employed by the company, or just because there’s a critical bug that needs to be fixed while the employee is on holiday. Ideally, code should be maintainable by any developer reasonably familiar with the code, not just by the one particular expert on that bit of code.

A very important, but subtle, issue when trying to write and maintain correctness is recognizing the difference between things which happen to match, versus things which are “the same”. This frequently arises in object oriented programming (99% of questions relating to deep cloning and deep equality stem from a failure to distinguish when it’s appropriate to have separate things that match, versus having two references to the same thing), but the concept extends to source code as well.

Code duplication is good in cases where it should be possible for one piece of code to change without affecting the other. It’s bad in cases where the two pieces of code might change, but should always do the same thing. Note that it doesn’t really matter whether the code in question is long or short. Unfortunately, there are many cases where language constructs do not permit common pieces of code to be factored out efficiently, since the only way to define a method which has access to a scope’s variables is to create a closure; even if no reference to the closure should never escape the context where it’s created, many languages have no way to specify that.

Thus, it’s necessary to invoke another principle which is applicable to source code just as to run-time objects: the distinctions between identity and matching is relevant only when things are mutable or encapsulate mutable state. The decision of whether to duplicate “immutable” code segments–unlike those which might change–is dependent upon length. Of course, code segments which are long and might change are prime candidates for factorization for two reasons.

Add another field to PFObject.
Set value to that field only in the then clause of the original version.
Run the application in a way that’ll make createIapInParse enter the else clause.
Fill a bug report that your new field is left unset.
Assign that bug report to your coworker.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: softwareengineering - @ 23:10

Thẻ: code-quality, dry, objective-c

Thiết kế website giá rẻ

Danh mục

The importance of duplicate code removal [duplicate]