Any tools/suggestions on how to refute code coverage quality argument

I know people could consider this question a duplicate, or one that has been asked many times, in which case I would appreciate a link to relevant questions that answer mine.

I have been recently in disagreement with some folks about code coverage. I have a group of people who want our team to drop looking at code coverage altogether based on the argument that 100% coverage does not mean good quality tests and thus good quality code.

I have been able to push back by selling the argument that code coverage tells me for sure what has not been tested, and helps us focus on those areas.

(The above has been discussed in a similar fashion in other SO questions like this one – https://stackoverflow.com/questions/695811/pitfalls-of-code-coverage)

The argument from these folks is – the team would react by quickly creating low quality tests and thus waste time while adding no significant quality.

While I understand their point of view, I am searching for a way to make a more robust case for code coverage by introducing more robust tools/frameworks that handle more coverage criteria (Functional, Statement, Decision, Branch, Condition, State, LCSAJ, Path, Jump Path, Entry/Exit, Loop, Parameter Value, etc.).

What I am looking for is suggestions for a combination of such code coverage tools, and practices/processes to go with them, that can help me counter such arguments while feeling comfortable about my recommendation.

I would also welcome any accompanying comments/suggestions based on your experience/knowledge on how to counter such an argument, because while subjective, code coverage has helped my team be more conscious of code quality and value of testing.

Edit: To reduce any confusion about my understanding of the weaknesses of typical code coverage, I want to point out that I am not referring to Statement Coverage (or lines of code executed) tools (there are plenty). In fact, here is a good article on everything that is wrong with it: http://www.bullseye.com/statementCoverage.html

I was looking for more than just statement or line coverage, going more into multiple coverage criteria and levels.

See: http://en.wikipedia.org/wiki/Code_coverage#Coverage_criteria

The idea is that if a tool can tell us our coverage based on multiple criteria then that becomes a reasonable automated assessment of test quality. I by no means am trying to say that line coverage is a good assessment. In fact that’s the premise of my question.

Edit:
Ok, maybe I projected it a bit too dramatically, but you get the point. The problem is about setting processes/policies in general across all teams in a homogeneous/consistent fashion. The general fear is this: how do you ensure the quality of tests, and how do you allocate guaranteed time without having any measure to it? Thus I like having a measurable feature that, when backed up with appropriate processes and the right tools, would allow us to improve code quality while knowing that time is not being forcibly spent on wasteful processes.

EDIT: So far what I have from the answers:

Code reviews should cover tests to ensure quality of tests
Test First strategy helps avoid tests that are written after the fact to simply increase coverage %
Exploring alternative tools that cover test criteria other than simply Statement/Line
Analysis of covered code/number of bugs found would help appreciate the importance of coverage and make a better case
Most importantly trust the Team’s input to do the right thing and fight for their beliefs
Blocks Covered/# of tests – Debatable but holds some value

Thanks for the awesome answers so far. I really appreciate them. This thread is better than hours of brainstorming with the powers that be.

In my experience, code coverage is as useful as you make it. If you write good tests that cover all of your cases, then passing those tests means that you have met your requirements. In fact, that’s the exact idea that Test Driven Development uses. You write the tests before the code without knowing anything about the implementation (sometimes this means another team entirely writes the tests). These tests are set up to verify that the final product does everything that your specifications say it should do, and THEN you write the bare minimum code to pass those tests.

The problem here, obviously, is that if your tests aren’t strong enough, you will miss edge cases or unforeseen problems and write code which doesn’t truly meet your specifications. If you are truly set on using tests to verify your code, then writing good tests is an absolute necessity, or you’re really wasting your time.

I wanted to edit the answer here as I realized that it didn’t truly answer your question. I would look at that wiki article to see some stated benefits of TDD. It really comes down to how your organization works best, but TDD is definitely something in use in the industry.

First off, people do advocate 100% coverage:

Most developers view … “100% statement coverage” as adequate. This is a good start, but hardly sufficient. A better coverage standard is to meet what’s called “100% branch coverage,” …

Steve McConnell, Code Complete, Chapter 22: Developer Testing.
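
To make the distinction in that quote concrete, here is a minimal Java sketch. The class and method names are made up for illustration and do not come from Code Complete.

```java
// Hypothetical example; the names are illustrative only.
final class Shipping {
    /** Returns the shipping fee in cents; orders of 5000 cents or more ship free. */
    static int feeCents(int orderTotalCents) {
        int fee = 499;
        if (orderTotalCents >= 5000) {
            fee = 0; // only reached when the condition is true
        }
        return fee;
    }
}
```

A single test calling `Shipping.feeCents(6000)` executes every statement, so a statement coverage tool would report 100%. The case where the `if` is false is never exercised, though. Branch coverage only reaches 100% once a second test such as `Shipping.feeCents(1000)` (expected fee 499) is added.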

As you and others have mentioned, code coverage for the sake of coverage alone isn’t likely to accomplish much. But if you can’t make a line of code execute, why is it written?

I’d suggest resolving the argument by gathering and analyzing data on your own projects.

To gather the data, I personally use the following tools:

JaCoCo and the associated Eclipse plugin EclEmma for measuring code coverage.
Ant scripts for automated building, testing, and reporting.
Jenkins for continuous builds – any change in source control triggers an automatic build
JaCoCo Plugin for Jenkins – captures coverage metrics for each build, and graphs trends. Also allows definitions of per-project coverage thresholds that affect the health of the build.
Bugzilla for tracking bugs.

Once you have that (or something similar) in place, you can start to look at your own data more closely:

are more bugs found in poorly covered projects?
are more bugs found in poorly covered classes/methods?
etc.
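
One possible way to start answering such questions is sketched below in Java. Everything here is an assumption made for illustration: `ModuleStats` is a hypothetical record you would fill by hand from your own coverage reports and bug tracker exports, and no JaCoCo or Bugzilla API is used.

```java
import java.util.List;

/** Hypothetical per-module record, filled from your own coverage and bug data. */
final class ModuleStats {
    final String name;
    final double coveragePct; // 0..100, whichever coverage criterion you track
    final int linesOfCode;
    final int bugsReported;

    ModuleStats(String name, double coveragePct, int linesOfCode, int bugsReported) {
        this.name = name;
        this.coveragePct = coveragePct;
        this.linesOfCode = linesOfCode;
        this.bugsReported = bugsReported;
    }
}

final class CoverageVsBugs {
    /** Bugs per 1000 lines over all modules whose coverage lies in [minPct, maxPct). */
    static double defectDensity(List<ModuleStats> modules, double minPct, double maxPct) {
        int bugs = 0;
        int lines = 0;
        for (ModuleStats m : modules) {
            if (m.coveragePct >= minPct && m.coveragePct < maxPct) {
                bugs += m.bugsReported;
                lines += m.linesOfCode;
            }
        }
        // No modules in the bucket: report 0 rather than dividing by zero.
        return lines == 0 ? 0.0 : 1000.0 * bugs / lines;
    }
}
```

Comparing, say, `defectDensity(modules, 0, 50)` with `defectDensity(modules, 50, 101)` gives a rough first look at whether poorly covered code attracts more bugs. It is only a correlation and says nothing about test quality by itself.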

I’d expect that your data will support your position on code coverage; that has certainly been my experience. If it doesn’t, however, then maybe your organization can succeed with lower code coverage standards than you’d like. Or maybe your tests aren’t very good. The task will hopefully focus effort on producing software with fewer defects, regardless of the resolution of the code coverage disagreement.

The argument from these folks is – the team would react by quickly
creating low quality tests and thus waste time while adding no
significant quality.

This is an issue of trust, not tools.

Ask them why, if they really believe that statement, they would trust the team to write any code at all?

Ok, maybe I projected it a bit too dramatically, but you get the
point. The problem is about setting processes/policies in general
across all teams in a homogeneous/consistent fashion.

I think that’s the problem. Developers don’t care (and often for excellent reasons) about consistent or global policies, and want the freedom to do what they think is right rather than comply with corporate policies.

Which is reasonable unless you prove that global processes and measures have value and a positive effect on quality and speed of development.

Usual timeline:

dev: hey, look – I added code coverage metrics to our dashboard, ain’t that great?
manager: sure, let’s add mandatory goals and compliance on those
dev: never mind, code coverage is stupid and useless, let’s drop it

In my experience, there are a few things to combine with code coverage to make the metric worthwhile:

Code Reviews

If you can punt bad tests back to the developer, it can help limit the number of bad tests that are providing this meaningless coverage.

Bug Tracking

If you have a bunch of code coverage on a module, but still get many/severe bugs in that area, then it might indicate a problem where that developer needs improvement with their tests.

Pragmatism

Nobody is going to get to 100% with good tests on non-trivial code. If you, as the team lead, look at the code coverage and, instead of saying “we need to get to N%!”, identify gaps and ask people to “improve coverage in module X”, you achieve your goal without giving people an opportunity to game the system.

Blocks Covered/# of Tests

Most code coverage tools list blocks covered vs blocks not covered. Combining this with number of actual tests lets you get a metric indicating how ‘broad’ tests are, either indicating bad tests or coupled design. This is more useful as a delta from one sprint to another, but the idea is the same – combine code coverage with other metrics to gain more insight.
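
As a rough illustration of that ratio and its sprint-to-sprint delta, here is a small Java sketch. The class and method names are hypothetical, and the inputs are assumed to be read off whatever your coverage tool and test runner report.

```java
final class TestBreadth {
    /** Average number of covered blocks per test; 0 when there are no tests. */
    static double blocksPerTest(int blocksCovered, int testCount) {
        return testCount == 0 ? 0.0 : (double) blocksCovered / testCount;
    }

    /** Change in blocks-per-test between two sprints; positive means tests became broader. */
    static double delta(int prevBlocks, int prevTests, int currBlocks, int currTests) {
        return blocksPerTest(currBlocks, currTests) - blocksPerTest(prevBlocks, prevTests);
    }
}
```

For example, going from 1200 covered blocks with 300 tests to 1500 covered blocks with the same 300 tests moves the ratio from 4 to 5 blocks per test. That jump is worth a look: either existing tests started exercising more code, or the code under test became more coupled.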

Here are my 2 cents.

There are many practices that have received a lot of attention recently because they can bring benefits to software development. However, some developers apply those practices blindly: they are convinced that applying a methodology is like executing an algorithm and that after performing the correct steps one should get the wanted result.

Some examples:

Write unit tests with 100% code coverage and you will get better code quality.
Apply TDD systematically and you will get better design.
Do pair programming and you will improve code quality and reduce development time.

I think the basic problem with the above statements is that humans are not computers and writing software is not like executing an algorithm.

So, the above statements contain some truth but simplify things a bit too much, e.g.:

Unit tests catch lots of errors and code coverage indicates which parts of the code are tested, but testing trivial things is useless. For example, if by clicking on a button the corresponding dialog opens up, the whole logic sending the button event to the component that opens the dialog can be tested by a simple manual test (click on the button): does it pay off to unit test this logic?
While TDD is a good design tool, it does not work well if the developer has a poor understanding of the problem domain (see e.g. this famous post).
Pair programming is effective if two developers can work together, otherwise it is a disaster. Also, experienced developers may prefer to briefly discuss the most important issues and then code separately: spending many hours discussing lots of details that they both already know can be both boring and a big waste of time.

Going back to code coverage.

I have been able to push back by selling the argument that Code
Coverage tells me what has not been tested for sure and help us focus
on those areas.

I think you have to judge from case to case if it is worthwhile to have 100% coverage for a certain module.

Does the module perform some very important and complicated computation? Then I would like to test every single line of code but also write meaningful unit tests (unit tests that make sense in that domain).

Does the module perform some important but simple task like opening a help window when clicking on a button? A manual test will probably be more effective.

The argument from these folks is – the team would react by quickly
creating low quality tests and thus waste time while adding no
significant quality.

In my opinion they are right: you cannot enforce code quality by only requiring 100% code coverage. Adding more tools to compute the coverage and make statistics will also not help. Rather, you should discuss which parts of the code are more sensitive and should be tested extensively and which ones are less error-prone (in the sense that an error can be discovered and fixed much more easily without using unit tests).

If you push 100% code coverage onto the developers, some will start to write silly unit tests to fulfill their obligations instead of trying to write sensible tests.

how do you allocate guaranteed time without having any measure to it

Maybe it is an illusion that you can measure human intelligence and judgment.
If you have competent colleagues and you trust their judgment, you can
accept when they tell you “for this module, increasing the code coverage will bring very little benefit, so let’s not spend any time on it” or “for this module we need as much coverage as we can get; we need one extra week to implement sensible unit tests”.

So (again, these are my 2 cents): do not try to find a process and set parameters like code coverage that must fit all teams, for all projects and for all modules. Finding such a general process is an illusion, and I believe that even if you do find one it will be suboptimal.

“the team would react by quickly creating low quality tests and thus waste time while adding no significant quality”

This is a real risk, not just theoretical.

Code coverage alone is a dysfunctional metric. I learned that lesson the hard way: I once emphasized it without any balancing metrics or practices. Hundreds of tests that catch and mask exceptions, and contain no assertions, are an ugly thing.
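
To show what such a test looks like, here is a minimal Java sketch. All names are hypothetical, and plain static methods stand in for JUnit tests so that the example has no dependencies.

```java
final class PriceParser {
    /** Parses a price like "12.50" into cents. Deliberately buggy for this example. */
    static int toCents(String text) {
        String[] parts = text.split("\\.");
        return Integer.parseInt(parts[0]) * 100; // bug: the fractional part is ignored
    }
}

final class PriceParserTests {
    /** Executes every line of toCents but can never fail: no assertion, exceptions swallowed. */
    static boolean coverageOnlyTest() {
        try {
            PriceParser.toCents("12.50");
        } catch (Exception e) {
            // masked: even a crash inside toCents would still count as a pass
        }
        return true;
    }

    /** Same coverage, but the result is checked, so the bug is exposed. */
    static boolean assertingTest() {
        return PriceParser.toCents("12.50") == 1250;
    }
}
```

Both tests produce identical coverage numbers for `toCents`. Only the second one reports a failure, which is why the coverage figure on its own cannot distinguish them.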

“suggestion for a combination of such code coverage tools and practices/processes to go with them”

In addition to all the other suggestions, there is one automation technique that can assess the quality of tests: mutation testing (http://en.wikipedia.org/wiki/Mutation_testing). For Java code, PIT (http://pitest.org/) works, and it’s the first mutation testing tool I’ve come across that does.
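
The idea behind mutation testing can be sketched by hand in Java. This is only an illustration under stated assumptions: the mutant below is written manually and the names are hypothetical, whereas a tool like PIT generates and runs mutants automatically; none of this is PIT’s API.

```java
import java.util.function.IntPredicate;

final class MutationSketch {
    // Original rule: adults are 18 or older.
    static final IntPredicate ORIGINAL = age -> age >= 18;
    // Hand-written mutant: the boundary operator >= is changed to >.
    static final IntPredicate MUTANT = age -> age > 18;

    /** Weak suite: fully covers the rule but never checks the boundary. */
    static boolean weakSuitePasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    /** Stronger suite: additionally pins down the boundary value 18. */
    static boolean strongSuitePasses(IntPredicate isAdult) {
        return weakSuitePasses(isAdult) && isAdult.test(18);
    }

    /** A mutant is "killed" when a suite that passes on the original fails on the mutant. */
    static boolean kills(boolean passesOnOriginal, boolean passesOnMutant) {
        return passesOnOriginal && !passesOnMutant;
    }
}
```

The weak suite passes on both `ORIGINAL` and `MUTANT`, so the mutant survives even though coverage is complete. The strong suite fails on `MUTANT` and therefore kills it. The share of killed mutants is the quality signal that coverage alone does not provide.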

As you note, lack of code coverage is readily identifiable as a software quality risk. I teach that code coverage is a necessary, but insufficient, condition for software quality. We have to take a balanced scorecard approach to managing software quality.

Code coverage is certainly not proof that the unit tests are good, in the sense of being correct.

But unless they can provide a way of proving that all unit tests are good (for whatever definition of good they can come up with), this is really a moot point.

I have always found that code coverage is easily susceptible to the Hawthorne Effect. This caused me to ask “why do we have any software metrics at all?” and the answer usually is to provide some high level understanding of the current state of the project, things like:

“how close are we to done?”

“how is the quality of this system?”

“how complicated are these modules?”

Alas, there will never be a single metric that can tell you how good or bad the project is, and any attempt to derive that meaning from a single number will necessarily oversimplify. While metrics are all about data, interpreting what they mean is a much more emotional/psychological task, and as such it probably can’t be applied generically across teams of different composition or problems of different domains.

In the case of coverage, I think it is often used as a proxy for code quality, albeit a crude one. The real problem is that it boils an awfully complicated topic down to a single integer between 0 and 100, which will of course be used to drive potentially unhelpful work in an endless quest to achieve 100% coverage. Folks like Bob Martin will say that 100% coverage is the only serious goal, and I can understand why that is so, because anything else just seems arbitrary.

Of course, there are lots of ways to get coverage that don’t actually help me get any understanding of the codebase. For example, is it valuable to test toString()? What about getters and setters for immutable objects? A team only has so much effort to apply in a fixed time, and that time always seems to be less than the time required to do a perfect job, so in the absence of a perfect schedule we have to make do with approximations.

A metric I have found useful in making good approximations is Crap4J. It is now defunct, but you can easily port/implement it yourself. Crap4J attempts to relate code coverage to cyclomatic complexity by implying that code which is more complicated (ifs, whiles, fors, etc.) should have higher test coverage. To me this simple idea really rang true. I want to understand where there is risk in my codebase, and one really important risk is complexity. So using this tool I can quickly assess how risky my code base is. If it is complicated, the coverage had better go way up. If it isn’t, I don’t need to waste time trying to get every line of code covered.
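
For anyone wanting to port it, here is a minimal Java sketch of the score as it is commonly cited for Crap4J: CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp is the cyclomatic complexity of method m and cov its test coverage in percent. Treat the formula as an assumption to check against the original Crap4J material; the class and method names are hypothetical.

```java
final class CrapScore {
    /**
     * @param complexity  cyclomatic complexity of the method (at least 1)
     * @param coveragePct test coverage of the method, from 0 to 100
     * @return the CRAP score; higher means riskier to change
     */
    static double of(int complexity, double coveragePct) {
        double uncovered = 1.0 - coveragePct / 100.0;
        // Complexity is squared and weighted by the cube of the uncovered fraction,
        // so complex and untested methods dominate the ranking.
        return complexity * complexity * Math.pow(uncovered, 3) + complexity;
    }
}
```

With this formula a method of complexity 10 scores 110 at 0% coverage and 10 at 100% coverage, while a trivial method of complexity 1 scores at most 2. That matches the point above: coverage matters most where the code is complicated.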

Of course this is but one metric and YMMV. You have to spend time with it to understand if it will make sense to you and if it will give your team a reasonably grok-able feeling of where the project is at.

I wouldn’t say that going back and covering existing code is the best route forward. I would argue that it makes sense to write covering tests for any new code you write and/or any code you change.

When bugs are found, write a test that fails because of that bug and fix the bug so that the test turns green. Put in the comments of the test what bug it’s written for.
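
A minimal Java sketch of that workflow follows. The bug number, class names, and the bug itself are hypothetical, and a plain static method stands in for a JUnit test to keep the example free of dependencies.

```java
final class Stats {
    /** Mean of the values; returns 0.0 for an empty array (the fix for the hypothetical bug). */
    static double mean(int[] values) {
        if (values.length == 0) {
            return 0.0; // before the fix, this case divided 0.0 by 0 and returned NaN
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }
}

final class StatsRegressionTest {
    /** Regression test for hypothetical bug #1234: mean() of an empty array returned NaN. */
    static void meanOfEmptyArrayIsZero() {
        double result = Stats.mean(new int[0]);
        if (result != 0.0) {
            throw new AssertionError("bug #1234 regressed: expected 0.0 but got " + result);
        }
    }
}
```

The test is written first and fails against the unfixed code; the guard in `mean` then turns it green. The comment naming the bug tells a later reader why this oddly specific test exists.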

The goal is to have enough confidence in your tests that you can make changes without concern for unexpected side effects. Check out Working Effectively with Legacy Code for a good summary of approaches to taming untested code.

Filed under: softwareengineering - @ 02:17

Tags: java, test-coverage, tools, unit-testing
