One common question on software development teams is “When should we stop testing?” It would be simple to say, “we’ll stop testing when we’ve found all of the defects.” But software testing can’t tell you that your application is defect-free: it can only tell you what has been tested, and what defects have been found. For that reason, teams have developed numerous approaches to deciding when to stop testing. Numerous books and papers have been written, and many presentations, lectures, and workshops delivered, to answer that question.
Recently, Ranorex partnered with Pulse Research to ask IT leaders how they decided when they had done enough testing. The chart below shows their responses, which we’ll discuss in turn:
At the top of the chart above are metrics — quantifiable measures used to assess the status of your testing. Common metrics for deciding when you have done enough testing include:
- % of critical tests passed: (total # of critical passed tests / total critical tests executed) * 100
- % of total tests passed: (total # of passed tests / total tests executed) * 100
- % of requirements covered by testing (requirements coverage): (total # of requirements tested / total # of requirements) * 100
- % of tests executed: (total # of tests executed / total tests) * 100
- % of critical business flows passed: (total # of critical business flows passed / total # of critical business flows tested) * 100
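The percentage formulas above all share the same shape, so they can be sketched as a small Python helper. The metric names and counts here are illustrative, not from the survey:

```python
def pct(passed: int, total: int) -> float:
    """Generic percentage metric: (passed / total) * 100.

    Guards against division by zero when no tests have been executed.
    """
    return 0.0 if total == 0 else passed / total * 100

# Hypothetical results from one test cycle
critical_passed, critical_executed = 47, 50
total_passed, total_executed = 180, 200
reqs_tested, reqs_total = 90, 100

print(f"Critical tests passed: {pct(critical_passed, critical_executed):.1f}%")  # 94.0%
print(f"Total tests passed:    {pct(total_passed, total_executed):.1f}%")        # 90.0%
print(f"Requirements covered:  {pct(reqs_tested, reqs_total):.1f}%")             # 90.0%
```

Computing each metric through one shared function also makes it easy to track the values across release cycles, as the next paragraph suggests.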
It can also be useful to compare these metrics to previous release cycles to measure the improvement in your application development and QA practices. Is the number of tests executed per cycle increasing or decreasing? Are more critical tests passing each time, or is the number decreasing?
Having a Go/No-Go meeting
In the Pulse survey, having a Go/No-Go meeting was next on the list of common approaches to deciding when to stop testing. The image often associated with a “go no-go meeting” is that of a mission control team preparing a spaceship for launch. Each member of the team reviews the available metrics and then makes a recommendation about whether your latest software release is ready to launch.
Despite its name, there are actually three possible outcomes from a Go/No-Go meeting:
- Go – the release is ready for deployment
- Go with caveats – the release may be deployed as long as identified issues can be resolved within a set time
- No Go – the release is not ready for deployment. The team will identify action items that must be completed prior to scheduling another Go/No-Go meeting
By themselves, metrics can’t tell you that you’ve done enough testing. What does it mean to your team that 98% of critical tests have passed? Is that sufficient? Or do you need 100% of tests to pass? This is where a Go/No-Go meeting can be helpful.
Applying the diminishing rate of return
The final set of factors in the chart could be grouped together under the concept of the “diminishing rate of return.” At some point, the cost of continued software testing outweighs the benefit of continuing. This might happen when one of the following occurs:
- A minimum acceptable defect rate is achieved. You can calculate a defect rate as (total number of defects / tested modules or functions). The rate that is acceptable will vary greatly depending on the nature of your application. Systems upon which lives depend, such as those for aviation or space flight, have a far lower tolerance for error than those for video games.
- The project deadline is reached: insufficient time for testing is a common issue in software development. In a recent survey conducted with Pulse Research, only 13% of IT leaders strongly agreed that their current software development process allows sufficient time for QA testing. Another 56% “somewhat agreed” that they had enough time.
- The project budget is reached: like running out of time, available budget can also be a factor that drives when it’s time to stop testing.
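The defect-rate check from the first bullet can be automated against a release's test results. A minimal sketch, where the module names, defect counts, and threshold are all hypothetical:

```python
# Hypothetical defect counts per tested module
defects_by_module = {"login": 2, "checkout": 5, "search": 1, "reports": 4}

modules_tested = len(defects_by_module)
total_defects = sum(defects_by_module.values())

# Defect rate = total number of defects / tested modules (per the article)
defect_rate = total_defects / modules_tested

# The acceptable threshold varies by domain: safety-critical systems
# (aviation, space flight) demand a far lower rate than, say, video games.
ACCEPTABLE_RATE = 3.0

print(f"Defect rate: {defect_rate:.2f} defects per module")  # 3.00
print("OK to stop testing" if defect_rate <= ACCEPTABLE_RATE else "Keep testing")
```

Feeding a check like this into a Go/No-Go meeting turns the threshold into a shared, explicit agreement rather than a judgment call made under deadline pressure.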
Asking the right questions
Peter G. Walen, conference speaker and member of the Agile Alliance, the Scrum Alliance, and the American Society for Quality (ASQ), shared a different perspective on the question, “when to stop testing.” Here is what Pete had to say:
Recently, I was asked a question that was different. The team had a variety of tests, each exercising interesting aspects of the application. They had progressed from simple checks against requirements to testing around requirements. They were doing deep dives into areas that caused them problems in the past.
They were looking at areas of concern not covered in “requirements” or “acceptance criteria” but their experience with the product told them these were likely trouble spots. They also used their knowledge of how their customers used the software and expected it to behave.
They had a mix of manual and automated tests defined. They had multiple levels of complexity covered by the tests, depending on what they intended to find out about the system. Some were simple “smoke tests” they could use in their CI environment. Some were more complex integration tests looking at interactions between segments of the system.
Their question was, “We have all these tests we regularly run. We are developing more tests as the product changes. How much testing is really enough?”
Step 1: Determine what the team means by “enough”
Asking a question about “enough testing” might be confusing to some organizations. They might understand “testing” that consists of positive confirmation of what is “expected” and not looking beyond the “happy path.” The idea of “one requirement: one test” is normal. The challenge is that while this may be acceptable for some organizations, it falls short of what many others need.
Then there are other organizations, like the one I described in the introduction:
- They cover as many scenarios as possible.
- Tests that make sense are included in their CI test suite.
- Other tests are included in their automated integration and regression test suites.
- They are using tools to run tests that have provided interesting or unexpected results in the past, so their skilled analysts can focus on new work and not waste their time repeating steps that could be run by an automation tool.
Nearly every other organization is somewhere between these two extremes.
Step 2: Avoid the “not as much testing as we think” trap
Many organizations are closer to the minimal “one requirement: one test” than they realize. A simple test script is created, intended to serve as an open-ended question: the steps are expected to be run several times with different values, which can be correct or incorrect in various ways.
Testing around a requirement is what is expected with such a test. Except, when deadlines are looming or past, and massive pressure is being applied to “finish testing,” such tests might be run only two or three times instead of the seven or eight they would otherwise be executed. Testers might not exercise all the possible logic paths, even when they are aware of them.
Corners get cut for the sake of time. People who are looking only for the checkbox of “this test verifies this requirement” are likely not going to consider what “verifies” actually means or implies. They have fallen into the “not as much testing as we think” trap.
While the intent is there and we can recognize the goal, they are falling short of that goal.
Step 3: Avoid the “kitchen sink” trap
Some teams or organizations look for testing to cover every possible behavior and combination of values. They look for testers to evaluate everything possible in the system and fully document, or at least fully exercise, those possibilities.
Then testers are expected to repeat the tests. All of them, for every release and every build.
The volume of work to be done is overwhelming. Even if the team tries to automate tests, continuing to run every single functional test in the name of regression testing becomes impossible. Tests get left out. Tests that are quick and easy to run are often selected in place of more complex tests that take much longer.
Why? Because when the management team realizes the Herculean nature of the task, they often settle for some percentage of the tests being executed. If the test team can get 80% of the tests run by doing small simple tests, then they can focus time on the new features that need more careful thought.
Step 4: Find the right balance
What does the idea of “enough” mean? There is a balance between the two extremes. What and where that balance is depends on the situation.
I find that a level of testing which allows for in-depth exploration of key features, along with reasonable coverage of secondary features, works much of the time. Which features will be tested more or less than others should be discussed with the stakeholders and project team so everyone is in agreement. Then, regression and integration tests need to be updated accordingly to handle the changes.
Where these fall will vary by organization, team, and project. In short, the first team I described did a very good job finding their balance. It rarely happens on the first try and can take some effort and patience. It is worth it.