Testing costs plenty: in engineering effort, license fees, and hours on the clock. Are the payoffs worth those expenses? How can we ever be sure?
Organizations, for example, invest heavily in test automation to find defects early and release faster. The process usually starts with hiring skilled testers, forming an automation team, and building an automation framework from the plethora of tools available today.
Once considerable time has been spent building and configuring the framework, teams integrate the tests into their CI/CD (continuous integration and delivery) pipeline and schedule them to run against checked-in code. Many teams conclude that this is the endpoint of the entire automation creation and execution cycle.
But one more critical aspect of test automation remains: teams need to spend time and research on measuring the success of the automated tests. Too often, teams know what they put into a project, but finishing so wears them out that they don't consider what came out of it.
Let's examine alternative metrics teams might use to bring value squarely back into focus.
Good metrics for automation success
Here are good metrics to consider, depending on the context of the project and the team. These metrics reduce the ambiguity in measuring automation success and thereby provide information that supports better decisions.
Be ready to label certain metrics as "KPIs" when speaking with colleagues on the managerial or business side of the organization, because "key performance indicator" is the preferred term of art for strategizing in those circles.
Counts of completed tests
Difficulties with simple-minded counts abound, and a few of them appear below. Still, basic aggregates over time of total tests completed, along with their outcomes (pass, fail, inconclusive, and so on), are so fundamental that they're almost always worth keeping. It is important to sum these over both automated and non-automated tests, and over both user interface (UI) and application programming interface (API) requirements. If an automation project leads to a decline in total tests completed per day or week, for example, something is almost certainly wrong.
Notice that some metrics (time saved, or percentage of defects found, for instance) are simple to "score": more is better. These are easy for organizations to adopt as KPIs. In contrast, charts of completed-test counts over time aren't "grades" of this sort. The point of these counts isn't to rack up a big score, but to hint at a deeper story. Counts of completed tests are valuable to the extent they lead to follow-up questions: why did manual tests suddenly find no defects in weeks three and four? If trends continue, how many tests will need to be run in week twenty to find a single defect? Do we have a good system for managing reports of defects that share a root cause?
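As a minimal sketch, assuming test results can be exported as simple (week, outcome, automated?) records from whatever CI or test-management tool is in use (the field names and values below are purely illustrative), a few lines of Python are enough to tabulate these counts:

```python
from collections import Counter

# Hypothetical result records exported from the team's CI or test-management
# system: (ISO week, outcome, automated?) per completed test.
results = [
    ("2024-W14", "pass", True),
    ("2024-W14", "fail", False),
    ("2024-W15", "pass", True),
    ("2024-W15", "inconclusive", True),
]

# Count completed tests per week, split by outcome and by automated vs. manual.
counts = Counter(
    (week, outcome, "automated" if automated else "manual")
    for week, outcome, automated in results
)

for (week, outcome, kind), n in sorted(counts.items()):
    print(f"{week}  {kind:9s}  {outcome:13s}  {n}")
```

Plotting these per-week counts, rather than reading them as a single score, is what surfaces the follow-up questions above.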
Testing time saved
One of the main reasons for building automated tests is to save valuable manual testing effort. While the automated tests repeat mundane testing tasks, testers can focus on the more critical and higher priority tasks, spend time exploring the application, and test modules that are hard to automate and need extensive critical thinking.
The amount of testing time saved is a good metric to assess value provided to the team. For example, in a two-week sprint, if the automated tests reduce the manual testing effort from two days to four hours, that is a big win for the team and the organization — and it converts into money saved — so such measurements should be tracked and communicated.
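A back-of-the-envelope calculation makes that example concrete; the hourly rate below is a placeholder, not a figure from any real team:

```python
# Illustrative arithmetic for the example above: a two-week sprint in which
# automation reduces manual regression testing from two days to four hours.
HOURS_PER_DAY = 8
hourly_rate = 50.0                          # hypothetical loaded tester cost per hour

manual_effort_before = 2 * HOURS_PER_DAY    # 16 hours per sprint
manual_effort_after = 4                     # 4 hours per sprint

hours_saved_per_sprint = manual_effort_before - manual_effort_after
money_saved_per_sprint = hours_saved_per_sprint * hourly_rate

print(f"Hours saved per sprint: {hours_saved_per_sprint}")        # 12
print(f"Money saved per sprint: ${money_saved_per_sprint:.0f}")   # $600
```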
The flakiness of automated tests
If the team spends four months building a robust automation framework, then spends more time maintaining the automated tests than actually using them to find defects, the entire effort is wasteful. Surprisingly, this is a common problem in teams; their tests are unstable and keep failing due to multiple factors. As a result, teams stop trusting the automated tests and eventually decide to go back to testing features manually.
It is important to start with a small number of tests, run them constantly, identify flaky tests and separate them out from the stable tests. This methodical approach helps to raise the value of automated tests.
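One simple way to spot flakiness is to flag any test that has produced conflicting outcomes on the same, unchanged code revision. The sketch below assumes run history is available as (test, commit, outcome) records, which is an illustrative format rather than any particular tool's output:

```python
from collections import defaultdict

# Hypothetical run history pulled from CI: (test_name, commit_sha, outcome).
runs = [
    ("test_login", "abc123", "pass"),
    ("test_login", "abc123", "fail"),      # same commit, different outcome
    ("test_checkout", "abc123", "pass"),
    ("test_checkout", "def456", "pass"),
]

outcomes_per_commit = defaultdict(set)
for test, commit, outcome in runs:
    outcomes_per_commit[(test, commit)].add(outcome)

# A test is suspected flaky if it gave conflicting results on unchanged code.
flaky = {test for (test, _), seen in outcomes_per_commit.items() if len(seen) > 1}
print(sorted(flaky))   # ['test_login']
```

Tracking the size of that flaky set over time shows whether trust in the suite is growing or eroding.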
Number of risks mitigated
Testing needs to be prioritized based on risks. These could be unexpected events that would impact business, defect-prone areas of the application, or any past or future events that could affect the project.
A good approach to measuring automation success in relation to risk is to rank the risks from high to low priority, automate test cases in that order, and track the number of risks that have been mitigated by the automated tests.
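A minimal sketch of such tracking, assuming a hypothetical risk register that maps each risk to a priority and to the test cases that address it:

```python
# Hypothetical risk register: a risk counts as "mitigated" once every test
# case that addresses it is automated and passing.
risks = {
    "payment-gateway-outage": {"priority": 1, "tests": {"t1", "t2"}},
    "report-layout-regression": {"priority": 3, "tests": {"t7"}},
}
automated_and_passing = {"t1", "t2"}

mitigated = [
    name
    for name, risk in sorted(risks.items(), key=lambda kv: kv[1]["priority"])
    if risk["tests"] <= automated_and_passing
]
print(f"Risks mitigated: {len(mitigated)} of {len(risks)}")   # 1 of 2
```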
EMTE
Equivalent Manual Test Effort (EMTE) is a "classic" concept in software quality assurance (QA) that remains widely discussed. It expresses how much manual testing effort the automated tests deliver: roughly, the time the same test executions would have taken by hand. EMTE largely serves to answer such questions as, "Are we better off after this automation project than before we started?"
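One common reading of EMTE compares the manual effort the automated executions would have required with the effort actually invested in automation; the numbers below are purely illustrative:

```python
# Illustrative EMTE calculation under one common reading of the term.
manual_minutes_per_execution = 15          # time to run the same test by hand
automated_executions_this_quarter = 4_000

emte_hours = automated_executions_this_quarter * manual_minutes_per_execution / 60

# Effort actually invested in building and maintaining the automation.
automation_effort_hours = 600

print(f"EMTE delivered: {emte_hours:.0f} hours")                                # 1000
print(f"Ratio vs. effort invested: {emte_hours / automation_effort_hours:.1f}x")  # 1.7x
```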
Coverage
While coverage sounds as though it admits a simple "more is better" interpretation, much of the value of this measurement lies in careful observation over time. Distinguish coverage counted in source lines from coverage counted in requirements, and make sure that, whatever levels of coverage automated and manual tests reach on their own, together they add up to 100% coverage.
Scores of requirements coverage relate directly to user stories. Highlight coverage results to business teammates in terms of those user stories.
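A small sketch of the "together they reach 100%" check, assuming a hypothetical mapping from requirement (or user story) IDs to the kinds of tests that cover them:

```python
# Hypothetical requirement IDs and the tests that cover them.
requirements = {"US-101", "US-102", "US-103", "US-104"}
covered_by_automation = {"US-101", "US-102"}
covered_manually = {"US-103"}

combined = covered_by_automation | covered_manually
uncovered = requirements - combined

print(f"Requirements coverage: {len(combined) / len(requirements):.0%}")  # 75%
print(f"Not covered by any test: {sorted(uncovered)}")                    # ['US-104']
```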
Defect density, defect containment efficiency, and related measures of effectiveness
Several somewhat-technical metrics get at one idea: how likely are our tests to find any particular defect? Slightly different formulas are in use for measuring that one idea. When in doubt, choose a definition that the team, or at least the organization, has used before, as a way to minimize surprises.
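For concreteness, here is one reasonable pair of definitions expressed in Python; organizations compute these slightly differently, so treat the formulas as a choice rather than the standard:

```python
# Defect density: defects found per thousand lines of code (KLOC).
defects_found = 24
thousand_lines_of_code = 80
defect_density = defects_found / thousand_lines_of_code          # 0.3 per KLOC

# Defect containment (detection) efficiency: the share of all known defects
# that testing caught before release.
found_by_tests = 24
found_after_release = 6
containment_efficiency = found_by_tests / (found_by_tests + found_after_release)

print(f"Defect density: {defect_density:.2f} defects/KLOC")
print(f"Defect containment efficiency: {containment_efficiency:.0%}")   # 80%
```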
Ease of use of the automated framework or tests
Teams often forget that their automated tests need to be low-maintenance, easy for anyone to run, and built on simple, understandable code, and that they should report clearly on the tests that ran: passed and failed tests, logs, visual dashboards, screenshots, and more. When these marks of ease of use are present, they strongly suggest the automation effort was successful. While ease of use is a subjective metric at this level, thinking about "approachability" captures a lot of information about the tests' impact on teams.
All these metrics need to be adapted based on the context of the project; this is not a “one size fits all” solution. But they do help change the mindset of teams to shift focus to the value of automated tests, instead of arbitrary, simple-minded numbers and effort expended.
Misleading metrics for automation success
Measuring automation is a highly debated topic: metrics play a variety of roles for different organizations and individuals, and choices vary widely. Certain well-known metrics are deeply flawed, though. Along with careful consideration of the metrics above, it's also healthy to think through a few of the ways metrics can fail. Think of it as practice in sensitivity to what can go wrong with a metrics program. Below are two of the most commonly tracked metrics that are also among the least useful for assessing automation.
Number of automated test cases
This is one of the most common ways teams mis-measure automation success. They believe that if a certain number of tests has been automated, the entire automation effort is successful.
Notice this is a prime example of the mistake described above of confusing the investment in a project with its return. A count of automated test cases tells much more about effort expended than value received.
Moreover, crude counts invariably blur quality distinctions. Consider an example where a test suite begins with 100 test cases and a project automates 90 of them. While that certainly represents 90% of something, there's no certainty that automation will save 90% of human effort, or that 90% of errors will turn up in automated tests. The automated tests might be those that were easiest, and least informative, to automate; they might equally be more likely than the human-executed tests to detect errors. Either way, simple counts in isolation deserve sophisticated interpretation.
Number of defects found
Using the number of defects found by automated scripts as a measure of success also deserves careful interpretation. On its own, a defect count is misleading and gives teams a false sense of accomplishment.
For example, if the automation scripts found 10 defects, it could mean the developers did not do their job correctly; if they found zero defects, it could mean the scripts weren't effective. There are multiple ways to interpret these results.
Counts of defects found and automated test cases are open to different interpretations. Without careful analysis, they can easily steer the team’s mindset from the real value provided by automated tests to misleading goals and benchmarks.
Loops of feedback loops
You choose specific metrics, and exclude others, to support larger business goals, of course. Good metric selections help determine whether your automation projects have succeeded, and guide future automation projects. Through time, you’ll likely learn more about how well particular metrics actually serve in these roles, and you can further update and refine the choices. More important than any specific system or process is practice in the habit of improving and refining your systems and processes.
Choose a few specific metrics, launch a modest but concrete automation project, see how it works out, and keep improving, day after day. After just a few cycles, you’ll find your software quality is so much better that it would hurt to return to your old practices.