The Value of Failing Tests

Oct 30, 2018 | Best Practices, Test Automation Insights

Proponents of test-driven development (TDD) can be overheard muttering their mantra to themselves and others: “Red, green, refactor.” The red-green-refactor pattern is a simplification of the TDD cycle, which requires that you demonstrate test failure prior to writing the code to make it pass.

The colorful part of the mantra derives from the tools used, nearly all of which use a stoplight color scheme to indicate whether a test is passing or failing. Red indicates that you ran a test through a unit testing tool and at least one expectation described by the test was not met. Green indicates that the test passed: all expectations described by the test were met and no errors occurred during test execution.

In TDD, a test describes a new behavior through code that interacts with the underlying system. The test asserts that expected conditions hold true upon completion of that interaction.

Because the test is describing yet-to-be-implemented behavior, it might not even compile when first written. It might take the developer a few minutes and several attempts to build enough code for the test to execute in a unit testing framework. These incremental steps are failures, but they’re not the red we’re looking for: A test must fully describe the new behavior before we consider the red part of the TDD cycle complete.

But what’s the point? If we know that a test describes nonexistent behavior, why take the time to run it through the testing tool? It might take only a couple of seconds to run the test and another couple to note the result, but we know that seconds are precious things that add up to real time.

We should probably be clear on why it’s important to first see red.

Heeding feedback

TDD is a simple feedback mechanism. The color at each step tells you whether you are ready to move on to the next step. A red (failing) test is feedback that says your test represents a proper description of a new behavior. The description contains at least one assertion that demonstrates whether the proper outcome has been achieved. A green (passing) test is feedback that says your system does all it’s been asked to do so far. You can choose to clean up its implementation through refactoring, to write another failing test to add more behavior, or to ship what you have.

One glaringly obvious conclusion you can make based on a red test is that things aren’t working yet! And they shouldn’t be — you’ve not written the code to make the test pass. (It’s possible, though, that the test is red for other reasons; we’ll touch on those later.)

Particularly as the behaviors in your system get more complex and the code gets more interesting, you might run your new test expecting it to fail, only to see it pass. Why? Sometimes the behavior you’re describing already exists in the system, but that fact isn’t known to you.

Seeing a test go green when you were expecting red can happen for a number of other reasons; we’ll talk about those as well. For now, the key thing to remember is that your reaction should always be the same:

A test that passes with no change to your classes is always cause for humility and pause.

“Oh! My test passed! That’s special,” you say (sounding like Dr. Seuss). It’s time to don your thinking cap. Why is it passing? Is the behavior already in the code? Or is something else amiss?

Catching dumb mistakes

I teach TDD from time to time in a classroom setting. I implore pairs to follow the TDD rules, but there are often students who don’t listen, or who don’t figure they need that extra step. As students work on their exercises, I wander around the classroom to ensure they’re staying on track. In one class, I walked up behind a pair of programmers who proudly announced, “We’re writing our last test.” That was odd, as it was only about 20 minutes into an exercise that takes most students at least 45 minutes to complete.

A quick look at the code triggered my spidey sense. “Your tests are all passing? Let’s see this current test fail.”

“OK, but our tests have been passing all along.” Oh. Oops. I reminded them that it was important to stick to the cycle, and that absence of any failures meant that we really didn’t know what state things were in. Sure enough, their test run showed all tests passing, even though the new test should have generated a test failure. A bit of digging revealed a rookie coding mistake, and as a result, none of their new tests were included in the test suite that the tool was running.

The students fixed their problem, but pride turned to dismay when the next test run included their tests. Every single one was failing. “Looks like you have a bit of catching up to do,” I said in an upbeat manner as I moved to the next pair.

Here’s a list of some dumb reasons your tests might show green when you should be expecting at least one to be red:

  • You forgot to save
  • You forgot to compile
  • The new test isn’t part of the current set of tests you’re running (hint: Try to run all the tests, all the time)
  • You’re using a framework that requires you to explicitly add the test to a suite, but you forgot
  • You didn’t properly mark the test as a test — for example, JUnit requires tests to be marked with an @Test annotation
  • You’re running tests against a different version of the code (possibly due to path or classpath issues)
  • You didn’t include an actual assertion in the test
  • The test is disabled
  • The test isn’t getting picked up for some obscure reason (Visual Studio’s Test Explorer continues to confound me this way occasionally)
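Two of the items above are easy to reproduce. The following is a rough sketch in Python's unittest, which I am using as a stand-in for JUnit; is_leap_year and LeapYearTest are hypothetical names. unittest only collects methods whose names start with "test", which plays roughly the same role as JUnit's @Test annotation. The production code here is deliberately wrong, yet the run is green.

```python
import io
import unittest


def is_leap_year(year):
    """Hypothetical production code, deliberately wrong for century years."""
    return year % 4 == 0


class LeapYearTest(unittest.TestCase):
    # Pitfall 1: the name does not start with "test", so unittest never
    # collects this method and its assertion never runs.
    def check_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    # Pitfall 2: the test calls the code but asserts nothing, so it passes
    # no matter what is_leap_year returns.
    def test_century_is_not_leap(self):
        is_leap_year(1900)


def run(test_case):
    """Run one TestCase class quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


if __name__ == "__main__":
    result = run(LeapYearTest)
    # prints "ran 1 test(s), success=True"
    print(f"ran {result.testsRun} test(s), success={result.wasSuccessful()}")
```

Two test methods were written, but only one ran, and it could not fail. Insisting on red first would have exposed both problems before any production code was trusted.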

Don’t feel bad; I’m pretty sure I’ve made every one of these mistakes at least once.

We make dumb mistakes all the time, but sometimes we make mistakes because things aren’t so clear. It’s possible, particularly if you are using test doubles, that the test is just not properly constructed. Maybe the assertion isn’t really expressing what you think it is, or maybe you are testing something that’s not really what you should be testing.

Because of all these opportunities (and many more) to make mistakes when constructing tests, you must ensure that when you finally get a test to pass, it’s for the right reason. Seeing the test fail first dramatically increases the odds that you’ve written a legitimate test and that the subsequent code you write properly implements the behavior described by the test.

One other hint: Keep track of the current test count, and make sure it goes up by one when you add a new test.

Violating the rules of TDD

Per Robert C. Martin (“Uncle Bob”), you cannot “write any more production code than is sufficient” to pass your failing unit test. TDD newbies violate this rule wantonly. If a single statement would suffice to pass a test, they’ll introduce an if statement to guard against inappropriate input. If it’s possible that an exception might be thrown, they’ll sneak in a try/catch block to handle the exception. While these are good programming practices, the proper TDD approach is to first document these non-happy-path cases by writing a failing test.

Seasoned programmers often introduce constructs to support future generalization. While a simpler solution might exist to meet the needs of the current set of tests, these developers have a good sense of what’s coming, whether it’s in the near or distant future. Rather than go through a series of small test-driven increments, they choose to take a larger step with a prematurely complex solution.

Some problems exist with this sort of premature speculation:

  • Premature complexity can increase development costs until the need for complexity arises
  • We’re all wrong at times about what’s coming. If the need for the complexity never arises, we’re forever burdened with an unnecessarily challenging solution, and the time needed to understand and maintain the code increases. Also, sometimes complexity does arise but differs from what we expected; it’s generally far more expensive to adapt an ill-fitting, overly complex design than a minimally simple design
  • It kills TDD. Prematurely building support for not-yet-described cases means that it is no longer possible to stick to the red-green-refactor rhythm. Once I introduce the more sophisticated solution, any other tests that might be useful to write around these additional cases will pass immediately. It becomes very difficult to write a test that first fails, which in turn means that all such tests should be treated with suspicion

Even when not doing TDD — such as when writing a test after the fact against legacy code — it’s still important to ensure that you’ve seen the test fail at least once. With test-after, this usually means deliberately breaking or commenting out production code. Trust no test you’ve never seen fail.
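As a rough illustration of the test-after case, the sketch below runs the same after-the-fact test against the real code and against a copy with the discount branch removed. All names are hypothetical, and passing the implementation into a test factory is only a stand-in for what you would normally do by hand: comment out the branch, run the test, then restore the code.

```python
import io
import unittest


def discounted_total(total):
    """Hypothetical legacy code: 10% off orders of 100 or more."""
    if total >= 100:
        return total * 90 // 100
    return total


def discounted_total_broken(total):
    """The same function with the discount branch 'commented out'."""
    return total


def make_test_case(discount_fn):
    """Build the after-the-fact test against a given implementation."""

    class DiscountTest(unittest.TestCase):
        def test_large_order_gets_ten_percent_off(self):
            self.assertEqual(discount_fn(200), 180)

    return DiscountTest


def passes(test_case):
    """Run one TestCase class quietly and report whether it was green."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("real code green:", passes(make_test_case(discounted_total)))
    print("broken code green:", passes(make_test_case(discounted_total_broken)))
```

If the test stayed green against the broken version, that would be a sign it is not really exercising the behavior it claims to cover.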

TDD is a self-reinforcing rhythm. The more you follow its rule of writing no more code than needed, the easier it is to write subsequent tests that will first fail.

There’s not much to TDD: red, green, refactor. Hidden within this simple mantra, however, is a surprising amount of nuance around keeping things simple and safe. Always see red!
