TDD
Test-driven development (TDD), continuous testing and related practices have many advantages, including more flexibility, easier code maintenance, and a faster development cycle. But they also have a weakness: failure exhibition, or what I call the “Red moral hazard.” This vulnerability is common, but many software professionals are still unaware that they’re causing it. Here’s what you can do about it.

Red, Green, Refactor

First, a bit of context is necessary. “Red, Green, Refactor” is the widely recognized motto of TDD:

  •  Red: Create a test and make it fail
  •  Green: Minimally modify the implementation to pass the test
  •  Refactor: Eliminate duplication and otherwise enhance the style of the implementation

Tutorials and references appropriately focus on the challenges and subtleties of the Green and Refactor steps; these are more time-consuming. The Red step should be straightforward and quick.

Too often, though, programmers entirely skip Red. To do so undercuts the value of Green and Refactor, and therefore all of TDD. Worse, all the possible remedies for the omission have their own problems.

Consider a concrete, simplified example: Imagine a new requirement that passwords be at least nine characters long. Python is an effective pseudocode for this example. First, a programmer might write

def is_long_enough(trial_password, minimum=9):
'''
'trial_password' is a string.

The return value should be True for trial_password of
nine characters or more; otherwise it's False.
'''

Next, the programmer writes a test:

import unittest

class Tests(unittest.TestCase):
def test_length(self):
self.assertTrue(is_long_enough("long-password"))

At this point, the test probably will show something like AssertionError: None is not true. That AssertionError is a good thing — exactly what TDD tells practitioners they need.

A normal TDD sequence might then continue by updating the is_long_enough definition to

def is_long_enough(trial_password, minimum=9):
'''
'trial_password' is a string.

The return value should be True for trial_password of
nine characters or more; otherwise it's False.
'''
return len(trial_password) >= minimum


At this point, the test result goes to OK or Green.

A Red-less example

Imagine a slightly different example now: A programmer begins writing
import unittest

def is_long_enough(trial_password, minimum=9):
'''
'trial_password' is a string.

The return value should be True for trial_password of
nine characters or more; otherwise it's False.
'''
return True


class Tests(unittest.TestCase):
def test_length(self):
self.assertTrue(is_long_enough("long-password"))

The programmer is interrupted. After a delay, the programmer returns, runs the test, sees OK and concludes it’s time to move forward.

This example is so simplified that it’s easy to see the error: Return True is wrong, and certainly no substitute for return len(trial_password) >= 9. As obvious as that seems now in isolation, I can testify that mistakes of this sort happen in real-world commercial shops on a daily basis. Even top-notch programmers have a tendency to skip Red until they’ve experienced its consequences a few times for themselves.

A temptation

Even then, starting with Green is a powerful temptation. Part of the point of routines and checklists and habits is to shift the execution from our “executive brain” to muscle memory. When we pause to judge whether Red is truly necessary in a case at hand, we deviate from our routine, and we are more prone to stumble in one step or another.

And the melancholy reality is that Red always remains on the periphery of our routines, because we don’t share Red with our peers. The most experienced, zealous TDD practitioners check in or commit their Green work products, but rarely anything that goes Red. Many organizations practice a culture, in fact, that explicitly expects all check-ins to pass all tests. That means that Red is a transient never seen by other programmers. There’s no record of the source at the time of Red and no review of the Red test or its details. Red lacks accountability.

In the normal TDD programming world, Red depends on the good intentions and even ethics of individual practitioners, unsupported by any systematic review, measurement or peering. That’s a formula for failure.

And failure is the too-frequent result. In my own observation, programmers at all levels, even at the very top, at least occasionally lose their footing in the vicinity of Red.

A remedy

What’s the solution? I’ve found no entirely satisfying one. As I see it, these are the main possibilities:

  • Continue as we are, with Red subordinate to Green and Refactor
  • Relax any CI rules to accommodate commits that are supposed to fail — of course, this complexifies the CI
  • Redefine Red as an inverted test that must pass

The latter looks something like this: a first commit along the lines of

import unittest

def is_long_enough(trial_password, minimum=9):
'''
'trial_password' is a string.

The return value should be True for trial_password of
nine characters or more; otherwise it's False.
'''
return True


class Tests(unittest.TestCase):
def test_length(self):
self.assertFalse(is_long_enough("long-password"))

followed by a correction to

import unittest

def is_long_enough(trial_password, minimum=9):
'''
'trial_password' is a string.

The return value should be True for trial_password of
nine characters or more; otherwise it's False.
'''
return len(trial_password) >= minimum


class Tests(unittest.TestCase):
def test_length(self):
self.assertTrue(is_long_enough("long-password"))

The advantage here is that both check-ins become part of the permanent history of the source. Both are available for inspection. The change set between them has a strong, clear relation to the Red-Green transition.

It also looks like more work, and it’s correspondingly unpopular with developers. While I’ve experimented with various modifications of the unittest definition to support the Red step better, none of them yet have the polish working teams require.

Mind the Red

Nearly universal practice treats Red in a way that makes it easy to skip, or at least shortcut. This damages the overall reliability of TDD. No comprehensive solution exists; the best we can do in the short term is to be aware of the hazard and plan organization-specific workarounds.

To explore the features of Ranorex Studio risk-free download a free 30-day trial today, no credit card required.

About the Author

Cameron Laird is an award-winning software developer and author. Cameron participates in several industry support and standards organizations, including voting membership in the Python Software Foundation. A long-time resident of the Texas Gulf Coast, Cameron's favorite applications are for farm automation.

You might also like these articles