White-Box Testing in TDD

Dec 9, 2019 | Best Practices, Test Automation Insights

White-box testing

Test-driven development has filled up with esoteric techniques and beliefs. Here is a concrete example of a simplification and correction in testing.

This example is executable Python, but the ideas apply across most computing languages, and at least one of its generalizations is likely to apply in your situation:

While I doubt that this small function precisely quotes anything actually in use, it’s representative of many, many real-world definitions. Plenty of code in use that deserves testing differs from this example by only a single line or two.

  The context is that several long-running processes,
  indexed by user account, have been launched. The
  current function determines whether a result is ready,
  and, if so, what it is for $USER.
  If a result isn’t ready--if the long-running process
  is still running--just return None as a sentinel.
  outcome = done(handle)
  if outcome:
    return outcome[os.environ["USER"]]
  return None


One of the first reactions a tester should have to this tiny definition is that it’s hard to test. More precisely, when different developers, testers or continuous integration (CI) processes exercise a test of check_result, they’re likely to receive different results, because each one involves a different $USER.

A natural response would be to add another argument to check_result(). As is frequently the case, a more general or properly abstract definition turns out to be more testable. The signature of the function then becomes:

check_result(handle, account):

At this point, the function looks more testable, but it’s less useful. The updated definition requires all clients of check_result() to compute and manage account for themselves. While testability sounds desirable, it certainly can’t be at the expense of a more complex implementation.

A better solution is possible. All check_result()’s current clients assume that check_result() manages $USER on their behalf. One possible response is to keep the original definition for check_result() but create test methods that save, alter and restore os.environ. Automatic testing of definitions that bind outside resources — for this implementation’s purposes, the user account is external — frequently resort to such invasiveness. It’s more feasible in a white-box situation, of course, when the implementation is visible to the testers. At the same time, to bind test definitions to implementation details makes the tests themselves more fragile.

An even better solution is possible, though. Consider:

check_result(handle, account=os.environ["USER"]):
    """ The context is that several long-running processes, 
        indexed by user account, have been launched. The
        current function determines whether a result is ready, 
        and, if so, what it is for $USER."""
    outcome = done(handle)
    if outcome:
        return outcome[account]
    return None

Reliance on a default argument allows all the clients of check_result() to continue to invoke it unchanged; for them, check_result() still takes responsibility for computation of account. At the same time, parametrization of check_result() makes testing easier; test definitions can pass in specific values for account that yield better-controlled results.

Some computing languages don’t directly support default arguments; C and Java are examples. From their perspective, this example looks suspiciously tricky. The larger principle — that a more general definition can sometimes by more testable — is so important, though, that it’s worth learning workarounds for these languages.

In Java, for instance, it’s natural to create the effect of default arguments through method overloading. In such a case, the main method would include all parameters, and a signature targeted for defaulting implements the default procedurally:

Result check_result(handle, account) {}
Result check_result(handle) {
    Map<String, String> env = System.getenv();
    return check_result(handle, env.get("USER")); 

Whatever the implementation situation, a savvy tester looks for a way to cooperate with development to create an implementation that is both general enough to be testable and specific enough to be useful.

Solve Your Testing Challenges
Test management tool for QA & development


We introduced a generalization of the check_result definition to make the original definition more testable. That testability pays off in uncovering a gap in the specification.

Recall the context of check_result: A long-running process handle indexes eventually completes, at which point it returns results for several different account instances. What if the process completes, though, but no result is available for account? What is supposed to happen then?

Our current version of check_result tosses a KeyError exception. Is that the right action? From the information available to us, we don’t know. Maybe an exception is needed; maybe the function should “silently” return a None, or even some third alternative.

This is the point at which an experienced tester reports back an ambiguity — that is, that a test has failed on the grounds that the requirements have a gap. Notice that parametrization of account makes it easier to identify that someone needs to answer, what if done() returns an answer with no entry for $USER?

Stylistic Hazard

That’s not all. The current version of check_result is so small that we should be able to understand it completely. That’s certainly a desirable property for an example. Even this little example, though, hides at least one more subtlety.

Focus on the test:

if outcome:

From the materials available to us, we’ve already identified a gap. Without a formal specification of done(), we’re unsure whether check_result precisely matches the requirements it should meet.

It’s not just check_result’s behavior that’s in question, though; at least one more ambiguity about done() is lurking.

This one takes a bit more explanation. Some styles of Python return False as a sentinel, and a reader of the check_result above might well speculate that’s exactly the intent: If the computation hasn’t finished, then done() returns False. If the computation has finished, then done() returns a dictionary of results.

That’s a problem, though. Python style manuals prescribe that a correct expression for testing a value that might be False is indeed

if outcome:

The problem is that this test picks up other values as well. Suppose done() finishes but returns an empty dictionary. In Python, a simple

if outcome:

treats both False and the empty dictionary identically. Is that intended? Without a more detailed specification, we don’t know.

At least one improvement is likely. Whatever the decision about the empty dictionary, it’s slightly better style in Python to use None, rather than False, as a sentinel, and test it with

if outcome is None:

The only value that matches this test is None. Coding this way helps distinguish the sentinel from an empty dictionary.

This illustrates that sometimes — in fact, often! — a valuable result of a testing cycle is not “The software passed” or “Here are specific failures,” but “Gaps in the specifications result in these potential problems.”

An alert and knowledgeable tester generates higher-value results that identify not only errors, but also opportunities to improve the long-term resilience of the implementation. Complete specifications and good style help make software that is simultaneously more readable and more testable.

All-in-one Test Automation

Cross-Technology | Cross-Device | Cross-Platform

Related Posts:

5 Software Quality Metrics That Matter

5 Software Quality Metrics That Matter

Which software quality metrics matter most? That’s the question we all need to ask. If your company is dedicated to developing high-quality software, then you need a definition for what “high-quality” actually looks like. This means understanding different aspects of...

The Ins and Outs of Pairwise Testing

The Ins and Outs of Pairwise Testing

Software testing typically involves taking user requirements and stories to create test cases that provide a desired level of coverage. Many of these tests contain a certain level of redundancy. Traditional testing methods can lead to a lot of wasted time and extend...