Behavior-driven development (BDD) is a software development process that focuses on collaboration. Its primary goal is to help everyone involved negotiate what a software development team should be building. As a result of practicing BDD, you should end up with a set of examples of behavior, expressed in a consistent and readable manner that all parties readily understand and accept. Taken to its next possible step, you can automate these examples of behavior — in other words, you can translate them into tests.
In a sense, the tests you create as part of practicing BDD are a byproduct of the process. We used to call these tests “acceptance tests,” or ATs; if the tests all pass, the customer agrees to accept (buy) the system built by the team. Whatever you call them (and whether or not you automate them), they’re still tests — artifacts that you must maintain if you want them to continue to return benefits on your investment.
You’ll want to ensure you design your acceptance tests with care and a thought toward long-term maintenance. This article proposes a set of appropriate acceptance test design guidelines based on a couple of decades of experience working with numerous software development teams, both pre-BDD and in the era of BDD. To remember these guidelines, just think of the alphabet—or the first seven letters, anyway. These are the ABCs of acceptance test design.
Abstraction

Per Bob Martin, abstraction is the elimination of the irrelevant and the amplification of the essential. Applied to acceptance tests, this means a test should describe the steps required to accomplish its goal in a declarative manner, using domain-specific language. You should abstract, or bury, implementation details, such as the specific GUI actions required to accomplish a step.
The principle of abstraction comes directly from a key design goal for programming, but it applies even more to acceptance tests. Tests cluttered with implementation details are brittle, inflexible and not easily understood.
Ensure that your tests describe what happens, not how.
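To make the contrast concrete, here is a minimal sketch in Python. The LibraryUI driver, its field names, and patron_checks_out_book are all invented for illustration; they stand in for whatever GUI driver and glue code your tooling provides.

```python
class LibraryUI:
    """Stand-in for a GUI driver (hypothetical, for illustration only)."""

    def __init__(self):
        self.fields = {}
        self.checkouts = []

    def fill_field(self, name, value):
        self.fields[name] = value

    def click(self, button):
        if button == "Check Out":
            self.checkouts.append((self.fields["patron_id"], self.fields["isbn"]))


def test_checkout_imperative():
    # Imperative style: the test itself knows every GUI detail.
    ui = LibraryUI()
    ui.fill_field("patron_id", "P42")
    ui.fill_field("isbn", "978-0134494166")
    ui.click("Check Out")
    assert ("P42", "978-0134494166") in ui.checkouts


def patron_checks_out_book(ui, patron_id, isbn):
    # Glue code: the GUI details are buried here, in one place.
    ui.fill_field("patron_id", patron_id)
    ui.fill_field("isbn", isbn)
    ui.click("Check Out")


def test_checkout_declarative():
    # Declarative style: one domain-language step; no GUI details in sight.
    ui = LibraryUI()
    patron_checks_out_book(ui, "P42", "978-0134494166")
    assert ("P42", "978-0134494166") in ui.checkouts
```

The imperative test must change whenever the checkout screen changes; the declarative test survives, because the GUI knowledge lives in a single glue function.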
Bona Fide

Acceptance tests demonstrate examples of interacting with your system. They are the very first client of your application. Put another way, you ship the second client you build (the production client) only as long as the first client (the tests) indicates that things are healthy.
The implication: You will have two active clients of the system. Ideally, the test client is a thin layer of glue code that translates from tests written in your tool of choice (Cucumber or FitNesse, typically) into interactions with the system it drives and verifies. Realistically, it’s hard to keep the glue code smooth and thin at all times, particularly with more complex interactions.
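As a rough sketch of what "thin" glue can look like, assuming an invented Inventory system and hand-rolled step matching (a real tool such as Cucumber supplies the matching for you): each step function does nothing but parse its text and delegate to the system it drives.

```python
import re


class Inventory:
    """Stand-in for the system the tests drive (hypothetical)."""

    def __init__(self):
        self.stock = {}

    def add(self, item, count):
        self.stock[item] = self.stock.get(item, 0) + count

    def on_hand(self, item):
        return self.stock.get(item, 0)


# Thin glue: parse the step's arguments, delegate immediately, verify.
def step_items_in_stock(inventory, step_text):
    count, item = re.match(r"(\d+) (\w+) in stock", step_text).groups()
    inventory.add(item, int(count))


def step_verify_on_hand(inventory, step_text):
    count, item = re.match(r"(\d+) (\w+) on hand", step_text).groups()
    assert inventory.on_hand(item) == int(count)
```

The glue carries no logic of its own beyond translation; the moment it starts accumulating behavior, it risks diverging from the production client (see the story that follows).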
I worked with a product owner who was sold on the concept of BDD. The software he specified had been in production for about a year with no major issues. During my tenure on the project, however, we somehow shipped a nasty defect. We had not missed a test, nor was there a misunderstanding about what to build. Far worse: We had a test that should have failed, but its glue code was defective. The code inaccurately implemented its emulation of the production client’s behavior, making things seem hunky-dory.
As an almost immediate result, the product owner completely lost faith in our tests. The return on our heavy investment went to zero.
Ensure your tests have fidelity with the production client.
Cohesion

In the context of programming, cohesion is the measure of how strongly elements within a given module relate to one another. In the context of testing, perhaps cohesion is best expressed as an inverse of the number of goals a test demonstrates: A test with one goal is most cohesive, and cohesion decreases as the number of goals demonstrated by the test increases.
Tests with low cohesion are more challenging to understand. They generally demand more reading to find the elements you’re particularly interested in. In contrast, tests with high cohesion are short and to the point. Further, each cohesive test can carry a summary description that concisely describes the one example it demonstrates. In a well-designed test suite, the set of test scenario descriptions tells a story about the behaviors of the system as a whole.
A test with low cohesion is also harder to debug because, by definition, goals occurring later in the test are tightly coupled to the state produced by the test’s earlier steps. (See the next principle, Decoupling, for a bit more information.)
When a lengthier, multi-purpose test fails, it becomes harder to identify the step during which the failure occurred.
Tests should demonstrate a single goal.
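A small illustration, with a hypothetical Cart class: the first test chases two goals at once; the two that follow each demonstrate exactly one, and their names summarize what they show.

```python
class Cart:
    """Illustrative system under test (names are assumptions)."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def remove(self, item):
        self.items.remove(item)


# Low cohesion: two goals in one test. A failure in the removal goal is
# entangled with the state built up for the addition goal.
def test_add_and_remove():
    cart = Cart()
    cart.add("book")
    assert cart.items == ["book"]
    cart.remove("book")
    assert cart.items == []


# High cohesion: one goal per test, each named for the example it shows.
def test_added_item_appears_in_cart():
    cart = Cart()
    cart.add("book")
    assert cart.items == ["book"]


def test_removed_item_no_longer_in_cart():
    cart = Cart()
    cart.add("book")
    cart.remove("book")
    assert cart.items == []
```

When one of the cohesive tests fails, its name alone tells you which behavior broke.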
Decoupling

Coupled tests are a surefire route to pulling out your own hair. Some of the biggest testing nightmares I’ve encountered involved a large suite of coupled tests and a non-obvious failure. Determining just which earlier-running test left the system in a bad state can take hours or longer.
The best tests run in a clean, unpolluted context. They assume nothing exists and create the state they need to support execution. When these tests fail, it is considerably less effort to pinpoint the source of the failure.
Tests should not depend on the results of other tests.
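A sketch of a decoupled test, using an invented AccountStore: the test builds every bit of state it needs rather than relying on whatever an earlier test may have left behind.

```python
class AccountStore:
    """Stand-in persistence layer (hypothetical, for illustration)."""

    def __init__(self):
        self.accounts = {}

    def create(self, name, balance):
        self.accounts[name] = balance

    def deposit(self, name, amount):
        self.accounts[name] += amount


def test_deposit_increases_balance():
    store = AccountStore()      # fresh, unpolluted context
    store.create("alice", 100)  # the test creates its own fixture state
    store.deposit("alice", 25)
    assert store.accounts["alice"] == 125
```

Because it assumes nothing pre-exists, this test can run alone, first, last, or alongside any other test, and a failure points only at itself.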
Expressiveness

Before the days of acceptance tests and BDD, the tests belonged completely to testers, who used costly tools to manage and execute their tests. Rarely did others see these tests. They provided the value of regression testing and little more.
BDD made these tests ubiquitously available. You now design acceptance tests to be accessible to — and eminently readable by — anyone who wants to know how your system behaves. Tools promote highly expressive forms: FitNesse tests read like spreadsheet tables, and Gherkin scenarios read like plain English. With a minimum of training, testers, stakeholders, product owners and even managers can readily understand any given test. This expressiveness simplifies negotiations between the business and the development team.
One of the best benefits you can attain by practicing BDD is knowing what behaviors exist in your system. The business describes their needs in the form of examples of behavior. The delivery team translates these examples into executable tests. As long as these tests pass — and they generally should — you know that they accurately describe what the system does. In contrast, the typical long-lived system without acceptance tests is a maze of buried requirements. We often waste inordinate amounts of time navigating this maze to answer simple questions about the system’s behavior.
To see the return on this benefit of accurately describing your system, your tests must be highly expressive. Unreadable tests become a liability.
First and foremost, the tests must make immediate sense to folks familiar with the domain. Our best way of vetting this characteristic is to ensure the whole team continues to read the tests as their entry point to understanding the system. All parties (testers, analysts, product owners, managers, programmers, etc.) must also feel empowered to continually review and improve any test for clarity. Mobbing and pairing are great ways to accomplish this goal; after-the-fact reviews and pull requests are not as great, but still better than nothing.
Each test should tell a story. Its name summarizes the goal demonstrated by an example. The story the example tells should stand alone. Readers with common domain knowledge should not have to navigate into the system under test to understand the goal of a test or why it expects the results it asserts.
The acceptance test design concepts of Abstraction, Cohesion, and Decoupling also play into the notion of expressiveness.
Ensure all parties can understand each test as a standalone document of behavior.
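One way this can look at the code level, with an invented Library example: the test’s name summarizes its goal, and its steps read as a standalone story in domain language, with no need to dig into the system under test.

```python
class Library:
    """Hypothetical system under test; names chosen for readability."""

    def __init__(self):
        self.holds = {}

    def place_hold(self, patron, title):
        self.holds.setdefault(title, []).append(patron)

    def next_in_line(self, title):
        return self.holds[title][0]


def test_earliest_hold_is_first_in_line_for_a_returned_book():
    library = Library()
    library.place_hold("first patron", "Refactoring")
    library.place_hold("second patron", "Refactoring")
    assert library.next_in_line("Refactoring") == "first patron"
```

A reader with library-domain knowledge can verify the expectation from the test alone: two holds were placed, and the earliest one wins.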
Free of Duplication
Redundant steps in tests increase the cost of maintenance: additional effort is required when changes must be made in multiple places. Duplication also increases risk, as sometimes we do not find all the places where a redundant change must be made.
The primary tools mentioned earlier support various mechanisms for eliminating unwanted duplication. FitNesse provides SetUp pages and included pages for common initialization and shared sub-content; Gherkin supports backgrounds, example tables, and scenario outlines to help drive down duplication.
Often you can increase the level of abstraction while eliminating duplication by condensing multiple steps into a single step.
Be careful not to reduce expressiveness when removing duplication. Each test must still act as a standalone document that tells a cohesive story.
Minimize duplication across tests.
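In glue code, the same idea often reduces to extracting a shared, named setup step, analogous to a Gherkin Background. The StoreApp and given_a_registered_shopper names here are illustrative assumptions, not a real API.

```python
class StoreApp:
    """Hypothetical system under test."""

    def __init__(self):
        self.users = {}
        self.orders = []

    def register(self, user):
        self.users[user] = True

    def order(self, user, item):
        self.orders.append((user, item))


# Shared setup extracted into one named, domain-language step. If
# registration changes, it changes in exactly one place.
def given_a_registered_shopper(app, name="pat"):
    app.register(name)
    return name


def test_shopper_can_place_an_order():
    app = StoreApp()
    shopper = given_a_registered_shopper(app)
    app.order(shopper, "tea")
    assert (shopper, "tea") in app.orders
```

Note that the extracted step keeps its domain name, so the test still reads as a standalone story; the duplication disappears without the expressiveness going with it.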
Green

Well, they’re tests. Red = failing = bad. You don’t want to ship your software. Green = passing = good. Your passing acceptance tests should provide you with high levels of confidence that you can ship your software.
Ensure your tests are always green.
You’ll note that “performance” is not one of these criteria for well-designed tests. Performance is always important, but it is a secondary consideration in the context of BDD. Seek to boost performance only where doing so does not jeopardize these seven principles.
Stick to the ABCs of acceptance test design, and you’ll help increase the return on the significant investment required to create and maintain your tests.