BDD as Evolved TDD
Dan Coleman
Written on August 20, 2020

Introduction
I see Behavior-Driven Development (BDD) as an evolution of Test-Driven Development (TDD). If we think about what the purpose, and real benefit, of TDD is, it’s almost impossible not to find ourselves talking about BDD, even if we aren’t calling it that.
TDD means, literally, tests drive software development. What do we mean by “driven”? Well, the most formulaic definition we can give is that the code is roped off and untouchable (maybe it’s an electrified fence) unless — and until — a test fails. If you want to change code, first you have to decide what is going to test your change. Once that test is in place and it fails (because the change isn’t made yet), you now have permission to modify the code. But you only have permission to modify it for the purposes of making the failing test pass. (It’s cheating to go add in some extra stuff while you’re at it.)
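One turn of that cycle can be sketched in Swift. This is only an illustration under assumptions: `Item`, `Status`, and `canEdit` are hypothetical names invented here, and a plain list of named checks stands in for a real test framework.

```swift
// Hypothetical domain types, invented for this illustration.
enum Status {
    case open
    case completed
}

struct Item {
    var status: Status

    // Step 2 (green): this property was added only after the checks below
    // had been written and seen to fail. It does just enough to make
    // those checks pass, and nothing more.
    var canEdit: Bool {
        return status != .completed
    }
}

// Step 1 (red): these checks are written first. Before `canEdit` existed,
// these lines would not compile, which is the failing state that grants
// permission to touch the production code.
let checks: [(name: String, passed: Bool)] = [
    (name: "completed items are not editable",
     passed: Item(status: .completed).canEdit == false),
    (name: "open items are editable",
     passed: Item(status: .open).canEdit == true),
]

// Collect the names of any checks that did not pass.
let failures = checks.filter { !$0.passed }.map { $0.name }
print(failures.isEmpty ? "all checks pass" : "failed: \(failures)")
```

Adding, say, a `canDelete` property at the same time would be the "extra stuff" the rule forbids, because no failing check asked for it.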
There are some differences of opinion over where so-called “refactoring” (changing design without changing…here’s that word…behavior) fits into this. Should refactoring proceed without tests (meaning the tests should be entirely agnostic to “design,” and in fact be used to prove that a change is actually a refactor) or in the same fashion of first making a test fail (meaning the tests should be fully aware of and sensitive to “design”)? That’s a separate topic. But at least everyone agrees on changing the code in ways that are “visible” to the outside: not allowed except through the “write failing test, then make test pass” workflow. What does this give us? Assurance that we’re thinking about what we need from the code before we write it. Hopefully, we’ll write all the code we need, and only the code we need, because we started by defining, in precise detail, what it means for the code to be “working.”
TDD is often confused with one of its implementation details, which is "write tests first." Writing tests first may almost force you into a TDD mindset, but they aren’t equivalent. The real essence of TDD is the inversion of roles between production code and tests. In the standard mindset, the production code determines what tests are needed. In the TDD mindset, the tests determine what production code is needed. But the problem devs so often have with this is that if they are still in the standard mindset, they are stuck thinking of how to write a test for code that doesn’t exist yet.
If they manage to do it, they probably imagined the code in their head before writing the test for it, and then wondered what they gained by making themselves think the code into the virtual IDE in their head instead of typing it out in the actual IDE.
If the tests aren’t testing code (and even if the tests are for interfaces, not implementations, interfaces are still code), then what are they testing?
The answer is behaviors, synonyms for which are requirements or specifications. In this way, we are led naturally into BDD. TDD tells you to drive the code by tests, but it doesn’t tell you what tests those should be. That answer is supplied by BDD: the tests have been there all along, they just need some details filled in. The tests are the specifications, whether they’re called “acceptance criteria” or “the spec document” or whatever.
Requirements Are Tests
The key insight of BDD is recognizing the equivalence of three things:
- The requirement or specification provided by the product designer (or product “owner”).
- Whatever condition the developer uses to decide that he is “done” developing.
- The test script the tester follows to confirm the work was completed properly.
The typical workflow is to take a specification from the product side and give it to the developer. The developer runs off with it and uses it to drive his development work. At the same time, the tester runs off and writes a test script based on the spec, which he will follow when the dev gives him a build to test.
Three versions of the same thing now exist, even if they’re never written down (and are only in someone’s head):
- The product designer’s idea about what the product should behave like.
- Whatever the developer does to check that he has finished the work.
- The test script the tester follows to confirm the work is done.
Unsurprisingly, when there is duplication of data, mismatches will tend to arise. The developer’s check that everything is “working” is different from what the product designer will do once she has the finished product in hand, which is itself different from what the tester does to “accept” a build.
The result is a “backward” workflow: The tester may send the build back to the developer with more work to be done, or the developer may go back to the designer and request clarification or more details. Overall, the result is wasted effort and noise in the workflow. It is more efficient to discover and resolve any misunderstandings or missing details before the work begins, than during or after.
When we ponder this — and once we identify that those three things really are the same thing but in different forms — we immediately realize what the solution is. The specification, the driving force of the developer and the test script followed by the tester should all be merged into one piece of information, preventing mismatches. One source of truth.
Given that one thing is used by all three of these parties, when we ensure that it is detailed enough that there is (within reason) no room for multiple interpretations, then we won’t have a developer calling something “working” that a tester or product designer would call “not working,” and we won’t have a developer calling something “not enough information to proceed” that the product designer called “all the requirements.” The goal is to create a shared understanding, in a shared language, among the product designer, developer and tester: the “Three Amigos.”
The Restaurant Analogy
Think of it like a restaurant. A restaurant is primarily composed of two parts: the “front,” made up of the host/hostess and wait staff, and the “back,” made up of the chefs, cooks and food preps. The wait staff is analogous to product owners: They are proxies for the paying customer, and communicate customer demands to the kitchen. The cooks are analogous to developers, who take the work items submitted to them and create the food the customer will consume. An executive or head chef typically runs a QA station, comparing finished plates to tickets to make sure the orders are right, and testing the quality of the food. He is the tester. The two sides of the restaurant work asynchronously, coordinating their tasks with one of the most basic tools of asynchronous communication: a queue. Orders are submitted by the wait staff as they become ready to be worked on, and the kitchen staff pull an order off the queue as soon as they are ready to work on one.
It is very important to the smooth and efficient operation of the restaurant that the work items submitted by the dining room to the kitchen contain clear, specific and unambiguous instructions for what to prepare for a customer. If they didn’t, then the kitchen staff would have to come out to the dining room and interrupt the wait staff to ask for clarification or raise concerns. The head chef might interpret the order differently than the cook, and tell the cook to redo the dish after it was already made. And worse yet, a dish might make it out to a customer when it’s not what the customer wants, and the server knows it isn’t what they wanted.
All of this represents wasted or duplicated time and effort.
How does a restaurant accomplish this mode of clear communication? They create a domain-specific language. It is in English, or whatever natural language the staff and customers speak, but it is a restricted form with more rigid rules, tailored to the restaurant business. The domain-specific language of a particular restaurant is the menu. Each menu item has a name, a description and some standard configurations like “sides.” The staff collaborate and work out ahead of time what recipes the customers may order, and what customizations or additional information each menu item needs.
For example, one item is a steak entree. This requires specifying the meat temperature, two sides, and a dressing for the salad. The kitchen staff know when they see, “steak entree, medium, mashed potatoes and corn, ranch,” exactly what to prepare, and the head chef knows exactly what to check for before sending the plate out (so then the cook knows if the head chef is going to reject a dish before handing it to him). The server knows exactly what to write down, which will drive the conversation she has with the customer. This way, the server makes sure to get all the info she needs before anything is communicated to the kitchen. Once everything is known, an order is submitted, and no further communication between the dining room and kitchen is necessary until the finished plate is ready to be brought to the customer.
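To make the idea of a restricted vocabulary concrete, the steak ticket can be modeled as a small Swift type. This is a sketch under assumptions: every type and case name below is made up for illustration, and the menu options beyond those mentioned in the example are invented.

```swift
// Hypothetical menu vocabulary. Only values listed here can be ordered.
enum Temperature: String {
    case rare = "rare"
    case medium = "medium"
    case wellDone = "well done"
}

enum Side: String {
    case mashedPotatoes = "mashed potatoes"
    case corn = "corn"
    case fries = "fries"
}

enum Dressing: String {
    case ranch = "ranch"
    case italian = "italian"
}

// A steak order cannot be built without a temperature, exactly two sides,
// and a dressing, so an incomplete ticket never reaches the kitchen.
struct SteakEntree {
    let temperature: Temperature
    let sides: (Side, Side)
    let dressing: Dressing

    // The line the kitchen and the head chef both read.
    var ticket: String {
        return "steak entree, \(temperature.rawValue), "
            + "\(sides.0.rawValue) and \(sides.1.rawValue), \(dressing.rawValue)"
    }
}

let order = SteakEntree(
    temperature: .medium,
    sides: (.mashedPotatoes, .corn),
    dressing: .ranch
)
print(order.ticket)
```

The type says nothing about pans or cooking order, which mirrors the menu: it fixes what the customer gets, not how the kitchen produces it.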
Note that the menu items do not specify how the kitchen is going to prepare an item, unless that method of preparation is part of the customer experience. For example, a menu item may specify “grilled” chicken, but only because that is part of the customer experience. The menu items will not (unless it is essential to the customer experience) specify the types of pots or pans used, what order to cook things, how long to let the sauce simmer, etc. Those decisions are not about selecting a customer experience, they are about the quality of the customer experience. It is left to the kitchen staff to select the methods that produce the best quality possible for the price.
This kind of hyper-efficiency is crucial for any busy restaurant rapidly serving a large dining room of customers.
And any disruption in the flow could derail an entire service.
The Shared Language of Software Requirements
How do we achieve a workflow with this kind of efficiency in software development?
We want a way for product designers to submit work items to developers that contain everything the developer needs, in unambiguous terms, to complete the work to the satisfaction of the designer, and then be verified by testers. In its most ideal form, no communication is necessary after an item is submitted. The developer works on it and completes it, and only rarely does QA uncover an issue that the dev didn’t uncover during development. The result, in plain “software development” terms, is fewer bugs.
There are two kinds of bugs: missed/unclear requirements (the developer intentionally created behavior he mistakenly believed the product designer wants), and missed test cases (the developer unintentionally created behavior and didn’t discover it during development). What we’re seeking is a workflow that minimizes both. To be clear, a problem with the code becomes a “bug” once it escapes development and becomes visible to parties other than developers — like testers, product designers, or customers. We’re not looking for a magic way for developers to get code perfect the first time they type it into the IDE. We just want all those kinks to be worked out before the developer announces, “I’m done and it works!” Also, we want to avoid the common practice of developers sending builds to testers and asking, “does it work yet?”
Enter BDD
What is the restricted form of English that is well-tailored to specifying the behavior of a software product? The main one is Gherkin.
The basic rule of Gherkin is to organize all the statements of a specification into preconditions (“given”), an event (“when”) and a postcondition or assertion (“then”). Specifications, in ordinary natural language, function as the tool of communication between the Three Amigos. But just as a restaurant would have problems trying to express orders in plain English, so too do software shops run into issues expressing the specs “loosely.” There are two particular problems with plain English specifications or “acceptance criteria”: ambiguity and missing details.
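The three-part shape can be sketched as a plain Swift value. This is not part of any Gherkin tool: the `Scenario` type and the example wording are hypothetical, and the rendering is simplified (it repeats the keyword on every line instead of using “And”).

```swift
// A hypothetical value type holding the three parts of one scenario.
struct Scenario {
    let given: [String]   // preconditions
    let when: String      // the single triggering event
    let then: [String]    // postconditions to assert

    // Renders the scenario in a simplified Given/When/Then layout.
    var text: String {
        var lines: [String] = []
        lines += given.map { "Given \($0)" }
        lines.append("When \(when)")
        lines += then.map { "Then \($0)" }
        return lines.joined(separator: "\n")
    }
}

// Made-up example content, used only to show the structure.
let scenario = Scenario(
    given: ["the user is logged in"],
    when: "the user opens the list view",
    then: ["the list of items is displayed"]
)
print(scenario.text)
```

Because `when` is a required, non-optional field here, a specification with no triggering event cannot even be written down in this form.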
A key insight of BDD is that the missing details necessarily get filled in by the process of creating a test script for the specification. In fact, a specification and a test script are closely related. Both define pass/fail criteria, and both set up the conditions in which those pass/fail criteria apply. A “test” is always implicitly contained in a specification. Whenever I say, “I want X,” this implicitly contains, “if I get X, that is good; if I don’t get X, that is bad.”
The relationship between a specification and a test script is the same as the relationship between an interface and an implementation. A specification says: this is what to test. A test script says: this is how to test it. In working out how something is going to be tested, missing details must be filled in. There’s no way to have the “how” worked out when the "what" is incomplete. But that means working out the test script will change the specification, by making it actually specific enough. We want to make sure this addition of details happens ahead of time, in collaboration among the Three Amigos, and not later by the developer when he encounters the missing detail, and again by the tester when he encounters it while writing the test script.
From Specification to Behavior
Gherkin is a language designed to express the "what" of a test without having to specify the "how". Let’s look at an example. Say we have an app with a list view of items. We have the specification:
The user should not be allowed to edit items whose status is “completed”
This is a good example of a statement that seems perfectly clear and unambiguous to a product designer. The developer gets it and implements it by having the “edit” button throw up an alert that says, “this item cannot be edited.” The tester writes a test script that says: Confirm that the edit button is grayed out and nonfunctional. The tester then fails the build. The developer redoes the code according to the test script. Then they deliver it to the designer, who looks at it and says, “oh, actually, what I was thinking is these rows shouldn’t have ‘edit’ buttons at all.” So now the developer has to redo his code again, and the tester has to rewrite his script.
Now, let’s put this in Gherkin, in the form of what is called a scenario. The main point of doing so is that it forces us to think about an event that, if the work is completed properly, causes a condition to be met:
**Given**
Item “X” in List has status “completed”
**When**
???
**Then**
User cannot edit Item “X”
By putting it in this format, we can see there are two problems. First, most obviously, there is no “when." So then how do you test it? At which point in the execution of the test script does the tester begin looking for the pass/fail condition? The second problem is that the “then” is vague, in terms of testability. Maybe we can imagine a test script that performs this check, but doing so requires defining what “cannot edit” means, exactly (it doesn’t just define how to check for it, it defines exactly what to check for). As it stands, without a precise definition, it isn’t testable.
The conversation these issues drive among the Three Amigos forces them to realize the specification isn’t specific enough. It’s not precisely telling us what the product should do. Should the event be that an item becomes visible? Is that when the tester looks for the pass/fail condition? Or is it when the user presses the edit button? From this conversation, the product designer says, “oh okay. What I meant is that there should be no edit button for completed items.” Okay, let’s update it:
**Given**
Item “X” in List has status “completed”
**When**
Item “X” appears
**Then**
Item “X” does not have edit button
Now, the product designer reads this and says, “yes, this sounds like what I want.” The developer says, “that’s clear; as long as you provide a UI comp for a row without an edit button, I have everything I need.” The tester says, “yes, that’s a test script waiting to happen.” Now, all of that confusion and back-and-forth that would have happened during development has been pushed left to before development begins. In this form, we now have a clear ticket item to submit to the “kitchen.”
From Behavior to Test
Now let’s think about what happens when the tester turns this into a test script. Really, what he’s doing is taking each step in this Gherkin, and defining how a tester will handle it. The “givens” and “whens” are each something the tester makes happen. The tester will make it so an item with a “completed” status appears. The “then” is something the tester checks happens. The tester will check that the edit button is absent.
All he needs to do is provide step definitions:
**GIVEN Item “X” in List has status “completed”**
Log in with test account “XYZ”, which has completed items in its list
**WHEN Item “X” appears**
Use main menu to navigate to list view, and scroll to a completed item (an item is completed if the status, printed below the item’s name, says “completed”)
**THEN Item “X” does not have edit button**
Verify the row for item “X” does not contain an “edit” button
Hopefully, the step definitions are fairly obvious to the tester (or at least he sees a clear path to creating the step definitions), which allows him to fill them in mentally as he’s reading the Gherkin. As he practices this, he’ll become more familiar with spotting Gherkin steps that don’t translate into workable step definitions. This is what lets him verify that the scenario is testable.
We could also provide step definitions that a machine would execute, and thereby create an automated test:
**GIVEN Item “X” in List has status “completed”**
let listItem = ListItem()
listItem.status = .completed
**WHEN Item “X” appears**
listItem.didAppear()
**THEN Item “X” does not have edit button**
Assert(listItem.hideEditButton == true)
Notice this isn’t a UI test in the sense of a black-box test that manipulates the app from the outside just as a real user does, by pressing buttons or making gestures, and that validates by searching for on-screen elements. We’re punching through the UI to the layer right below it, the layer that drives the UI by defining the logic of what is displayed, not how to display it. (If you use a reactive UI design, you already have classes whose job is to expose observable properties that you can be sure make it to the screen whenever their values are updated.)
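The step definitions above rely on a `ListItem` type that is never shown. Here is a minimal runnable sketch of one possible shape, assuming a view-model-style class whose `hideEditButton` flag is computed when the row appears. The `Status` enum, the `.open` case, and the body of `didAppear` are guesses made for illustration, and `precondition` stands in for a test framework’s assertion.

```swift
// Hypothetical status type; the steps above only ever mention `.completed`.
enum Status {
    case open
    case completed
}

// Sits just below the UI: it decides what is displayed, not how.
final class ListItem {
    var status: Status = .open

    // A UI layer would observe this flag and show or hide the button.
    private(set) var hideEditButton = false

    // Called when the row is about to appear on screen.
    func didAppear() {
        hideEditButton = (status == .completed)
    }
}

// GIVEN Item "X" in List has status "completed"
let listItem = ListItem()
listItem.status = .completed

// WHEN Item "X" appears
listItem.didAppear()

// THEN Item "X" does not have edit button
precondition(listItem.hideEditButton == true,
             "a completed item should hide its edit button")
print("scenario passed")
```

No screen, button, or gesture is involved; the check runs entirely against the logic layer.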
Now, from this, we can see a direct line from the business requirement/specification/behavior, which is ultimately expressed by the product designer, to classes and methods in code. And here, we see that behaviors are driving development. If a ListItem class didn’t already exist, well now we know we need to create it. If that class doesn’t have a didAppear method, well now we know we need to add it. We don’t have to implement the steps this way. Maybe a ListItem is created from another class, say Item, when a row is about to appear on screen. In this case the “when” step is implemented by constructing the ListItem. Either way, we can see that the initial decisions about code design are driven by implementing the “steps” for testing the business requirement, which are none other than simple English statements describing preconditions, an event, and a postcondition.
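The alternative design mentioned above, where a `ListItem` is built from an `Item` as the row is about to appear, might look like the following sketch. All type and property names other than `ListItem` and `Item` are assumptions, and `precondition` again stands in for a test assertion.

```swift
// Hypothetical status type for the sketch.
enum Status {
    case open
    case completed
}

// Hypothetical domain model.
struct Item {
    let name: String
    let status: Status
}

// Row model created from an Item when a row is about to appear.
struct ListItem {
    let title: String
    let hideEditButton: Bool

    init(item: Item) {
        title = item.name
        hideEditButton = (item.status == .completed)
    }
}

// GIVEN Item "X" in List has status "completed"
let item = Item(name: "X", status: .completed)

// WHEN Item "X" appears (here, constructing the row model is the event)
let row = ListItem(item: item)

// THEN Item "X" does not have edit button
precondition(row.hideEditButton, "a completed item should hide its edit button")
print("scenario passed")
```

The scenario text is unchanged between the two designs; only the step implementations differ, which is what lets the scenario stay agnostic about code structure.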
The test that emerges from implementing the steps of a scenario has a specific name: acceptance test. Now, we can top off TDD with clear instructions on what tests to write: acceptance tests. These are created by implementing the steps of a scenario. The scenario describes, in a structured form of plain English, exactly what the product designer wants.
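The link between scenario text and step implementations can be sketched as a lookup table from step text to code. This is a toy runner written for illustration, not the API of any real BDD framework: `World`, `Step`, `run`, and the `ListItem` internals are all hypothetical.

```swift
// Hypothetical code under test.
enum Status {
    case open
    case completed
}

final class ListItem {
    var status: Status = .open
    private(set) var hideEditButton = false

    func didAppear() {
        hideEditButton = (status == .completed)
    }
}

// State shared between the steps of one scenario.
final class World {
    var listItem: ListItem?
}

// A step implementation returns false when the step fails.
typealias Step = (World) -> Bool

// Step text (exactly as written in the scenario) mapped to code.
let steps: [String: Step] = [
    "Item “X” in List has status “completed”": { world in
        let item = ListItem()
        item.status = .completed
        world.listItem = item
        return true
    },
    "Item “X” appears": { world in
        guard let item = world.listItem else { return false }
        item.didAppear()
        return true
    },
    "Item “X” does not have edit button": { world in
        return world.listItem?.hideEditButton == true
    },
]

// Runs the steps in order; an unknown or failing step rejects the scenario.
func run(_ scenario: [String], using steps: [String: Step]) -> Bool {
    let world = World()
    for text in scenario {
        guard let step = steps[text], step(world) else { return false }
    }
    return true
}

let scenario = [
    "Item “X” in List has status “completed”",
    "Item “X” appears",
    "Item “X” does not have edit button",
]
let passed = run(scenario, using: steps)
print(passed ? "accepted" : "rejected")
```

A step with no matching implementation makes the run fail, which is a rough mechanical version of the tester noticing that a Gherkin step does not translate into a workable step definition.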
Clear and Unambiguous
This all describes a blueprint for how to run a software development process with minimal waste, focusing on producing all the code we need, none of the code we don’t need, and ending with clear documentation and testing instructions for the developed product. It is not a silver bullet, and it takes a lot of practice to execute this flow effectively (in the future, we’ll talk about common pitfalls you’ll probably fall into when you first start trying this out). But BDD is a great way to produce clear, unambiguous and complete product specifications as the input to the coding process.
This complements TDD by clearly stating what tests you should write, as the first step to making changes to the code.