Common Pitfalls of BDD
Dan Coleman
Written on December 3, 2020

Introduction
So you’ve decided you want to drive development from behaviors, created with a shared understanding in a shared language among the Three Amigos (product owner/designer, developer, and tester). You will most likely start seeing some benefits immediately. But you’ll probably also encounter some difficulties. It’s important to understand that these difficulties are common. They are part of the process of learning how to do BDD well, and what “well” means depends on your specific needs (in other words, learning how to get the most out of BDD for your organization). Avoiding these difficulties can’t really be “taught”, just as you can’t teach someone to ride a bike. You have to get on the bike, start riding, and, most importantly, fall and scrape your knee a few times. Learning by failure is the best kind of learning (at least when the cost of failure isn’t massive), and it can’t be replicated in a classroom or a book.
So then what is my goal in discussing these, in a sense, “unteachable” skills? It is to make sure you are aware that you will encounter the need to develop these skills, and that the fact you’re starting without them is not an indication that BDD isn’t helpful or that you aren’t doing it right. As long as you frame the act of falling and scraping your knee as progress toward becoming a good bike rider, and not as evidence you should give up, you are in the right mindset.
How “How” Is Too “How”?
I cover this problem more thoroughly in a separate post. Here, I want to discuss how this impacts Gherkin. The “rule of thumb” is that your Gherkin should define what your requirements are, not how those requirements will be implemented. What this means exactly depends on the situation.
A scenario may spell out too many low-level details that aren’t essential to the problem, tying it unnecessarily to a specific implementation strategy. These scenarios will “smell” a certain way: a long list of “givens”, terms that mean little to nothing to the product owner, and tests that are too “specific” (a pass reliably indicates the feature works, but a failure is unreliable and usually means developers changed something that doesn’t actually break the feature). Developers will tend to be happy with these scenarios, but product owners will be unhappy.
On the other hand, a scenario may be too abstract. It will fail to actually specify a behavior. It will instead describe a need or a problem, but not the solution. These scenarios will also “smell” a certain way: the steps, especially the “thens”, don’t say anything about the state of the software system, and instead state something about, for example, the emotional state of the user, or the state of something outside the software system (which may be another software system or something that is not software at all). The most telltale sign of this problem is the unfortunate but frequent utterance by devs/testers: “this isn’t testable.” If your scenarios are consistently failing to produce automation at all (because the people responsible for writing that automation don’t even know how to approach it), this is a reliable indication that the specifications aren’t specific. These scenarios describe why the product exists, but not what the product actually is! Product owners tend to be happy with these scenarios, but developers will be unhappy.
Another telltale sign is the need to supplement Gherkin with other forms of specifications. This simply means the Gherkin isn’t fully specifying the behavior, so it isn’t being used to its full potential. The reason we use Gherkin is to replace plain English specifications that aren’t clear and detailed enough to define test cases. If you hear developers or testers saying, “the Gherkin is fine, but we need more info for this to be actionable”, that indicates the Gherkin is too abstract.
Behavior-driven development is often conflated with a related but distinct practice: behavior-driven requirements discovery. Discovering exactly what the requirements are and communicating them clearly and unambiguously to developers are different problems. BDD, in the narrow sense, is focused on the handoff, where clear requirements become the input into the developer workflow. The Gherkin used by developers should not be explaining what the needs of the business are. It should explain exactly what the software should do, at a level so precise that there is no room for interpretation.
Let’s look at some examples:
GIVEN A User <U>
GIVEN The Login Page <LP>
GIVEN <LP> has Username Text Field <UTF>
GIVEN <U> has a Username <UN>
GIVEN Device User Preferences <UP>
GIVEN <UN> is stored in <UP>
WHEN <LP> appears
THEN <UTF> text is <UN>
This scenario is indicating that if a username is stored on the device in User Preferences, then it should be used to populate the username field when the login page appears. But is it really a business requirement that the username gets read from User Preferences? Presumably the answer is no. In that case, this scenario is coupling itself to implementation details that aren’t essential to the desired behavior. When a scenario is too low-level, the magic question to ask is “why?” Why are we proposing to read a username from User Preferences? Well, because we want a user who previously logged into the app not to have to enter his username again. So let’s back up and express what we really want to see:
GIVEN Previously logged in user <U>
GIVEN The Login Page <LP>
GIVEN <LP> has Username Text Field <UTF>
GIVEN <U> has a Username <UN>
WHEN <LP> appears
THEN <UTF> text is <UN>
Now we’re expressing what we really want. If there is a user who previously logged in, then that user’s username should appear. How we accomplish this we’ll leave to the developers, because it doesn’t matter to the requirement. If they decide it’s easiest to store the username in a text file instead of User Preferences, so be it. If all the product owner wants is for the previously logged-in user’s username to be prepopulated, then the scenario should reflect that.
This also affects how we test. With the previous Gherkin, we would write some value to User Preferences, then test if the Login page reads from it. But in this Gherkin, we would simulate a previous login, and then simply check if on the next login, the text field gets prepopulated. This is an acceptance test that is independent of the implementation details. Developers could switch from User Preferences to a text file, and the test will still “work”.
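To make the contrast concrete, here is a minimal Python sketch of the second kind of test. It assumes a hypothetical `LoginPage` model whose storage backend can be swapped. `UsernameStore`, `InMemoryPreferences`, `TextFileStore`, and `previous_user_is_prepopulated` are all invented for this illustration and do not come from any real framework. The point is that the acceptance test only simulates a previous login and checks the Username Text Field, so the same test passes whichever backend the developers pick.

```python
import os
import tempfile
from typing import Optional, Protocol


class UsernameStore(Protocol):
    """Hypothetical storage interface. The scenario never mentions it."""

    def save(self, username: str) -> None: ...

    def load(self) -> Optional[str]: ...


class InMemoryPreferences:
    """Stand-in for device User Preferences (hypothetical)."""

    def __init__(self) -> None:
        self._values: dict = {}

    def save(self, username: str) -> None:
        self._values["last_username"] = username

    def load(self) -> Optional[str]:
        return self._values.get("last_username")


class TextFileStore:
    """Alternative backend that keeps the username in a text file (hypothetical)."""

    def __init__(self, path: str) -> None:
        self._path = path

    def save(self, username: str) -> None:
        with open(self._path, "w", encoding="utf-8") as f:
            f.write(username)

    def load(self) -> Optional[str]:
        if not os.path.exists(self._path):
            return None
        with open(self._path, encoding="utf-8") as f:
            return f.read()


class LoginPage:
    """Hypothetical model of the Login Page <LP>."""

    def __init__(self, store: UsernameStore) -> None:
        self._store = store
        self.username_field = ""  # Username Text Field <UTF>

    def appear(self) -> None:
        # Prepopulate <UTF> with the previously logged-in user's username, if any.
        self.username_field = self._store.load() or ""

    def log_in(self, username: str, password: str) -> None:
        # Credential checking is out of scope for this sketch.
        self._store.save(username)


def previous_user_is_prepopulated(store: UsernameStore) -> bool:
    """Acceptance test: touches only behavior visible on the Login Page."""
    LoginPage(store).log_in("alice", "secret")  # GIVEN previously logged-in user <U>
    page = LoginPage(store)                     # a later session on the same device
    page.appear()                               # WHEN <LP> appears
    return page.username_field == "alice"       # THEN <UTF> text is <UN>


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "username.txt")
    for backend in (InMemoryPreferences(), TextFileStore(path)):
        print(type(backend).__name__, previous_user_is_prepopulated(backend))
```

Swapping `InMemoryPreferences` for `TextFileStore` changes nothing in `previous_user_is_prepopulated`. That is the property the rewritten scenario buys us.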
Now, remember that the “product owner” isn’t the only stakeholder. Another stakeholder might be the architect or systems engineer, who might have requirements about how the data gets stored. Another is Legal or the Privacy department, and they may require that the username be stored securely, which may mean we have to store it in the keychain. In this case, there will be another scenario:
GIVEN Device <D>
GIVEN Previously logged in user <U>
GIVEN <U> is the logged in user of <D>
GIVEN <U> has a Username <UN>
GIVEN User <U2> who is not <U>
WHEN <U2> accesses <D>
THEN <U2> cannot access <UN>
Notice we still aren’t specifying how the data is stored. We’re simply specifying that it must be stored in a way that is secured against access by other people. It may turn out that the keychain is the only technology that can satisfy this requirement, but we’re expressing that “the keychain” isn’t the requirement. Security is the requirement.
Now let’s consider this example:
GIVEN User <U>
WHEN <U> opens the app
THEN <U> does not need to enter his username
This scenario may accurately describe the need that is satisfied by prepopulating the username field. It tells us why it is valuable to do so. But it doesn’t tell us what to do! Suppose I’m the developer and I receive this scenario. Do I hide the username text field? Show it but gray it out? Prepopulate it? There are many ways to satisfy this “need,” but it’s not a behavior. I need to know exactly what behavior to implement in the software. And how do I test this? How can I write an automated test that proves the user didn’t have to enter his username? Tests aren’t “intelligent” enough to do something like that. The magic question in this case is “how?” How do we relieve the user from having to enter his username?
Why is the question that pushes the scenario up. How is the question that pushes the scenario down.
Unnecessary Givens
The structure of a Gherkin scenario was chosen very deliberately. Software is a collection of behaviors. A “behavior” of a software system is a state change. The software is in State X. Event Y occurs. The software then should (according to a requirement) be in State Z. Gherkin structures requirements according to these three aspects of a state change: the initial state (“given”), the event (“when”), and the expected final state (“then”).
What this means is that the “givens” describe the state of the system at the moment the event occurs. You don’t, nor could you, describe the entire state of the system. You describe what is relevant to the state change. Only those parts of the initial state that affect what the final state should be belong in the “givens”.
It’s tempting to add “givens” that you know will be satisfied but that aren’t actually relevant to the scenario. In principle, these are easy to spot: they introduce a term that is never referenced again in the scenario. If you can delete a “given” without leaving any “dangling references” (terms being used that weren’t properly introduced or defined), that’s a strong sign the “given” is superfluous. What if you know that it does matter? Then the scenario must be missing something about the introduced term later on. Once you add that missing detail, the “given” will become necessary. In this way, aggressively pruning the “givens” helps reveal whether the scenario itself is missing something.
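The dangling-reference check is mechanical enough to sketch in code. The following is a minimal Python illustration, assuming steps are plain strings in this post’s notation, where a term is introduced by its first `<TAG>` and referenced by the same tag afterwards. The function name and parsing rule are my own invention, not part of any Gherkin tool. It lists the givens whose newly introduced tags (if any) are never referenced by a later step. These are only candidates for deletion: a human still has to decide whether a flagged given restricts the scope of the requirement.

```python
import re
from typing import List, Set

# A term reference in this post's notation, e.g. <U>, <IX>, <U2>.
TAG = re.compile(r"<([A-Za-z0-9]+)>")


def deletable_givens(steps: List[str]) -> List[str]:
    """Return the GIVEN steps that can be removed without a dangling reference."""
    seen: Set[str] = set()
    introduced: List[Set[str]] = []  # tags first introduced by each step
    for step in steps:
        tags = set(TAG.findall(step))
        introduced.append(tags - seen)
        seen |= tags

    candidates = []
    for i, step in enumerate(steps):
        if not step.startswith("GIVEN"):
            continue
        later_refs: Set[str] = set()
        for later in steps[i + 1:]:
            later_refs |= set(TAG.findall(later))
        # Deletable if nothing this step introduced is used afterwards
        # (a step that introduces nothing is trivially deletable).
        if not (introduced[i] & later_refs):
            candidates.append(step)
    return candidates


if __name__ == "__main__":
    scenario = [
        "GIVEN A User <U>",
        "GIVEN <U> is logged in",
        "GIVEN The Main Page <MP> is displayed",
        "GIVEN Item X <IX> is on <MP>",
        "GIVEN The Detail Page <DP> for <IX>",
        "WHEN <U> selects <IX>",
        "THEN <DP> is displayed",
    ]
    print(deletable_givens(scenario))
```

Run on the detail-page scenario, it flags only `GIVEN <U> is logged in`, the same given singled out by hand.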
This isn’t a hard and fast rule. It may be necessary to introduce state in the “given” in order to restrict the scope of a requirement. If there is a requirement to show an alert only if the user has “silent” mode disabled, then the only mention of “silent” mode being disabled will be in a “given”. But this is necessary to indicate there is not a requirement to show an alert when “silent” mode is not disabled. This will hopefully be obvious from reading the scenario after deleting such a “given”: it is “too strong.” The reaction to reading it should be, “well, not necessarily!” or “not all the time!”
Let’s look at an example:
GIVEN A User <U>
GIVEN <U> is logged in
GIVEN The Main Page <MP> is displayed
GIVEN Item X <IX> is on <MP>
GIVEN The Detail Page <DP> for <IX>
WHEN <U> selects <IX>
THEN <DP> is displayed
Notice that the second GIVEN does not introduce any new terms. We can delete it, and not create any dangling references:
GIVEN A User <U>
GIVEN The Main Page <MP> is displayed
GIVEN Item X <IX> is on <MP>
GIVEN The Detail Page <DP> for <IX>
WHEN <U> selects <IX>
THEN <DP> is displayed
Now we can ask ourselves: does it really matter to this scenario if the user is logged in or not? You might answer: “yes it does, because the user can’t get to the main page until he has logged in.” Okay, but if that’s true (and that will be expressed in some other scenario about opening the main page), the user being logged in is already guaranteed by the given about the main page being displayed. We’re actually repeating ourselves by including the given about being logged in. It’s not really important to this scenario at all. After all, presumably the user must be logged in before doing anything except logging in. Do we really want to include a given about being logged in for every scenario?
This means when it comes time to test it, we don’t need to worry about mocking or otherwise setting up a login condition. That makes testing easier.
Scenarios Embedded in the Givens
As I mentioned previously, Gherkin organizes a behavior into its three aspects of initial state, event, and final state. A state change is something that takes place at an instant in time: when the event occurs (hence the keyword “when”). A problem related to unnecessary “givens” arises when the “givens” don’t describe a snapshot of the system, but instead describe one or more state changes that we expect to have occurred in the past. The “givens” are not a historical record of what the software did up until the “when”. They are a snapshot of the software’s state at the moment the “when” occurs.
The first thing that makes me suspicious this has occurred is, simply, that there are a lot of “givens.” Once I’ve become suspicious, a trick I use is to read the “givens”, and try to replace “given” as the first word of a step with “when” or “then.” If I can take three “givens”, read them as “given”, “when”, “then”, and it grammatically makes sense, then that tells me there’s a full-blown scenario embedded in the “givens.”
The appropriate action is to factor it out. Move that “given”-“when”-“then” sequence into its own scenario, save it, and in the current scenario remove everything except the last line, which contains the final state of the factored-out scenario.
For example:
GIVEN A User <U>
GIVEN <U> has a username <UN>
GIVEN <U> has a password <P>
GIVEN <U> entered <UN> into the Username Text Field
GIVEN <U> entered <P> into the Password Text Field
GIVEN <U> selects Log In
GIVEN The Main Page <MP> is displayed
GIVEN Item X <IX> is on <MP>
GIVEN The Detail Page <DP> for <IX>
WHEN <U> selects <IX>
THEN <DP> is displayed
This scenario is really about opening a detail screen for an item on the main page. But it has all this “noise” about logging in. You might say, “but it’s a requirement that the user must log in before getting to the main page.” Sure, but that’s a different requirement. Let’s take this part of the givens:
GIVEN A User <U>
GIVEN <U> has a username <UN>
GIVEN <U> has a password <P>
GIVEN <U> entered <UN> into the Username Text Field
GIVEN <U> entered <P> into the Password Text Field
GIVEN <U> selects Log In
GIVEN The Main Page <MP> is displayed
We can change the second-to-last line to be a WHEN, and the last line to be a THEN:
GIVEN A User <U>
GIVEN <U> has a username <UN>
GIVEN <U> has a password <P>
GIVEN <U> entered <UN> into the Username Text Field
GIVEN <U> entered <P> into the Password Text Field
WHEN <U> selects Log In
THEN The Main Page <MP> is displayed
This is a full-blown scenario embedded into the givens, that specifies the main page gets displayed on successful login. It doesn’t belong in a scenario about opening detail pages. Let’s factor it out, and start with the end result in our scenario:
GIVEN A User <U>
GIVEN The Main Page <MP> is displayed
GIVEN Item X <IX> is on <MP>
GIVEN The Detail Page <DP> for <IX>
WHEN <U> selects <IX>
THEN <DP> is displayed
Now it’s simply a given that we’re on the main page. We’re not worrying in this scenario how we got there. Not only does this help us keep each scenario small and focused on one behavior, it informs us how to test and develop. The test for opening a detail page does not need to go through login. We can design the test to simply start us on the main page, and go from there. That way, the test focuses on that one behavior, and won’t fail because something unrelated like login is broken. Similarly, for developers, this tells us to keep the main page behavior decoupled from how the main page is accessed.
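Here is a minimal Python sketch of what “start us on the main page” can look like in a test. It assumes a hypothetical `App` model with an `on_main_page` constructor that exists only so tests can enter that state directly. Neither name comes from the post or from a real framework. The test for the detail-page scenario never touches login, so a broken login flow cannot make it fail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class App:
    """Hypothetical app model holding only the state this scenario needs."""
    current_page: str = "login"
    main_page_items: List[str] = field(default_factory=list)
    detail_item: Optional[str] = None

    @classmethod
    def on_main_page(cls, items: List[str]) -> "App":
        # Test entry point: start directly on the main page, skipping login.
        return cls(current_page="main", main_page_items=list(items))

    def select_item(self, item: str) -> None:
        # Behavior under test: selecting an item on the main page opens its detail page.
        if self.current_page == "main" and item in self.main_page_items:
            self.current_page = "detail"
            self.detail_item = item


def test_selecting_item_opens_detail_page() -> None:
    app = App.on_main_page(["Item X"])   # GIVEN The Main Page <MP> is displayed, with <IX> on it
    app.select_item("Item X")            # WHEN <U> selects <IX>
    assert app.current_page == "detail"  # THEN <DP> is displayed...
    assert app.detail_item == "Item X"   # ...for <IX>
```

Login would get its own test, driven by the factored-out login scenario.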
Another thing to look for is too many verbs, especially past tense verbs, in the “givens.” Forbidding verbs (except, of course, for the verb “is”) in the “givens” entirely is probably too restrictive as a formal rule, but it is an effective rule of thumb. If you ever see something like “has completed”, “opened”, or any kind of past tense verb like this in a “given”, you should raise a concern and make sure there’s a good reason why it can’t be restated as a description of the present.
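This rule of thumb can be turned into a rough lint check. The Python sketch below is purely illustrative, and every word list in it is an assumption. It flags words in a “given” that look like past-tense verbs (words ending in “ed”, plus a few irregular forms) unless they directly follow “is” or “are”, because “is displayed” or “is selected” describes a present state. Expect false positives and misses. It is meant to prompt the concern described above, not to settle it.

```python
import re
from typing import List

# Illustrative, not exhaustive: irregular past-tense forms worth flagging.
IRREGULAR_PAST = {"was", "were", "had", "did", "went", "made", "began", "saw", "chose"}
# Illustrative, not exhaustive: words ending in "ed" that are not past tense.
NOT_PAST = {"need", "speed", "seed", "feed", "embed"}
# "is displayed" / "are selected" describe the present, so don't flag them.
STATE_AUXILIARIES = {"is", "are"}

WORD = re.compile(r"[A-Za-z]+")


def past_tense_words(given: str) -> List[str]:
    """Return the words in a GIVEN step that look like past-tense verbs."""
    words = [w.lower() for w in WORD.findall(given)]
    flagged = []
    for prev, word in zip([""] + words, words):
        looks_past = word in IRREGULAR_PAST or (
            word.endswith("ed") and len(word) > 3 and word not in NOT_PAST
        )
        if looks_past and prev not in STATE_AUXILIARIES:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    for step in ["GIVEN the menu was opened", "GIVEN the menu is open"]:
        print(step, "->", past_tense_words(step))
```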
For example, “the menu was opened” can be replaced with “the menu is open”. This may seem like a trivial semantic adjustment, but it’s not. Including the “given” “the menu was opened” is, to be precise, stating that this scenario requires a menu-open event to have occurred at some time in the past. But what if, when the app starts, the menu is already open, and the user (or anything else) has never touched it? In that case, no open events have occurred. The menu simply is, and always has been, open. That’s the difference between “the menu was opened” and “the menu is open.” To satisfy the first exactly, the developer must record a history of menu-open events that can be checked later. The second only requires the software to maintain a menu “open/closed” state.
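A minimal Python sketch of that difference, using two hypothetical menu models invented for this illustration. `MenuWithHistory` has to keep an event log to answer “was the menu opened?”, while `Menu` only keeps an open/closed flag. For an app that launches with the menu already open, the state-based given holds and the history-based one does not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MenuWithHistory:
    """Hypothetical model needed to check the given "the menu was opened"."""
    is_open: bool = False
    events: List[str] = field(default_factory=list)

    def open(self) -> None:
        self.is_open = True
        self.events.append("open")  # every open event has to be recorded

    def was_opened(self) -> bool:
        return "open" in self.events


@dataclass
class Menu:
    """Hypothetical model needed to check the given "the menu is open"."""
    is_open: bool = False

    def open(self) -> None:
        self.is_open = True


# An app that launches with its menu already open: no open event ever occurred.
launched_with_history = MenuWithHistory(is_open=True)
launched_with_state = Menu(is_open=True)
```

`launched_with_state.is_open` is true, but `launched_with_history.was_opened()` is false even though the menu is visibly open.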
Factoring out embedded scenarios like this is crucial for separating your scenarios, so they can be studied, modified, and (very importantly) tested separately. It matters for the same reason factoring code matters: it avoids duplication. Imagine every behavior for your main page needing to repeat the login requirement over and over! The need to update several scenarios “together” is another “smell” that indicates they aren’t properly separated, which usually means a standalone scenario is embedded in their “givens.”
Multiple “Whens”
Considering the description I’ve given of what Gherkin means, it may be surprising to learn that a scenario with more than one “when” is legal Gherkin at all. Arguably, it would be too restrictive to outlaw such scenarios entirely; there are many cases where multiple “whens” are the “obvious” way to express something. But because of what a state change is, expressing requirements this way leaves something implicit for the reader to fill in. Implicitness is exactly what we want to avoid when specifying requirements and creating a shared understanding.
What do we really mean when we say a state change should take place “when” N events occur? The N events, at least in general, don’t occur simultaneously, so at what exact point in time should the state change take place? The answer, perhaps obviously, is when the last of the N events occurs. If we were completely explicit, we would have N scenarios, each sharing the same “givens”, plus N – 1 “givens” indicating the other N – 1 events already occurred, with the remaining event in the “when.” Allowing multiple “whens” lets us avoid this obvious redundancy and express succinctly that, for every possible order in which these N events occur, the last of them should trigger this state change.
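The expansion described above is simple enough to write down as a function. This is a minimal Python sketch, with `Scenario` and `expand_whens` as hypothetical names invented here. For each event it produces one scenario in which that event is the “when” and every other event appears as a “given”. Anticipating a point made later in this post, each earlier event is written as the state it leaves behind (for example “<UTF> is not empty”) rather than as “X has occurred”. That event-to-state mapping has to be supplied by a person, because the multi-“when” scenario never names those states.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scenario:
    givens: List[str]
    when: str
    then: str


def expand_whens(givens: List[str], whens: List[str], then: str,
                 state_after: Dict[str, str]) -> List[Scenario]:
    """Rewrite one scenario with N whens as N scenarios with a single when each.

    state_after maps each event to the state it leaves behind. It must be
    supplied by hand: the multi-when scenario does not name these states.
    """
    scenarios = []
    for i, last in enumerate(whens):
        # The other N - 1 events already happened, so their states become givens.
        earlier = [f"GIVEN {state_after[w]}" for j, w in enumerate(whens) if j != i]
        scenarios.append(Scenario(givens + earlier, f"WHEN {last}", then))
    return scenarios


if __name__ == "__main__":
    whens = ["<U> enters any text into <UTF>",
             "<U> enters any text into <PTF>",
             "<U> selects <ATCB>"]
    states = {whens[0]: "<UTF> is not empty",
              whens[1]: "<PTF> is not empty",
              whens[2]: "<ATCB> is selected"}
    for s in expand_whens(["GIVEN A User <U>"], whens, "THEN <LB> is enabled", states):
        print(*s.givens, s.when, s.then, sep="\n", end="\n\n")
```

With the login-button events from the upcoming example, this yields three scenarios of the same shape as the ones written out by hand.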
Let’s look at an example:
GIVEN A User <U>
GIVEN The Login Page <LP>
GIVEN <LP> has Username Text Field <UTF>
GIVEN <LP> has Password Text Field <PTF>
GIVEN <LP> has Accept Terms and Conditions Box <ATCB>
GIVEN <LP> has Login Button <LB>
WHEN <U> enters any text into <UTF>
(AND) WHEN <U> enters any text into <PTF>
(AND) WHEN <U> selects <ATCB>
THEN <LB> is enabled
This scenario is specifying what it takes to enable the login button. Three things have to happen: both text fields must be made nonempty, and the “accept terms and conditions” box must be checked. This is a scenario that “naturally” lends itself to multiple whens. It’s like firing missiles: everyone has to turn their keys. Only when all the keys are turned do the missiles fire.
But really, a state change must take place at a single instant. What we’re really saying here is that, given two of the requirements have already been met, when the third requirement is met, then the state change should occur. We’ve actually specified the following three scenarios (to save space, I won’t repeat the shared givens every time):
...
GIVEN <UTF> is not empty
GIVEN <PTF> is not empty
WHEN <U> selects <ATCB>
THEN <LB> is enabled
...
GIVEN <PTF> is not empty
GIVEN <ATCB> is selected
WHEN <U> enters any text into <UTF>
THEN <LB> is enabled
...
GIVEN <UTF> is not empty
GIVEN <ATCB> is selected
WHEN <U> enters any text into <PTF>
THEN <LB> is enabled
The advantage of using multiple “whens” is that we save ourselves the trouble of writing multiple scenarios. But… is that really a good thing? Remember, the point of all this is to be explicit and thorough in our specifications, and to make sure everything is testable and tested. The simple fact is that each one of these possibilities needs to be tested. Writing out each scenario explicitly makes it clear to devs and testers that we need a test case for each scenario, and code to trigger the event in each scenario. It makes it clearer what work needs to be done.
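As a sketch of what “a test case for each scenario” means in practice, here is one hypothetical Python model of the login form and one test function per expanded scenario. `LoginForm` and the three `scenario_*` functions are invented names, not taken from any real app. Each test sets up the state named in the givens, fires exactly one event, and checks the login button.

```python
from dataclasses import dataclass


@dataclass
class LoginForm:
    """Hypothetical model of the Login Page state the scenarios name."""
    username: str = ""             # <UTF>
    password: str = ""             # <PTF>
    terms_accepted: bool = False   # <ATCB>

    @property
    def login_enabled(self) -> bool:
        # <LB> is enabled only when both fields are non-empty and <ATCB> is selected.
        return bool(self.username) and bool(self.password) and self.terms_accepted


def scenario_select_terms_last() -> bool:
    form = LoginForm(username="u", password="p")         # GIVEN <UTF>, <PTF> not empty
    form.terms_accepted = True                           # WHEN <U> selects <ATCB>
    return form.login_enabled                            # THEN <LB> is enabled


def scenario_enter_username_last() -> bool:
    form = LoginForm(password="p", terms_accepted=True)  # GIVEN <PTF> not empty, <ATCB> selected
    form.username = "any text"                           # WHEN <U> enters text into <UTF>
    return form.login_enabled                            # THEN <LB> is enabled


def scenario_enter_password_last() -> bool:
    form = LoginForm(username="u", terms_accepted=True)  # GIVEN <UTF> not empty, <ATCB> selected
    form.password = "any text"                           # WHEN <U> enters text into <PTF>
    return form.login_enabled                            # THEN <LB> is enabled
```

Each function exercises a different event, so each one needs its own triggering code. A single multi-“when” test could quietly cover only one of them.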
Notice, also, that in factoring out these scenarios, something new appeared in the “givens”: GIVEN <UTF> is not empty. There was no mention of “emptiness” in the combined scenario. This happened because by changing a WHEN to a GIVEN, we’re changing an event to a state. The significance of the “enters any text” event is precisely that it changes the “empty” state of the text field. By splitting these out, and calling out this state, we’re making it more explicit that emptiness is what matters. We have named a state that matters to us, and thereby given it semantic significance. The significance of emptiness was buried in implicit details at first.
Splitting out such scenarios forces us to think about, and actually name, the state of the system right before the one event that will trigger a state change occurs. In doing so, we end up being more explicit about the state of the system that we really care about.