Understanding Git Merge

Carrying on from my earlier article about some ways in which Git is commonly misunderstood — and how I think one should understand Git — I’d like to dive a bit deeper into one of the most important things Git knows how to do: merging.

If Git is often misunderstood, merging is one of the most misunderstood things about it! In this article, I’ll try to clear up some misunderstandings about merging with Git.

The canonical merge

Let’s start with a review of what I said about merging in the earlier article.

Recall that a branch in Git is just a name pointing to a single commit. Other commits that you might think of as “on” a branch are more properly considered as reachable through the parent chain from the commit that the branch name points to.

When you merge with Git, you merge commits. However, the commits in question will almost always have branch names, and it is natural to think of this operation as merging one branch into another.

Most characteristically, then, this is how a merge proceeds:

  1. You are “on” a branch, usually a primary branch of some sort; let’s say it’s master.

  2. You say git merge otherbranch (or whatever the name of some other branch is). This means you are merging otherbranch into master.

  3. Git now creates, out of whole cloth, a completely new commit combining the contributions of both branches. Moreover:

    • You are working on master, so this new commit is created on master — meaning that the master branch name pointer is advanced to point to the new commit.

    • The otherbranch pointer is not advanced.

    • The new commit has a remarkable feature: it has two parents, the master commit you were on before you said merge, and the commit pointed to by otherbranch, in that order.

So, for instance, suppose we are in this situation:

                    otherbranch
                        |
              X <- Y <- Z
             /
  A <- B <- C <- D <- E <- F <- G
                                |
                              master
                                |
                              HEAD

If we now say git merge otherbranch, we get this:

                    otherbranch
                        |
              X <- Y <- Z <--------\
             /                      \
  A <- B <- C <- D <- E <- F <- G <- M
                                     |
                                   master
                                     |
                                   HEAD

In that diagram, M is a newly minted commit, created entirely by Git. And M is now master. And it has two parents! And they have an order:

  • The first parent is G, which was master previously.

  • The second parent is Z, which was otherbranch previously (and still is).

This sort of commit is called a merge commit.
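To see this for real, here is a small sketch that rebuilds the diagram in a throwaway repository. It assumes a POSIX shell and Git 2.23 or later (for git switch). The empty commits named after the diagram’s letters, the demo identity, and the helper function commit are illustrative inventions, not part of any real project.

```shell
set -e
# Identity via environment variables, so no global Git config is needed.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master

# Hypothetical helper: an empty commit whose message is its letter in the diagram.
commit() { git commit -q --allow-empty -m "$1"; }

commit A; commit B; commit C
git branch otherbranch                     # otherbranch starts at C
commit D; commit E; commit F; commit G     # master advances to G
git switch -q otherbranch
commit X; commit Y; commit Z               # otherbranch advances to Z
git switch -q master

git merge -q --no-edit otherbranch         # Git creates M on master

# M's first parent is the old master tip; its second parent is otherbranch.
first=$(git log -1 --format=%s 'master^1')
second=$(git log -1 --format=%s 'master^2')
echo "M's parents, in order: $first, $second"
```

It should report G and then Z, matching the first and second parents described above.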

And that’s not all. After the merge, the working tree and the index and the HEAD (the new merge commit) are all synchronized. In other words, Git does not merely make a new commit; it also changes the state of your working tree (and the index). That is why you must be “on” the branch you intend to merge into; Git needs a working tree in which to enact the results of the merge.

Merge logic

Normally, a commit is something that you create, typically by editing files in the working tree, adding them to the index, and then saying git commit. But when you ask Git to merge one branch into another, Git itself generates a commit: the merge commit. Fine, but I have been completely uninformative about how Git does that. I have waved my hands at the matter, merely saying that Git proceeds by “combining the contributions from both branches” to form the merge commit. What on earth do those words actually mean?

To create a merge commit, Git undergoes a standard internal thought process that I call merge logic. The nitty-gritty details can get very nitty and gritty indeed, but in broad outline, here’s what Git does:

  1. A merge starts with Git locating the common commit from which the merging branches most recently diverged. More technically, this is the most recent commit reachable from both branches. This commit is called the merge base.

  2. Git then calculates two diffs — from the merge base to the first branch, and from the merge base to the second branch.

  3. To form the merge commit, Git applies both diffs to the merge base.

To illustrate, let’s go back to our diagram once again:

                    otherbranch
                        |
              X <- Y <- Z
             /
  A <- B <- C <- D <- E <- F <- G
                                |
                              master
                                |
                              HEAD

You are on master and you said git merge otherbranch. Then:

  1. Git first figures out that the merge base is commit C. (Do you see why?)

  2. Git then calculates the diff from C to G (because G is master) and the diff from C to Z (because Z is otherbranch).

  3. Git then applies both of those diffs to C simultaneously — and commits the result on master. That is the merge commit. (And, as I’ve already said, master now points to the merge commit, which has two parents, G and Z).

If you stand back and consider this logic from a functional standpoint, you can see that Git is doing exactly what you intuitively think a merge should do. In our diagram, Z represents a state of the working tree that has evolved from C, and G represents a different state of the working tree that has evolved from C. The goal of a merge, aside from the topology of the parent chain, is to produce a working tree that reflects both of those evolutions at once. That is exactly what a merge commit does.
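Git exposes this three-way step for a single file as the command git merge-file, which takes our version, the merge-base version, and their version, in that order. Here is a minimal sketch, assuming Git is installed; the file names and contents are made up. The base has five lines, one side edits the first line, the other edits the last, and the result carries both edits.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory; no repository needed
printf 'a\nb\nc\nd\ne\n' > base.txt         # the file at the merge base (C)
printf 'A\nb\nc\nd\ne\n' > ours.txt         # one branch changed the first line
printf 'a\nb\nc\nd\nE\n' > theirs.txt       # the other branch changed the last line

# git merge-file applies base->ours and base->theirs to the base at once.
# -p prints the result instead of overwriting ours.txt; the exit status is the
# number of conflicts, so with set -e a conflict would stop the script here.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
printf '%s\n' "$merged"                     # A b c d E, one per line
```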

What will merging do to my files?

Perhaps merge logic sounds a bit scary. Git is going to come along and alter the state of your working tree. You might like some reassurance that everything is going to be okay afterwards.

One way to anticipate what a merge will do is to try to follow along with Git’s logic.

For example, if you are about to merge, and you would like to know what the merge base would be, you can ask Git! That’s what the git merge-base command is for. If you’re about to merge otherbranch into master, say git merge-base otherbranch master and you will be told the identifier of the commit where Git thinks these branches diverged.

Similarly, if you are about to merge, and you’d like to see the two diffs that would be applied to the merge base, you can ask Git! If you are on master and you are about to merge otherbranch into it, say git diff otherbranch... and git diff ...otherbranch to see the master diff and the otherbranch diff, respectively.
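Here is a sketch of both commands against a reconstruction of the diagram, assuming Git 2.23 or later. The helper work and the file names are hypothetical: commits on master touch master.txt and commits on otherbranch touch other.txt, so each three-dot diff should name just one file.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it, using the line as the message.
work() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

work shared.txt A; work shared.txt B; work shared.txt C
git branch otherbranch
work master.txt D; work master.txt E; work master.txt F; work master.txt G
git switch -q otherbranch
work other.txt X; work other.txt Y; work other.txt Z
git switch -q master

base=$(git merge-base otherbranch master)
git log -1 --format='merge base: %s' "$base"     # merge base: C

# Three-dot diffs start from the merge base; an omitted side means HEAD.
ours=$(git diff --name-only otherbranch...)       # merge base -> master
theirs=$(git diff --name-only ...otherbranch)     # merge base -> otherbranch
echo "master diff touches $ours; otherbranch diff touches $theirs"
```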

Still, that sort of information is perhaps better suited to a machine than to a human being. What you’d really like to do, I expect, is to see the merge enacted directly on your working tree, and compare that to how things were before you performed the merge.

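To do so, you can give the merge command, but add the --no-commit flag to it. Git responds by configuring your working tree and your index as if it were about to make the merge commit, but it doesn’t actually make the commit. Instead, Git pauses in the middle of the merge operation so that you can inspect the merged files in the working tree. (One caveat: if the merge would be a fast-forward, a case discussed later, there is no merge commit to hold back, so --no-commit alone won’t stop the branch from moving; add --no-ff as well.)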

Since no commit was actually performed, you can now compare the “merged” state of the working tree with its previous state by saying git diff HEAD. When you’re done thinking about the situation, you should probably either complete the merge with git commit or else reverse course, aborting the merge with git merge --abort (which also resets everything back to HEAD).
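A minimal sketch of that round trip, assuming Git 2.23 or later. It uses a compressed, hypothetical version of the diagram (just C, then G on master and Z on otherbranch), and the helper and file names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it.
work() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

work shared.txt C
git branch otherbranch
work master.txt G
git switch -q otherbranch
work other.txt Z
git switch -q master

before=$(git rev-parse HEAD)
git merge --no-commit otherbranch >/dev/null   # merge into tree and index, stop before committing
pending=$(git diff --name-only HEAD)           # the "merged" state compared with where we were
echo "the merge would change: $pending"
git merge --abort                              # reverse course: back to HEAD
after=$(git rev-parse HEAD)
```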

Another possibility is just to go right ahead and perform the merge! After all, nothing in Git is written on tablets of jade. First, take note of where you are. Here’s one way:

% git rev-parse --short @

Git responds by telling you the unique identifier of HEAD. Let’s suppose it’s bf1908d. Now perform the merge, and then say git diff bf1908d to learn what just happened. If you’re happy with it, do nothing. If you’re not, then git reset --hard bf1908d and you’re right back where you were before the merge, with no harm done.
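The same round trip as a script: again a sketch on a compressed, hypothetical version of the diagram, assuming Git 2.23 or later.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it.
work() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

work shared.txt C
git branch otherbranch
work master.txt G
git switch -q otherbranch
work other.txt Z
git switch -q master

here=$(git rev-parse --short @)            # take note of where we are
git merge -q --no-edit otherbranch         # go right ahead and merge
changed=$(git diff --name-only "$here")    # what the merge did, relative to the old HEAD
echo "merge changed: $changed"
git reset -q --hard "$here"                # not happy: back to exactly where we were
```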

Moving the merge base

Merging has one additional effect that I have not yet mentioned. To see what it is, let’s return to our canonical merge example once again. We start in this situation:

                    otherbranch
                        |
              X <- Y <- Z
             /
  A <- B <- C <- D <- E <- F <- G
                                |
                              master
                                |
                              HEAD

We say git merge otherbranch, and we end up in this situation:

                    otherbranch
                        |
              X <- Y <- Z <--------\
             /                      \
  A <- B <- C <- D <- E <- F <- G <- M
                                     |
                                   master
                                     |
                                   HEAD

Now let’s pause to consider: after the merge, what has happened to the merge base between these two branches? Before the merge, git merge-base master otherbranch told us that C was the merge base. What does git merge-base master otherbranch tell us after the merge? (Look at the diagram and see if you can work out the answer.)

The merge base right after the merge is Z — which is to say, it is the branch that was merged, otherbranch. Do you see why? master is now M, and one of its parents is Z, so clearly Z is reachable from master in just one step. And as for otherbranch, it is Z, so Z is reachable from otherbranch in zero steps!

So merging one branch into another has a secondary effect: it moves the merge base of those two branches. In particular, the new merge base of the two branches is now the second parent of the merge commit.
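A quick way to watch the merge base move, sketched on a compressed version of the diagram (C, then G on master and Z on otherbranch) built from empty commits. The helpers commit and mb are hypothetical, and Git 2.23 or later is assumed.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helpers: an empty commit named by letter, and the current merge base's name.
commit() { git commit -q --allow-empty -m "$1"; }
mb() { git log -1 --format=%s "$(git merge-base master otherbranch)"; }

commit C
git branch otherbranch
commit G
git switch -q otherbranch
commit Z
git switch -q master

before=$(mb)                               # C
git merge -q --no-edit otherbranch
after=$(mb)                                # Z, the merge commit's second parent
echo "merge base before: $before, after: $after"
```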

In many situations, that fact will not matter to you. You don’t care that merging one branch into another has moved their merge base, because you are never going to merge those two branches again.

But when you are trying to envision the topological effects of a merge, it can become very important to understand that merging moves the merge base. There are techniques and patterns for merging that rely on this fact (I’ll give some examples later). This is not a mere subtlety; it is, indeed, a fundamental fact about Git merges, and one which in my opinion is not sufficiently emphasized in most explanations of Git.

When Git needs help

Sometimes Git will discover that its own merge logic is insufficient to form the merge commit. Under these circumstances, Git will pause in the middle of the merge process and ask for your assistance. The situation is too complicated or too ambiguous for a machine to resolve; Git needs help from the brain power of a human being.

In a most unfortunate bit of nomenclature, this situation is referred to as a merge conflict. That one word, for some reason, strikes fear into the hearts of Git users, as if it were a bad thing, to be avoided at all costs, or a difficult thing to deal with. It isn’t. Remember, merging involves Git making a commit. That is the thing that could be dangerous! Merge conflicts are Git’s way of not doing anything dangerous.

A typical case in point is when one of the two diffs from the merge base shows that a certain line or clump of lines was edited one way, and the other diff shows that the same clump of lines was edited a different way.

For instance, suppose a file at the merge base consists of the text "Hello world", and one branch now has "Hello everyone" while the other branch now has "Goodbye world". In this situation, Git is unsure what to do in order to enact both of those changes in a way that will not be harmful. So Git just pauses and lets you explain to it what you want here. If you want to call that a conflict, fine. I call it reasonable caution on Git’s part.

When Git pauses because the same file has been edited in two different ways that Git can’t resolve automatically, it saves the two different states of the file and rewrites the file in the working tree so that you can see what the conflict is. You can resolve the problem with a GUI tool of some sort, but you can also do it in a text editor. You’ll see clumps of lines like this:

<<<<<<< HEAD
Hello everyone
=======
Goodbye world
>>>>>>> otherbranch

As you can see, that’s a crude but effective way of showing you that one branch, the one we are on, has "Hello everyone" at this point, and the other branch, otherbranch, has "Goodbye world".

A cool trick for getting even more information at this point is to say:

% git checkout --conflict diff3 <filepath>

Git then rewrites the conflicted file with even more information, like this:

<<<<<<< ours
Hello everyone
||||||| base
Hello world
=======
Goodbye world
>>>>>>> theirs

That shows you both branches (termed ours and theirs) along with the merge base, so you can really see what happened here.
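You can produce this kind of display without a repository by handing the three versions to git merge-file. A sketch, assuming Git is installed; the file names are made up, and the -L options merely supply the labels ours, base, and theirs.

```shell
set -e
cd "$(mktemp -d)"
echo 'Hello world' > base.txt
echo 'Hello everyone' > ours.txt
echo 'Goodbye world' > theirs.txt

# --diff3 adds the base section; each -L names, in order, ours, base, theirs.
# The exit status is the number of conflicts, so capture it rather than letting set -e stop us.
conflicts=0
out=$(git merge-file -p --diff3 -L ours -L base -L theirs ours.txt base.txt theirs.txt) || conflicts=$?
printf '%s\n' "$out"
echo "conflicts: $conflicts"
```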

So Git has written some markers into your file, describing the merge conflict. Your job at this point is to eliminate everything in that clump of lines that isn’t what you actually want. If you replaced all of those lines with just "Goodbye world", for example, that would be a possible resolution. So don’t be afraid; go ahead, open the file in your favorite text editor and edit it! Use the information given to make the file look the way you want it.

When you’re done editing, save the file and then git add the file. Ultimately, when all conflicted files are fixed and added, say git merge --continue and the merge will be completed.
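Here is the whole cycle as a non-interactive sketch, assuming Git 2.23 or later. The file greeting.txt is hypothetical, and setting GIT_EDITOR=true is just a way to accept Git’s prepared merge message without an editor popping up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo 'Hello world' > greeting.txt
git add greeting.txt
git commit -q -m base
git switch -q -c otherbranch
echo 'Goodbye world' > greeting.txt
git commit -q -am 'their edit'
git switch -q master
echo 'Hello everyone' > greeting.txt
git commit -q -am 'our edit'

git merge otherbranch >/dev/null 2>&1 || true   # Git pauses with a conflict
conflicted=$(git diff --name-only --diff-filter=U)
echo "needs help: $conflicted"

echo 'Goodbye world' > greeting.txt    # the resolution: keep just the line we want
git add greeting.txt                   # mark the file as resolved
# GIT_EDITOR=true accepts the prepared merge message instead of opening an editor.
GIT_EDITOR=true git merge --continue >/dev/null
```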

I could say much more about the actual mechanisms involved in a merge conflict: what Git does behind the scenes when a situation like this arises, and what other sorts of situation can count as a conflict. But I’m not going to. I chiefly just want to impress upon you that a merge conflict is, of itself, not necessarily bad. It can be an indication that you might be merging the wrong things, but in general it is just part of the business of merging, and is not, of itself, to be feared or regarded as a fault. At the same time, there are ways to avert or take control of merge conflicts, and I’ll talk about some of them later.

Modifying Git’s merge logic

There are a lot of little tricks you can perform as part of a merge, to tell Git to supersede its own default merge logic with some other way of thinking about what’s going on. I’m not going to tell you about all of them, but here are three simple modifications that can come in very handy:

  • git merge -X ours: If there are any merge conflicts, don’t pause to ask for help; just let our side of the merge win (the branch we are on now).

  • git merge -X theirs: If there are any merge conflicts, don’t pause to ask for help; just let the other side of the merge win (the branch we are merging into this one).

  • git merge -s ours: Don’t use any merge logic at all! Make a merge commit with the two branches as its parents, but let its content be exactly that of the current commit (HEAD); ignore any possible contribution from the other branch.
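A sketch comparing the three, assuming Git 2.23 or later. The setup is hypothetical: otherbranch changes greeting.txt in a way that conflicts with master, and also adds extra.txt, which conflicts with nothing. The helper try performs one merge, summarizes the result, and resets.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo 'Hello world' > greeting.txt
git add greeting.txt
git commit -q -m base
git switch -q -c otherbranch
echo 'Goodbye world' > greeting.txt    # conflicts with master's edit
echo 'extra' > extra.txt               # conflicts with nothing
git add greeting.txt extra.txt
git commit -q -m 'their edits'
git switch -q master
echo 'Hello everyone' > greeting.txt
git commit -q -am 'our edit'
start=$(git rev-parse HEAD)

# Hypothetical helper: merge with the given options, summarize the result, then undo it.
try() {
  git merge -q --no-edit "$@" otherbranch >/dev/null
  if [ -e extra.txt ]; then extra=yes; else extra=no; fi
  echo "$(cat greeting.txt) / extra.txt: $extra"
  git reset -q --hard "$start"
}

x_theirs=$(try -X theirs)   # the conflict goes to otherbranch; extra.txt arrives
x_ours=$(try -X ours)       # the conflict goes to master; extra.txt still arrives
s_ours=$(try -s ours)       # otherbranch contributes nothing at all
printf '%s\n' "$x_theirs" "$x_ours" "$s_ours"
```

The difference between -X ours and -s ours shows up in extra.txt: -X ours only settles the conflicting hunks, while -s ours discards everything otherbranch did.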

When no merge commit is needed

Consider a situation like this:

                    otherbranch
                        |
              X <- Y <- Z
             /
  A <- B <- C 
            |
          master
            |
          HEAD

If we are on master and we ask to merge otherbranch, there are two possible ways to proceed. One is to do the same thing we are already familiar with: Git forms a merge commit and appends it to master:

                    otherbranch
                        |
              X <- Y <- Z
             /           \
  A <- B <- C <-----------M 
                          |
                       master
                          |
                        HEAD

But there’s another way. Git can say to itself: Look here, we don’t really need a merge commit at all. While otherbranch was being updated one commit at a time, master just sat there like a bump on a log. It hasn’t moved since otherbranch diverged from it. In a very real sense, therefore, we may say that otherbranch actually never diverged from master; all that really happened is that master itself was repeatedly updated, under another name.

So when you ask Git to merge otherbranch into master, it just turns that idea of what happened into reality: it simply reunites the branch names! Like this:

                     otherbranch
                         |
A <- B <- C <- X <- Y <- Z   
                         |
                      master
                         |
                       HEAD

That’s called a fast-forward. Observe that, although this happened in response to your saying merge, it is not a true merge. The only thing Git actually did was to reset the master branch name to point to the same commit that otherbranch is pointing to. In effect, it slid master up to otherbranch, and that’s all.

Whether Git will actually perform a fast-forward when possible depends on your configuration (git config). In particular, you can set your merge.ff config to false to prevent, by default, automatic fast-forwarding of the sort that I’ve just described.

There are also options that you can add directly to your merge command, and these take precedence over your configuration. The two most significant are:

  • The --no-ff flag. It prevents fast-forwarding for this merge; it forces a canonical merge with the construction of a merge commit.

  • The --ff-only flag. It requires fast-forwarding for this merge; if the merge cannot be expressed as a fast-forward, nothing happens at all.
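A sketch of all three behaviors, assuming Git 2.23 or later, on a hypothetical empty-commit history shaped like the diagram at the start of this section. The command git merge-base --is-ancestor is a handy way to ask in advance whether a fast-forward is possible.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: an empty commit named by letter.
commit() { git commit -q --allow-empty -m "$1"; }

commit A; commit B; commit C
git switch -q -c otherbranch
commit X; commit Y; commit Z
git switch -q master                        # master is still at C
c=$(git rev-parse master)

# A fast-forward is possible exactly when master is an ancestor of otherbranch.
git merge-base --is-ancestor master otherbranch && echo "fast-forward possible"

git merge -q --ff-only otherbranch          # master slides up to Z; no new commit
ff_tip=$(git rev-parse master)

git reset -q --hard "$c"
git merge -q --no-ff --no-edit otherbranch  # forces a real merge commit
noff_second=$(git rev-parse 'master^2')     # its second parent is Z

git reset -q --hard "$c"
commit D                                    # now master has diverged
ffonly=0
git merge -q --ff-only otherbranch 2>/dev/null || ffonly=$?   # refuses; master stays at D
echo "--ff-only exit status after divergence: $ffonly"
```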

When I’m growing a project with feature branches and a main branch, I generally do not want a fast-forward to happen. I want the topology and history that a merge commit provides. So I prevent fast-forwarding by adding the --no-ff flag to my merge command.

On the other hand, there are common situations where fast-forwarding is generally a good thing; here are some of them.

Merging the remote tracking branch

When you are sharing a branch and you have the local version of that branch checked out, and you merge the corresponding remote-tracking branch into it, it might be nice to fast-forward where possible rather than making an unnecessary merge commit. After a git fetch, you can call git status to find out whether merging the remote-tracking branch would be a fast-forward:

% git fetch
% git status
On branch master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
% git merge --ff-only

Perhaps you noticed that, in the last line, I didn’t specify the name of the branch to be merged into master, namely origin/master. Merging your current local branch’s corresponding remote-tracking branch into the current branch is such a common thing to do that it is the default! If you are on master and it is tracking origin/master, the mere phrase git merge is taken to mean git merge origin/master (which can also be expressed as git merge @{u}).

Those with some experience of Git are perhaps now thinking: “Why didn’t you just say git pull?” Well done! By default, git pull is effectively git fetch followed by git merge. So perhaps I could have said that. However, I’m not in favor of saying git pull; it has too many configuration-based variants, so you might not be entirely sure what will happen when you say it. It’s better to say git fetch and then proceed “manually” as I have demonstrated here.
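To try this without a real server, a bare repository on the local disk can stand in for the remote. In this sketch, which assumes Git 2.23 or later, the directories seed, origin.git, mine, and colleague are all hypothetical; a colleague’s push puts origin/master one commit ahead of our master.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m A
git clone -q --bare seed origin.git       # stands in for the shared remote
git clone -q origin.git mine              # our clone: master tracks origin/master
git clone -q origin.git colleague
git -C colleague commit -q --allow-empty -m B
git -C colleague push -q origin master    # the remote's master moves ahead of ours

cd mine
git fetch -q
git status                                # reports that master is behind 'origin/master' by 1 commit
behind=$(git rev-list --count master..origin/master)
git merge -q --ff-only                    # no branch named: merges the upstream, origin/master
```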

Updating without switching

Here’s a common situation. You’re working feverishly on a feature branch; let’s call it feature. Meanwhile, your colleagues are working on their own feature branches and merging them into master. Your own local master is thus falling further and further behind. You’d like to bring it up to date from time to time. What do you do? Probably something like this:

% git commit
% git switch master
% git fetch
% git merge
% git switch feature

All that switching! Plus, in order to switch, you have to make sure that feature is clean; that’s why you commit first. What you’d prefer to do, I expect, is somehow update master from origin/master without switching to it. Well, you can — provided the update would be a fast-forward. The notation is rather curious:

% git fetch origin master:master

That’s all! You didn’t need to switch away from feature, so you didn’t need to commit your current work on feature either. Git fetched master from origin and fast-forwarded your local master to match it, entirely behind the scenes. (If that update would not be a fast-forward, Git refuses to make it.)
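A sketch of the same trick against a local stand-in for the remote, assuming Git 2.23 or later; the directory names and wip.txt are hypothetical. Note that the uncommitted file survives, and we never leave feature.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m A
git clone -q --bare seed origin.git       # stands in for the shared remote
git clone -q origin.git mine
git clone -q origin.git colleague
git -C colleague commit -q --allow-empty -m B
git -C colleague push -q origin master    # the remote's master is now ahead of ours

cd mine
git switch -q -c feature
echo 'work in progress' > wip.txt         # uncommitted work, left untouched
git fetch -q origin master:master         # updates local master; refused unless a fast-forward
```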

Pushing

When you push to a remote, what actually happens is a fast-forward. In fact, if pushing would require a true merge (because there are independent new commits on both the local branch and the remote branch that you are pushing to), Git will refuse to permit the push! In my earlier article, I characterized this by saying that push is picky.

The rule here is that Git isn’t going to make a merge commit out of whole cloth on the remote repo. A push must be a fast-forward or it will fail; a true merge at the remote repository is impossible.

The usual way to encounter this limitation is that you try to push and Git balks:

Updates were rejected because the remote contains 
work that you do not have locally.

Typically, this happened because someone else pushed to the same branch, or someone created a commit directly on the remote (for example, adding a README file using the GitHub web interface).

You’ll generally respond by fetching and merging locally, thus incorporating into your local branch all the commit(s) from the remote branch; then you’ll try the push again, and this time you’ll probably succeed.
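The rejection and the fix can be reproduced locally, with a bare repository standing in for the remote. A sketch, assuming Git 2.23 or later; the directory names are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m A
git clone -q --bare seed origin.git       # stands in for the shared remote
git clone -q origin.git mine
git clone -q origin.git colleague
git -C colleague commit -q --allow-empty -m 'colleague work'
git -C colleague push -q origin master    # someone else pushed first

cd mine
git commit -q --allow-empty -m 'my work'
pushed=0
git push -q origin master 2>/dev/null || pushed=$?   # rejected: would not be a fast-forward
echo "first push exit status: $pushed"
git fetch -q
git merge -q --no-edit                               # the true merge happens here, locally
git push -q origin master                            # now the remote only has to fast-forward
```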

Pull requests

I have just said that a true merge in a remote repository is impossible. Well, I lied. It’s impossible as far as Git is concerned. But many Git repository hosting services have worked out a way to do it, basically by reaching into the Git repository internals and fiddling with them. This feature is usually called a pull request, though that term is mostly historical; this is actually a merge request.

Pull requests are implemented differently by each Git repository hosting service that provides them. Here, I’ll concentrate on GitHub. What happens, in broad terms, is this:

  1. You push a new branch to a GitHub-hosted repository.

  2. You then switch to GitHub’s web interface and create a pull request, which is really a merge request, asking to merge that branch into some primary branch (which you specify as part of the creation of the pull request).

  3. The actual merge is then postponed so that your collaborators can examine your code, thinking about what would happen to your project if this branch were in fact merged as requested, as well as discussing your code and so forth.

  4. At the end of the process, if the request is approved, someone clicks the button in the web interface approving the merge, and the merge is actually performed — at GitHub, which also closes the pull request.

The important thing to understand is that this is not a Git feature. GitHub is performing some elaborate hanky-panky with the repository, using Git in a way that was never intended. Pull requests are clever, but you’re in GitHub’s world, doing things GitHub’s way.

Nevertheless, GitHub does try to interface in a sensible way with the “normal” world of Git.

For example, after you submit a pull request, and while your pull request is still on hold, you might want to modify the code on the pull request branch. To do so, you just add and commit and push your local copy of the pull request branch, and when you push, GitHub understands that this branch is currently being held for a pull request, and appends your new commit(s) to the pull request.

Also, it’s perfectly reasonable for you, as part of the pull request review process, to want to fetch a pull request branch to your local repository and test the code locally. Well, you know the pull request branch name by looking at the GitHub web interface, so just go right ahead and fetch it and check it out as a local branch.

Moreover, you can actually perform locally the merge that a pull request is requesting. That’s a useful thing to be able to do, especially if a merge conflict develops. GitHub does provide a decent merge conflict resolution interface, but perhaps you might like to work out the details of the merge conflict resolution locally.

Let’s say the pull request has been approved for merging, but GitHub is telling you there is a merge conflict. Fetch the pull request branch, check out the target primary branch, merge the pull request branch into the target primary branch, resolve the conflict, and when you’re satisfied, push. GitHub understands that what you have just pushed incorporates the very same merge that the pull request was intended to perform, and responds by closing the pull request.

Other uses of merge logic

Several other Git features become readily explicable when you realize that they, too, are applications of merge logic. For example:

  • When you cherry-pick a commit (git cherry-pick <SHA>), you’re asking Git to create a commit on the branch where you are now. This commit is formed using merge logic! The “branches” in this case are the current HEAD and the commit you are cherry-picking; the merge base is the parent of the commit you are cherry-picking.

  • When you revert a commit (git revert <SHA>), you’re asking Git to perform a kind of backwards-facing cherry-pick. Again, Git forms a commit using merge logic! The “branches” here are the current HEAD and the parent of the commit you are reverting, and the merge base is the commit you are reverting.

  • When you rebase a chain of commits, you’re asking Git to perform a sequence of cherry-picks; therefore, each commit is formed using merge logic.
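As a sanity check on the cherry-pick description, this sketch (assuming Git 2.23 or later; branch, file, and commit names are hypothetical) performs the merge logic by hand with git merge-file, using the parent of the picked commit as the merge base, and compares the result with what git cherry-pick produces.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

printf 'a\nb\nc\nd\ne\nf\ng\n' > f.txt
git add f.txt
git commit -q -m base
git switch -q -c topic
printf 'a\nb\nC\nd\ne\nf\ng\n' > f.txt; git commit -q -am 'P: edit line 3'
printf 'a\nb\nC\nd\ne\nf\nG\n' > f.txt; git commit -q -am 'K: edit line 7'
git switch -q master
printf 'A\nb\nc\nd\ne\nf\ng\n' > f.txt; git commit -q -am 'edit line 1'

# By hand: merge logic with HEAD and K as the "branches" and K's parent as the base.
scratch=$(mktemp -d)
git show 'topic^:f.txt' > "$scratch/base"     # parent of the commit being picked
git show 'HEAD:f.txt'   > "$scratch/ours"
git show 'topic:f.txt'  > "$scratch/theirs"
by_hand=$(git merge-file -p "$scratch/ours" "$scratch/base" "$scratch/theirs")

git cherry-pick topic >/dev/null               # the real thing
by_git=$(cat f.txt)
printf '%s\n' "$by_git"
```

Both should come out as A, b, c, d, e, f, G: the edit made by the picked commit comes across, but the edit made by its parent (line 3) does not, precisely because the parent is the merge base.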

Because cherry-picking, reverting, and rebasing all use merge logic to construct new commits, they have some similar options. For example, rebasing can fast-forward unless you prevent it. Also, these commands are subject to the same pitfalls as merging. In particular, you can get a merge conflict! That can be quite surprising if you’re not prepared for it.

Curiously, with rebasing, the notions ours and theirs (distinguishing the two “branches” involved in a merge conflict) may appear to be backwards. The reason has to do with the directionality of the verb. When you merge, you’re on ours and you ask for theirs to be merged in. When you rebase, you’re on theirs and you ask to rebase onto ours. That may seem odd, but it makes sense from a merging point of view: in both cases, ours is the recipient of the new commit(s).

Reverse merges

In the simplest of worlds, life might go like this:

  1. Make a branch from the main trunk.
  2. Work on the branch.
  3. Merge the branch back to the main trunk.
  4. Delete the branch.
  5. Repeat starting at 1.

In reality, however, life is usually not that simple. More than one person might be working on this project simultaneously, each person using a different branch, and so the order of branch-making and branch-merging can become interleaved:

  1. Alice makes a branch from the main trunk. Bob makes a branch from the main trunk.
  2. Alice works on Alice’s branch. Bob works on Bob’s branch.
  3. Alice merges Alice’s branch back to the main trunk.
  4. Bob tries to merge Bob’s branch back to the main trunk.
  5. Merge conflict.

It’s easy to see how that can happen. Here’s a simple case in point:

* 2a0bda1 (bob) bob edited b
| * c1e1136 (HEAD -> main) Merge branch 'alice'
|/| 
| * b47d75b (alice) alice edited b
|/  
* ed0d7b0 b
* 1e690a2 a

Alice and Bob both edited the file b. Those edits, as it happens, were in the same line. Alice merged first, so when Bob tried to merge, there was a merge conflict. That sort of situation is just the price of having multiple people contribute to a main trunk branch by merging. There is a race condition here.

As I’ve already said, merge conflicts are not necessarily horrible. Still, it would be nice to reduce the probability of a merge conflict when you are merging into the main trunk. One way to do that is to use a reverse merge, where you deliberately merge in the “wrong” direction before the real merge. In this case, Bob would merge the main trunk into the feature branch just before merging the feature branch into the main trunk. If there are conflicts, Bob can resolve them as part of the reverse merge.

This is how Bob might proceed:

*   88135f1 (HEAD -> main) Merge branch 'bob'
|\  
| *   8778702 (bob) reverse merge, resolve conflicts
| |\  
| |/  
|/|   
* |   c1e1136 Merge branch 'alice'
|\ \  
| * | b47d75b (alice) alice edited b
|/ /  
| * 2a0bda1 bob edited b
|/  
* ed0d7b0 b
* 1e690a2 a

What happened here is that Bob did a reverse merge of main into bob (resolving any conflicts that came up) and then turned right around and did a normal merge of bob into main. The reverse merge has two purposes:

  • The reverse merge is an opportunity for Bob to take care of any merge conflicts before performing the normal merge.

  • The reverse merge moves the merge base, as I described earlier.

Why is the moving of the merge base significant? Because of what happens next. By doing a reverse merge first, Bob has changed the topology of the normal merge that follows it. When Bob now merges bob into main, the merge base has been moved up to c1e1136, the merge commit created by Alice (thanks to the reverse merge).

And what has happened on main since that merge base? In our diagram, nothing. But Bob’s bob has changed b. So when Bob merges bob into main, there’s no conflict, because main makes no contribution to the merge commit with regard to the file b; the state of the file b in bob simply becomes the state of the file b in the merge commit.

Another nice thing about a reverse merge is that it gives you a chance to test the outcome of the real merge before making the real merge. When Bob merges main into bob, Bob ends up with bob in the very same state that main will be in when Bob merges bob into main. It’s nicer for Bob to test the code and discover any issues before merging bob into main, rather than discovering after the fact that the merge of bob into main has somehow broken the code.

Because of all this, many developers routinely do a reverse merge just before doing a normal forward merge. This technique works equally well if the forward merge is going to be performed through a pull request.
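The Alice and Bob story can be replayed in a sketch like this one, assuming Git 2.23 or later; the file contents are made up. After the reverse merge, main is an ancestor of bob, so the forward merge would otherwise fast-forward; --no-ff is used here to get a merge commit like the one in the graph above.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

echo 'original' > b.txt
git add b.txt
git commit -q -m b
git branch alice
git branch bob
git switch -q alice
echo 'alice version' > b.txt; git commit -q -am 'alice edited b'
git switch -q bob
echo 'bob version' > b.txt; git commit -q -am 'bob edited b'
git switch -q main
git merge -q --no-ff --no-edit alice            # Alice merges first
alice_merge=$(git rev-parse main)

# Reverse merge: Bob merges main into bob and settles the conflict there.
git switch -q bob
git merge main >/dev/null 2>&1 || true          # conflict in b.txt, as expected
echo 'bob version, reconciled with alice' > b.txt
git add b.txt
git commit -q -m 'reverse merge, resolve conflicts'
new_base=$(git merge-base main bob)             # moved up to Alice's merge commit

# Forward merge: no conflict this time.
git switch -q main
git merge -q --no-ff --no-edit bob
cat b.txt
```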

Repeatedly merging a long-lived branch

Let’s say you create a feature branch in order to experiment and develop, and later, when things have ended up satisfactorily, you merge your work into a main trunk branch. It’s quite common at that point to delete the feature branch. It has served its purpose, and the branch itself is nothing but a name that is no longer needed for anything. The commits reachable from (and preserved by) the feature branch are now reachable from (and preserved by) the merge commit; the topology of the merge commit, with its two parents, clearly shows what happened.

Let’s describe a temporary feature branch of that kind as short-lived.

Well, there can also be long-lived branches. The main trunk branch itself is long-lived, of course; but a not-uncommon architecture, in some situations, is to have multiple long-lived branches, one of which is repeatedly merged into the other. For example, there might be a long-lived development branch that is occasionally merged into the long-lived master branch (perhaps as preparation for a release).

There’s nothing wrong with that, in and of itself; but one needs to keep in mind that it is an architecture that is particularly prone to the generation of merge conflicts. To illustrate, I’ll describe an actual pattern that I found myself involved in when I was working on a project with a team.

The situation was just as I’ve already described. We had two long-lived branches, master and develop. All feature branches were created from develop and merged into develop. Every once in a while, though, it was time for a release. We would make a temporary branch from develop, increment the version number of our project, and merge the temporary branch into master. And when we did that, we got a merge conflict on the version number. Every. Single. Time.

To demonstrate, here is an actual scenario you can enact to reproduce the kind of thing we were doing. I’ll use a file called version.txt to act as the carrier of my version number information; it will be the cause of the repeated merge conflict. Lines starting with # are comments and instructions; lines starting with % are literal commands for you to give; lines with no prefix are what Git said to me when I enacted the scenario myself.

We start with an empty repository on an unborn master branch:

% git init
% echo 0 > version.txt
% git add version.txt
% git commit -m'version is 0'

We do some work on develop and then merge it into master by way of a temp branch:

% git switch -c develop
# ...do work-add-and-commit, work-add-and-commit...
% git switch -c temp1
% echo 1 > version.txt
% git add version.txt
% git commit -m'up version to 1'
% git switch master
% git merge --no-ff temp1

So far, so good; so we do it again:

% git switch develop
# ...do work-add-and-commit, work-add-and-commit...
% git switch -c temp2
% echo 2 > version.txt
% git add version.txt
% git commit -m'up version to 2'
% git switch master
% git merge --no-ff temp2
CONFLICT (content): Merge conflict in version.txt
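# resolve the conflict in favor of temp2, then finish the merge:
% echo 2 > version.txt
% git add version.txt
% git commit -m"Merge branch 'temp2' resolving conflict"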
[master a877dbe] Merge branch 'temp2' resolving conflict

We got a merge conflict! But we resolved it so that the contribution from temp2 takes effect. The version number has now been incremented to 2. Okay, here we go again:

% git switch develop
# ...do work-add-and-commit, work-add-and-commit...
% git switch -c temp3
% echo 3 > version.txt
% git add version.txt
% git commit -m'up version to 3'
% git switch master
Switched to branch 'master'
% git merge --no-ff temp3
CONFLICT (content): Merge conflict in version.txt
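# resolve the conflict in favor of temp3, then finish the merge:
% echo 3 > version.txt
% git add version.txt
% git commit -m"Merge branch 'temp3' resolving conflict"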
[master ee575b2] Merge branch 'temp3' resolving conflict

And the same thing happened again! Why does upping the version number in the temp branch cause a merge conflict with master every single time? If you look at a graph of what I’ve been doing, you can see why:

*   ee575b2 (HEAD -> master) Merge branch 'temp3' resolving conflict
|\  
| * 119c7fc (temp3) up version to 3
| * a1a1a43 (develop) still more work on develop
* |   a877dbe Merge branch 'temp2' resolving conflict
|\ \  
| * | a63a844 (temp2) up version to 2
| |/  
| * 1beb1b3 more work on develop
* |   abff3e1 Merge branch 'temp1'
|\ \  
| * | e6028e6 (temp1) up version to 1
| |/  
| * d952e24 work on develop
|/  
* bae2f8f version is 0

Consider just the final merge (ee575b2). What was the merge base between master and temp3 just before that merge? It was 1beb1b3 ("more work on develop"). And what, therefore, were the contributions on each branch that had to be merged?

  • At the merge base (1beb1b3), version.txt had contained 0.

  • On temp3, version.txt had been changed to 3.

  • On master, version.txt had been changed to 2 — by the previous conflict-resolving merge commit (a877dbe)!

That’s the problem. Every time we made a conflict-resolving merge commit, that itself was a contribution on master, and that contribution conflicted with the next merge to master where we upped the version number.
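You can check this reasoning yourself. The sketch below rebuilds the scenario up to the moment just before temp3 is merged. It then asks Git for the merge base and for each side's contribution. Some parts are my own stand-ins for the “work-add-and-commit” and “resolve the conflict” steps above: the helper functions prepare and ship, the file work.txt, and the choice to resolve each conflict in favor of the temp branch. The commit hashes will differ from mine. The script assumes Git 2.23 or later.

```shell
#!/bin/sh
# Sketch: rebuild the version.txt scenario up to just before temp3 is merged.
# prepare/ship and work.txt are hypothetical stand-ins, not part of Git.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 0 > version.txt
git add version.txt
git commit -q -m 'version is 0'
git branch develop

prepare() {  # work on develop, then bump the version on a new temp branch
  git switch -q develop
  echo "work $1" >> work.txt
  git add work.txt
  git commit -q -m "work on develop $1"
  git switch -q -c "temp$1"
  echo "$1" > version.txt
  git add version.txt
  git commit -q -m "up version to $1"
}

ship() {  # merge the temp branch into master; resolve a conflict in its favor
  git switch -q master
  if ! git merge -q --no-ff --no-edit "temp$1"; then
    echo "$1" > version.txt
    git add version.txt
    git commit -q --no-edit
  fi
}

prepare 1; ship 1
prepare 2; ship 2
prepare 3              # stop just before the third merge

base=$(git merge-base master temp3)
echo "merge base: $(git log -1 --format=%s "$base")"
echo "version.txt at base: $(git show "$base:version.txt")"
echo "version.txt on master: $(git show master:version.txt)"
echo "version.txt on temp3: $(git show temp3:version.txt)"

# A three-dot diff compares the merge base with the second commit named,
# which is exactly that side's contribution.
git --no-pager diff temp3...master -- version.txt   # master: 0 -> 2
git --no-pager diff master...temp3 -- version.txt   # temp3: 0 -> 3
```

The form git diff A...B compares the merge base of A and B with B. Running it both ways shows that each side changed version.txt, which is the recipe for a conflict.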

On the team where we experienced this, we got used to the repeated merge conflict and came to take it for granted. It wasn’t difficult to resolve every time, so it didn’t worry us. That is in fact one possible solution to the problem: stop seeing it as a problem.

There is, however, another approach, namely to forestall the merge conflict in the first place. How? Well, after every merge of a temp branch to master, we could also have done a merge of the same temp branch back to develop.

The result is like a reverse merge: we move the merge base up, so that the new merge base’s version.txt and the master branch’s version.txt are the same. Thus, the next time we up the version number and merge into master, master makes no contribution to version.txt, and so there is no merge conflict.

For example, if we start with the scenario as it stands and now merge temp3 into develop, the merge base for the next merge from develop to master (by way of another temp branch) will be the commit at the tip of temp3. The version numbers in temp3 and master at the time of the next merge will be the same; that’s the whole point of these temp branches. So the next merge will succeed without a conflict.
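Here is that fix as a sketch in a throwaway repository. It uses two small helper functions of my own, prepare and ship. These are hypothetical stand-ins for “work-add-and-commit, then bump the version on a temp branch” and “merge into master, resolving any conflict in favor of the temp branch”. The script runs three releases as in the transcript, merges temp3 back into develop, and then makes one more release. It assumes Git 2.23 or later. It also assumes develop hasn’t moved since temp3 was created, so merging temp3 into develop is a fast-forward.

```shell
#!/bin/sh
# Sketch: merge the released temp branch back into develop, then release again.
# prepare/ship and work.txt are hypothetical stand-ins, not part of Git.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 0 > version.txt
git add version.txt
git commit -q -m 'version is 0'
git branch develop

prepare() {  # work on develop, then bump the version on a new temp branch
  git switch -q develop
  echo "work $1" >> work.txt
  git add work.txt
  git commit -q -m "work on develop $1"
  git switch -q -c "temp$1"
  echo "$1" > version.txt
  git add version.txt
  git commit -q -m "up version to $1"
}

ship() {  # merge the temp branch into master; resolve a conflict in its favor
  git switch -q master
  if ! git merge -q --no-ff --no-edit "temp$1"; then
    echo "$1" > version.txt
    git add version.txt
    git commit -q --no-edit
  fi
}

prepare 1; ship 1
prepare 2; ship 2
prepare 3; ship 3

# The fix: merge the released temp branch back into develop.
git switch -q develop
git merge -q --no-edit temp3

prepare 4
base=$(git merge-base master temp4)
git switch -q master
if git merge -q --no-ff --no-edit temp4; then
  result=clean
else
  result=conflict
fi
echo "merge base: $(git log -1 --format=%s "$base")"   # up version to 3
echo "merging temp4 into master: $result"             # clean
```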

Squash merge

If you use the --squash option when you say git merge, something very odd happens. Git works out the merge result and puts it into the index and the working tree, but doesn’t actually commit it, leaving it to you to perform the actual commit (similar to the --no-commit option I mentioned earlier). Unlike --no-commit, however, it does not record that a merge is in progress; so when you do perform the actual commit, the result is a normal commit with a single parent, not a merge commit.
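A short sketch shows both halves of that behavior. First, HEAD does not move when you say git merge --squash. Second, the commit you make afterward has only one parent. It is a throwaway-repository script that assumes Git 2.23 or later and uses placeholder names. HEAD^2 is Git’s notation for a commit’s second parent, which only a merge commit has.

```shell
#!/bin/sh
# Sketch: a squash merge stages the result but produces an ordinary commit.
# Assumes Git 2.23+; names are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo a > a.txt
git add a.txt
git commit -q -m 'start'

git switch -q -c develop
echo b > a.txt
git add a.txt
git commit -q -m 'changed a to b'

git switch -q master
before=$(git rev-parse HEAD)
git merge -q --squash develop      # merged result is staged; nothing committed
after=$(git rev-parse HEAD)        # HEAD has not moved
git status --short                 # shows "M  a.txt"

git commit -q -m 'a squash commit from develop'

# A real merge commit would have a second parent, HEAD^2; this one has none.
second_parent=$(git rev-parse -q --verify 'HEAD^2' || echo none)
echo "second parent: $second_parent"
```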

In other words, a squash merge, although it is made using the normal merge logic, is not a merge. It causes a new commit to appear on the current branch, but without any information about the fact that the differences between this commit and the previous one were developed using a different branch.

At first blush, a squash merge sounds like a cool idea. A branch, such as a feature branch, often has many, many commits, generated as the developer was working and diligently doing an add-and-commit at every opportunity. But the ultimate history, one thinks, surely doesn’t need to know anything about all those details. It should be enough to merge one commit containing the ultimate perfect result. That’s what a squash merge offers.

In addition, some people feel an aversion to the “railroad tracks” that are left behind after a branch is merged. Here’s the sort of thing I mean, as portrayed in a widely used Git GUI application (Sourcetree).

The promise of a squash merge is that it avoids all of that secondary history. Not only are all the commits of a branch squashed into one commit, but also there are no merging “railroad tracks”. A squash commit simply comes into being, as if by magic, on the end of the primary branch. That branch seems to progress in meaningful steps, one brilliant commit at a time, with no record of the multiple faltering stages that led to each of those single commits.

But before you are tempted to use a squash merge, consider the downsides. I happen to think that the railroad tracks are good. Without them, you may be throwing away valuable history. If you delete the branch that was “merged”, then the commits made on that branch, which only that branch name can reach, may be deleted — and with them, all record of how the commits evolved, who made them, and where exactly a particular feature was introduced.
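Git’s own branch deletion reflects this. The sketch below uses a throwaway repository, a hypothetical branch name feature, and assumes Git 2.23 or later. In it, git branch -d refuses to delete a branch that was only squash merged, because the branch’s commits aren’t reachable from HEAD. You have to force the deletion with -D. After that, nothing on master leads back to those commits, and only reflog entries still refer to them until those entries expire.

```shell
#!/bin/sh
# Sketch: a squash-merged branch is not "fully merged" as far as Git knows.
# Assumes Git 2.23+; names are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo a > a.txt
git add a.txt
git commit -q -m 'start'

git switch -q -c feature
echo b > a.txt
git add a.txt
git commit -q -m 'feature work'
tip=$(git rev-parse HEAD)

git switch -q master
git merge -q --squash feature
git commit -q -m 'squash merge of feature'

# Lowercase -d refuses: the squash commit's only parent is the old master,
# so feature's commit is not reachable from HEAD.
if git branch -d feature; then deleted=yes; else deleted=no; fi
echo "git branch -d succeeded: $deleted"

# Uppercase -D forces it; afterwards no branch reaches the feature commit.
git branch -q -D feature
if git merge-base --is-ancestor "$tip" master; then reachable=yes; else reachable=no; fi
echo "feature commit reachable from master: $reachable"
```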

And even if you don’t delete the branch that was “merged” (thus preserving its commits), there may be nothing about the new commit that associates it with that branch; it is not a merge commit, so that branch is not one of its parents (it has only one parent). You might describe in the commit message what happened, in words; but that isn’t going to help you (or Git) actually traverse the history of what really happened.

Squash merges and long-lived branches

An issue that comes up again and again on Stack Overflow is this: “We get merge conflicts every time we merge!” A variation is: “Every time we make a pull request, it contains way too many commits; it contains the whole history of the branch, even though the branch has been merged many times before.”

When people start to talk this way, I often ask: “Are you really merging, or are you using some kind of fake merge, such as a squash merge?” And more often than not, the answer is: “Yes, it’s a squash merge.”

This is yet another case of what can happen when you have multiple long-lived branches. When both branches involved in a squash merge are long-lived, and you squash merge from one of them to the other again and again, things can get messy.

In the case of a pull request, it is all too easy to get sucked into making this mistake. For example, GitHub offers the chance to do a squash merge at the conclusion of a pull request, and people accept that offer without understanding the possible consequences. In the GitHub web interface, when you finish with a pull request and you’re ready to merge it, the Merge button is a drop-down menu. One of the choices is “Squash and merge”. So clearly, one thinks to oneself, this must be a good thing to do, because there is good old GitHub suggesting that I should do it.

Moreover, after you’ve chosen “Squash and merge” once, it sticks, so that from now on, in a different pull request, just clicking that button causes another “Squash and merge”; so having done it once, it’s all too easy to do it repeatedly.

Let’s perform a squash merge and witness the kind of issue that can arise. In particular, we’ll perform two of them in succession from the same branch into the same other branch. We will get a conflict, as you will see. And then we can think about why. Once again, we’re going to enact a little scenario, starting with an empty repository and an unborn master branch.

We start by making a file on master and committing it:

% git init
% echo a > a.txt
% git add .
% git commit -m'start'

Now we create develop and modify that same file:

% git switch -c develop
% echo b > a.txt
% git add .
% git commit -m'changed a to b'

We return to master and do a “squash merge”, and it appears to work fine:

% git switch master
% git merge --squash develop
% git commit -m'a squash commit from develop'

So far so good. Now we do it again. We switch to develop and modify that same file some more:

% git switch develop
% echo c > a.txt
% git add .
% git commit -m'changed b to c'

And we return to master and do a “squash merge” again:

% git switch master
% git merge --squash develop

And we get a merge conflict. Why? Well, here’s the situation we are in (on my machine) just before the second squash merge:

* 770c894 (develop) changed b to c
* f8a853b changed a to b
| * 338def7 (HEAD -> master) a squash commit from develop
|/  
* 8e42e40 start

The trouble here is the merge base. What is it? It’s the very first commit, 8e42e40 ("start"). Why is that? Because a squash merge is not a merge. And therefore, the merge base never moves. As a result, develop (770c894) and the previous squash commit on master (338def7) are both contributions. The develop branch is thus conflicting, in effect, with itself, over and over:

  1. Every time you make a squash merge commit, you are adding to the branch on which the squash merge is made.
  2. Every time you make a commit on the secondary branch, you are adding to that branch.
  3. Meanwhile, the merge base never moves.
  4. Therefore, the next time you try to make a squash merge, the secondary branch and the most recent squash merge can conflict.

That’s like the repeated merge conflicts I was describing earlier, except that it’s even worse, because the merge base never moves at all. That, too, is why the length of each successive pull request’s chain of commits seems to grow and grow, portraying all the history since the stationary merge base.
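The sketch below replays the a.txt scenario above in a throwaway repository and shows both symptoms. It assumes Git 2.23 or later. First, the merge base stays at the very first commit. Second, the list of commits on develop that master lacks keeps growing. That list, master..develop, is roughly what a pull request from develop to master would display.

```shell
#!/bin/sh
# Sketch: repeated squash merges leave the merge base where it was.
# Assumes Git 2.23+; names follow the a.txt scenario.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo a > a.txt
git add a.txt
git commit -q -m 'start'
start=$(git rev-parse HEAD)

git switch -q -c develop
echo b > a.txt
git add a.txt
git commit -q -m 'changed a to b'

git switch -q master
git merge -q --squash develop
git commit -q -m 'a squash commit from develop'

git switch -q develop
echo c > a.txt
git add a.txt
git commit -q -m 'changed b to c'
git switch -q master

# The merge base is still the very first commit.
base=$(git merge-base master develop)
echo "merge base: $(git log -1 --format=%s "$base")"   # start

# Both develop commits still count as "not on master", including the one
# that was already squash merged.
git --no-pager log --oneline master..develop
count=$(git rev-list --count master..develop)

# And the second squash merge conflicts, as in the transcript above.
if git merge -q --squash develop; then result=clean; else result=conflict; fi
echo "second squash merge: $result"
git reset -q --hard   # throw away the conflicted attempt
```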

Are squash merges good for anything?

Repeated squash merges from one long-lived branch to another are an antipattern and should be avoided. But that doesn’t mean that squash merges themselves are completely bad.

When you make a feature branch, your usual intention is to merge it into the main branch once and then delete it. In that case, a squash merge is perfectly viable if you can live with the resulting loss of history that I talked about earlier. The branch isn’t going to conflict with itself after the squash merge because it doesn’t persist after the squash merge.

Moreover, there may not actually be a loss of history. You might decide not to delete the branch after the squash merge. As long as you remember not to work on this branch again (or merge it again), then the history remains. It isn’t reachable directly from the squash merge commit, but you can state the name of the branch in the squash merge commit’s message, so that the branch and its parent chain are at least findable by a human being.
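One way to keep that history findable is sketched below. Put both the branch name and the commit hash of its tip in the squash commit’s message, then search for it later with git log --grep. The message format is a hypothetical convention of my own, not something Git or any hosting site prescribes. The script uses a throwaway repository and assumes Git 2.23 or later.

```shell
#!/bin/sh
# Sketch: record the squashed branch's name and tip in the commit message.
# The message format is a hypothetical convention; names are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo a > a.txt
git add a.txt
git commit -q -m 'start'

git switch -q -c feature
echo b > b.txt
git add b.txt
git commit -q -m 'add b'
echo c > c.txt
git add c.txt
git commit -q -m 'add c'
tip=$(git rev-parse HEAD)

git switch -q master
git merge -q --squash feature
git commit -q -m "Add b and c (squash merge of branch feature at $tip)"

# Later, a human can find the squash commit by searching commit messages...
found=$(git log --format=%H --grep='squash merge of branch feature' master)
git --no-pager log -1 --format=%B "$found"
# ...and use the recorded hash to walk the branch's original history.
git --no-pager log --oneline "$tip"
```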

In fact, if you’re using a remote hosting site with pull requests, and if the squash merge is performed through a pull request, then you can delete the branch name and the history will still persist — as long as the pull request persists. At GitHub, for example, pull requests are never deleted (except under special circumstances requiring explicit intervention by the staff at GitHub). Thus you can delete the branch name with no loss of history.

In fact, GitHub helps you even more in this situation. It’s true that Git loses the historical connection between the branch and the squash merge; but GitHub doesn’t lose it! After a pull request is squash merged and completed, then when you look at the resulting squash merge commit in the list of your repository’s commits in GitHub’s web interface, you’ll see a link to the original pull request. Thus the original history is just one link-click away, even if the branch was deleted after the squash merge.

Conclusion

That completes my survey of what I regard as the most important basic things to know about Git merges. I hope you understand them better than you did, and that you will henceforth merge with courage and wisdom, in full control of what you’re doing.
