Monday, January 3, 2011

How to find the real culprit using GIT bisect

Recently, I had a situation in one of my projects: a certain feature that had been working some time back no longer worked on the HEAD revision.  I was pretty sure that it had worked at a certain point in the past.

I spent an hour trying to find what had gone wrong, but my efforts were in vain.

What I knew for sure was that the feature was working at revision HEAD - 20.  Many files must have changed between revision HEAD - 20 and HEAD.  The question is how to find the real culprit.

To find where the real problem was, or when it was first introduced, I decided to use a cool feature of GIT called bisect.

What does bisect do?  Well, it bisects! That's what it does, and you know what, it does it pretty well!

Bisect uses a form of binary search to find the real culprit, i.e. the first commit at which a given feature stopped working.  Essentially, this is the commit that introduced the bug!

Bisect works on a simple principle.  All you have to do is:
  • Identify the good commit (i.e. the commit at which the feature was working)
  • Identify the bad commit (i.e. the commit at which the feature was not working)
  • Keep bisecting them till you find the real culprit.

For example, let's say that the feature is not working at revision HEAD and is working at revision HEAD - 20.  That means the culprit lies between these two revisions.

In this example the good commit is revision HEAD - 20 and the bad commit is revision HEAD.  We start off by bisecting these two commits: GIT checks out revision HEAD - 10, and we quickly test whether the feature is working at this commit or not.  If the feature is working at this commit, the culprit lies between revisions HEAD - 10 and HEAD.  If the feature is not working at this revision either, the culprit lies between revisions HEAD - 20 and HEAD - 10.

With this new information, our good and bad commits have changed to either revisions HEAD - 10 and HEAD or HEAD - 20 and HEAD - 10 depending on where the culprit lies.  We now start bisecting the new [good, bad] commit set.

This process continues till we get one commit, i.e. the real culprit!
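
If you want to see the halving idea in isolation, here is a minimal shell sketch.  It does not touch GIT at all: is_good is a hypothetical stand-in for testing the feature by hand, and the assumption that the bug appeared at HEAD - 7 is made up purely for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for "test the feature at HEAD - n".
# Assumption for this sketch: the bug was introduced at HEAD - 7,
# so HEAD - 7 and everything newer (smaller n) is bad.
is_good() {
  [ "$1" -gt 7 ]
}

good=20   # HEAD - 20: known good
bad=0     # HEAD: known bad
steps=0

# Keep halving until the good and bad commits are adjacent.
while [ $((good - bad)) -gt 1 ]; do
  mid=$(( (good + bad) / 2 ))
  steps=$((steps + 1))
  if is_good "$mid"; then
    good=$mid    # the culprit is newer than mid
  else
    bad=$mid     # the culprit is mid or something older
  fi
done

echo "first bad commit: HEAD - $bad (found in $steps steps)"
# prints: first bad commit: HEAD - 7 (found in 4 steps)
```

The loop only ever remembers two things, the newest known good commit and the oldest known bad one, which is exactly the [good, bad] pair described above.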

Does it sound like a lot of work?  With GIT you do not need to worry; relax and take a deep breath!  GIT makes it really easy!  Let's look at how.

How to do it:

Let's say that revision HEAD, the BAD revision, is 78158f935cc5aa949a3e60068297befe43b26354 and the revision at which the feature was definitely working, the GOOD revision, is 8446e9349a8217c7428b6386fa78b1f88565684d.  There are 10 other commits between these two.

Let's start bisecting the commits to find the real culprit.  This is done by running git bisect start.

Now GIT knows that you want to start bisecting the commits to find the guilty commit.

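Next, you need to tell GIT about the GOOD and the BAD commits.  The GOOD commit is given by running git bisect good 8446e9349a8217c7428b6386fa78b1f88565684d.

This command tells GIT the commit id of the GOOD commit.  Likewise, GIT needs to know the commit id of the BAD commit, which is given by running git bisect bad 78158f935cc5aa949a3e60068297befe43b26354.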

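At this point GIT knows the good and the bad commits, and it prints a message like "Bisecting: 4 revisions left to test after this (roughly 2 steps)".

It's basically informing us that 4 revisions will remain to be tested after this one, and that you are merely 2 steps away from finding the real culprit!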

GIT has checked out the commit 762eec7656a4bb742cb889b686645897bdd984af, which is roughly somewhere in the middle of good and bad commits.

To visualize what it means, run git bisect visualize.

This command shows you exactly what has happened.  It shows you the GOOD and the BAD commits marked in gray.  It also shows you which commit has been checked out, with a tiny yellow circle.
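
The steps so far can be tried safely in a throwaway repository.  The sketch below is only an illustration under my own assumptions: the repository, its 12 empty commits and the Demo identity are all made up, so the commit ids will differ from the ones in my project.  Since the graphical view needs a display, it uses git log --oneline --bisect as a text-only stand-in that lists the commits still under suspicion.

```shell
#!/bin/sh
set -e

# Throwaway repository with 12 commits (10 lie between the two endpoints).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  git commit -q --allow-empty -m "commit $i"
done

good=$(git rev-parse HEAD~11)   # oldest commit, assumed to be working
bad=$(git rev-parse HEAD)       # newest commit, assumed to be broken

git bisect start
git bisect good "$good"
git bisect bad "$bad"

# Text-only view of the commits that are still candidates.
git log --oneline --bisect

current=$(git rev-parse HEAD)
echo "checked out for testing: $(git log -1 --format=%s)"
```

The exact commit GIT picks, and the numbers in its "Bisecting:" message, depend on the history, so do not expect them to match the ones quoted in this post.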

At this point you can quickly run your test to find out whether the checked out version is good or bad.
  • If the feature is working at this commit, then it's a good commit.
  • If the feature is not working at this commit, then it's a bad commit.

Let's say in this case the feature is not working at this commit.  Hence it's a bad commit.

We need to inform GIT that this commit is a bad commit.  This effectively means that the culprit commit lies between GOOD i.e. 8446e9349a8217c7428b6386fa78b1f88565684d and BAD i.e. 762eec7656a4bb742cb889b686645897bdd984af commits.  We have eliminated the first 5 commits in 1 iteration.

To inform GIT that this is a bad commit, run git bisect bad.

Running visualize again will clearly show you that the range containing the guilty commit has narrowed down to just 5 commits.

Again, we repeat the same process of finding out whether the feature is working at this particular commit.  In this case, let's say that the feature is working, i.e. this is a GOOD commit.  Let's inform GIT that this is a GOOD commit by running git bisect good.

Let's read the output a little carefully this time!

It says that there are 0 revisions left to test after this one.  What?  What does that mean?  Let's visualize the current situation.

Hmm, now did you figure out what the message means?

Well, we are at an interesting stage now.  GIT knows which commit is GOOD and which is BAD, but the important thing to notice here is that there is just one commit between the GOOD and the BAD commits.  This means that as soon as we tell GIT whether the feature is working at this commit or not, it will point us to the guilty commit.  That's why GIT tells us "0 revisions left to test after this..."

Let's say in this case the currently checked out commit is a GOOD commit.  Let's tell GIT so, again with git bisect good, and see what happens.

Voilà! We have it! GIT shows us the "first bad commit", i.e. the culprit commit, since which the feature has been broken.
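
To watch a whole hunt run to completion, here is a sketch in a throwaway repository.  Everything in it is an assumption for illustration: the 12 commits, the Demo identity, and a made-up bug that consists of a file named BROKEN appearing in commit 8.  The loop simply gives the same good or bad answers you would type by hand, and stops when GIT announces the first bad commit.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# 12 commits; commit 8 introduces the hypothetical bug (a file named BROKEN).
i=1
while [ "$i" -le 12 ]; do
  echo "$i" > counter.txt
  if [ "$i" -eq 8 ]; then touch BROKEN; fi
  git add -A
  git commit -q -m "commit $i"
  i=$((i + 1))
done

git bisect start
git bisect bad HEAD        # commit 12 is broken
git bisect good HEAD~11    # commit 1 was working

# Answer good/bad for each commit GIT checks out, as you would by hand.
# The cap of 10 rounds is only a safety net for this sketch.
rounds=0
while [ "$rounds" -lt 10 ]; do
  rounds=$((rounds + 1))
  if [ -e BROKEN ]; then
    out=$(git bisect bad)
  else
    out=$(git bisect good)
  fi
  echo "$out"
  case "$out" in
    *"is the first bad commit"*) break ;;
  esac
done

# When the hunt is over, refs/bisect/bad points at the culprit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "culprit: $culprit"
```

In a real project the BROKEN check would of course be replaced by whatever test tells you that the feature works.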

To view the content of the commit, we can use git show followed by its commit id.

I wasted one full hour trying to figure out what had gone wrong and why the hell the feature was not working! And it took GIT merely 3, maybe 5, minutes to pinpoint the guilty commit!

Imagine a much bigger GOOD to BAD commit range.  Let's say there are 100 or more commits between the GOOD and BAD commits; GIT can save you some serious time!
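
To put a rough number on that, here is a small sketch that counts how many halvings it takes to get from a given number of suspect commits down to one.  bisect_steps is a hypothetical helper of mine, and this is idealized arithmetic; GIT's own estimate may differ by a step depending on the shape of the history.

```shell
#!/bin/sh
# Number of halvings needed to narrow n candidate commits down to one,
# always keeping the larger half (the worst case).
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$((steps + 1))
  done
  echo "$steps"
}

for commits in 10 100 1000; do
  echo "$commits commits: at most $(bisect_steps "$commits") tests"
done
# prints:
# 10 commits: at most 4 tests
# 100 commits: at most 7 tests
# 1000 commits: at most 10 tests
```

So growing the range tenfold only adds about three more tests.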

Please note that if you run git branch at this point, it will show you that you are on a temporary checkout rather than on the branch you started from.  To go back to the branch where you started and to tell GIT that you have finished bisecting, run git bisect reset.
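
Here is a small sketch of that last step, again in a made-up throwaway repository with a made-up Demo identity.  On current GIT versions the temporary state shows up as a detached HEAD, which git rev-parse --abbrev-ref HEAD reports simply as HEAD.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 1 2 3 4; do
  git commit -q --allow-empty -m "commit $i"
done

start_branch=$(git rev-parse --abbrev-ref HEAD)

git bisect start
git bisect bad HEAD
git bisect good HEAD~3

# While bisecting, HEAD is detached, so this prints "HEAD".
during=$(git rev-parse --abbrev-ref HEAD)

# Finish bisecting and return to where we started.
git bisect reset
after=$(git rev-parse --abbrev-ref HEAD)

echo "during bisect: $during, after reset: $after (started on $start_branch)"
```

Until you reset, GIT keeps treating the repository as being in the middle of a bisect, so it is worth making this the last thing you do.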

That's all folks! Take my advice, go GIT it!
Have some fun!