Deep Shah's Blog: distributed version control system

Monday, January 3, 2011

How to find the real culprit using GIT bisect

Recently, I had a situation in one of my projects: a certain feature, which had been working some time back, no longer worked on the HEAD revision. I was pretty sure that it had been working at a certain point in the past.

I tried for an hour to find out what had gone wrong and why the feature was no longer working. But my efforts were in vain.

What I knew for sure was that at revision HEAD - 20 the feature was working. Many files must have changed between revision HEAD - 20 and HEAD. The question is, how do we find the real culprit?

To find where the real problem was, or when the problem was first introduced, I decided to use a cool feature of GIT called bisect.

What does bisect do? Well, it bisects! That's what it does, and you know what, it does this pretty well!

Bisect uses a form of binary search to find the real culprit, i.e. the first commit at which a given feature stopped working. Essentially, this is the commit that introduced the bug!

Bisect works on a simple principle. All you have to do is:

Identify the good commit (i.e. the commit at which the feature was working)
Identify the bad commit (i.e. the commit at which the feature was not working)
Keep bisecting them till you find the real culprit.

For example, let's say that on revision HEAD the feature is not working and on revision HEAD - 20 the feature is working. That means the culprit lies between these two revisions.

In this example the good commit is revision HEAD - 20, and the bad commit is revision HEAD. We start off by bisecting these two commits: GIT checks out revision HEAD - 10, and we quickly test whether the feature is working in this commit or not. If the feature is working in this commit, the culprit lies between the HEAD - 10 and HEAD revisions. If the feature is not working in this revision either, the culprit lies between the HEAD - 20 and HEAD - 10 revisions.

With this new information, our good and bad commits have changed to either revisions HEAD - 10 and HEAD or HEAD - 20 and HEAD - 10 depending on where the culprit lies. We now start bisecting the new [good, bad] commit set.

This process continues till we get one commit, i.e. the real culprit!

Does it sound like a lot of work? With GIT you do not need to worry. Relax and take a deep breath! GIT makes it really easy! Let's look at how.

How do they do it:

Let's say that the revision HEAD, or the BAD revision, is 78158f935cc5aa949a3e60068297befe43b26354 and the revision at which the feature was working for sure, or the GOOD revision, is 8446e9349a8217c7428b6386fa78b1f88565684d. There are 10 other commits in between these commits.

Let's start bisecting the commits to find the real culprit. This is done using the git bisect start command.

Now GIT knows that you want to start bisecting the commits to find the guilty commit.

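Next, you need to tell GIT about the GOOD and the BAD commits. Running git bisect good 8446e9349a8217c7428b6386fa78b1f88565684d tells GIT the commit id of the GOOD commit.

Likewise, GIT needs to know the commit id of the BAD commit, which is given with git bisect bad 78158f935cc5aa949a3e60068297befe43b26354.

At this point GIT knows the good and the bad commits, and it prints a status line of the form "Bisecting: 4 revisions left to test after this (roughly 2 steps)".

It's basically informing us that 4 revisions will be left to test after this one, and that we are merely about 2 steps away from finding the real culprit!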

GIT has checked out the commit 762eec7656a4bb742cb889b686645897bdd984af, which is roughly somewhere in the middle of good and bad commits.

To visualize what it means, run git bisect visualize.

This command shows you exactly what has happened. It shows you the GOOD and the BAD commits marked in gray. It also shows you which commit has been checked out, with a tiny yellow circle.

At this point you can quickly run your test to find out whether the checked out version is good or bad.

If the feature is working at this commit, then it's a good commit
If the feature is not working at this commit, then it's a bad commit.

Let's say in this case the feature is not working in this commit. Hence it's a bad commit.

We need to inform GIT that this commit is a bad commit. This effectively means that the culprit commit lies between GOOD i.e. 8446e9349a8217c7428b6386fa78b1f88565684d and BAD i.e. 762eec7656a4bb742cb889b686645897bdd984af commits. We have eliminated the first 5 commits in 1 iteration.

To inform GIT that this is a bad commit, run git bisect bad.

Running visualize again will clearly show you that the range containing the guilty commit has narrowed down to just 5 commits.

Again, we repeat the same process of finding out whether the feature is working or not in this particular commit. For this case, let's say that the feature is working, i.e. this is a GOOD commit. Let's inform GIT that this is a GOOD commit by running git bisect good.

Let's read the output a little carefully this time!

It says that there are 0 revisions left to test after this one. What? What does that mean? Let's visualize the current situation.

Hmm, now did you figure out what the message means?

Well, we are at an interesting stage now. GIT knows which is the GOOD commit and which is the BAD commit, but the important thing to notice here is that there is just one commit between the GOOD and the BAD commits. This means that as soon as we inform GIT whether the feature is working in this commit or not, it will point us to the guilty commit. That's why GIT tells us "0 revisions left to test after this".

Let's say in this case the currently checked out commit is a GOOD commit. Let's inform GIT of this with git bisect good and see what happens.

Voilà! We have it! GIT shows us the "first bad commit", i.e. the culprit commit since which the feature has been broken.

To view the content of the commit, we can use git show followed by the commit id.

I wasted one full hour trying to figure out what had gone wrong and why the hell the feature was not working! And it took GIT merely 3, maybe 5, minutes to pinpoint the guilty commit!

Imagine a much bigger GOOD to BAD commit range. Let's say there are 100 or more commits between the GOOD and BAD commits. Because bisect halves the range at every step, 100 commits need only about 7 tests. GIT can save you some serious time!
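If the "is the feature working?" check can be scripted, GIT can even drive the whole good/bad loop by itself using git bisect run. Here is a minimal sketch of that, assuming a throwaway repository with 12 made-up commits in which commit 8 breaks the feature. The file names, the commit messages and the grep based check are all assumptions made for this demo; they are not from my project.

```shell
set -eu

# Throwaway repository so that nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical history: 12 commits, the feature breaks in commit 8.
i=1
while [ "$i" -le 12 ]; do
  if [ "$i" -ge 8 ]; then
    echo "broken" > feature.txt
  else
    echo "working" > feature.txt
  fi
  echo "$i" > counter.txt
  git add feature.txt counter.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

bad=$(git rev-parse HEAD)        # the feature is broken here
good=$(git rev-parse HEAD~11)    # the feature worked here (commit 1)

git bisect start >/dev/null
git bisect bad "$bad" >/dev/null
git bisect good "$good" >/dev/null

# Exit code 0 marks the checked out commit as good, 1 marks it as bad.
git bisect run sh -c 'grep -q working feature.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "First bad commit: $culprit"
```

The manual flow described in this post is exactly the same, except that you answer git bisect good or git bisect bad yourself at every step instead of letting a script answer.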

Please note that at this point, if you run git branch, it will show you that you are on a temporary branch (newer versions of GIT show a detached HEAD instead). To go back to the branch where you started from and to inform GIT that you have finished bisecting, use git bisect reset.

That's all, folks! Take my advice, go GIT it!

Sunday, December 26, 2010

How to tell GIT about a new remote SVN branch

If your GIT over SVN repository is set up correctly, then you might not need to do anything mentioned in this post. GIT can work with both standard and non-standard SVN repositories.

If you have followed the instructions properly, any new remote SVN branch would be fetched by running git svn fetch.

But if your setup is different from the ones mentioned in the earlier posts, then continue reading.

You are a cool developer/QA/BA (or simply a cool person) who uses GIT over SVN. Someone created a new SVN branch. You want to check out this new SVN branch and do some commits on it. But wait! When you set up your GIT SVN repository, this branch did not exist. Hence, GIT does not know anything about this new remote branch! How can you say,

Hey Mr. GIT, here is a new SVN branch. Could you please start tracking it for me?

How do they do it?

Actually, it turns out, it's very simple.

Make some edits in the .git/config file
Do a git svn fetch of the new remote branch
Create a local branch pointing to the newly added remote branch
Start hacking the code on the new branch!

The key here is only the first step. Without wasting any more time, let's dive into the solution.

Step - 1:

Open the "config" file in the .git directory in your favorite text editor.

Let's say that the new SVN branch is called "branch1". It's located at http://non-standard-repository.googlecode.com/svn/branches/module1/branch1

Add the following entries to the .git/config file

Almost there!

If you do not feel comfortable editing the .git/config file manually then use the following commands to achieve the same effect
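As a rough sketch, the config entries for a new branch can be written with git config like this. I am assuming here a separate svn-remote section named branch1 and a remote ref named refs/remotes/branch1; both names are my choices for this illustration, so adjust them to your own setup. The sketch uses a scratch repository so that it is safe to run anywhere.

```shell
set -eu

# Scratch repository standing in for your existing GIT over SVN repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Writes a [svn-remote "branch1"] section into .git/config.
# The section name and the ref name are assumptions made for this sketch.
git config svn-remote.branch1.url \
  "http://non-standard-repository.googlecode.com/svn/branches/module1/branch1"
git config svn-remote.branch1.fetch ":refs/remotes/branch1"

# Show what was written.
git config --get-regexp '^svn-remote\.branch1\.'

# Steps 2 and 3 need the real SVN server, so they are only shown as comments:
#   git svn fetch branch1
#   git checkout -b local_branch1 remotes/branch1
```

The empty left hand side of the fetch value means "the URL itself", which is what we want because the url already points directly at the branch.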

Step - 2:

The setup is done. All we now need to do is run git svn fetch for the new remote.

This will get the version history from the "branch1" SVN branch into the local GIT repository. By now you should have a remote branch "branch1" created in your git repository.

Step - 3:

Creating a local branch from the newly added remote branch is simple enough: use git checkout -b with the new local branch name and the remote branch.

You are now all set to start hacking the code on the local_branch1!

Step - 4:

No explanations required here!

That's it! With GIT there are always multiple ways of solving any problem! Go GIT it!

Sunday, December 19, 2010

How to set up KDiff3 as the diff tool for GIT

The git diff command does a great job of showing what has changed. But it shows this information on the command prompt. Some people who are addicted to nice and pretty GUIs might get bogged down because of this.

Do not worry you people, there is a nice GUI based option. KDiff3 is the answer to this problem! This post will show how easily we can integrate the KDiff3 tool with GIT.

KDiff3 has a nice and easy GUI. It does its job very well. It might not be the prettiest, but it's extremely simple and intuitive to use. +1 for KDiff3 from my side!

So how do we integrate KDiff3 with GIT?

How do they do it?

GIT can be integrated easily with any third-party diff tool. We will integrate GIT with KDiff3. The simple steps to follow are:

Download and install KDiff3.
GIT needs to know that KDiff3 should be used as the preferred diff/merge tool. For this, we need to make a simple change in the .gitconfig file. This file can be found under your home directory.

Let's look at the second step in more detail. On a Windows machine the .gitconfig file is found under the C:\Users\<your user name>\ directory. This path can also be referred to via the shortcut ~ on the GIT prompt.

Open the .gitconfig file in your favorite text editor

It should look something like this

Add the following lines to the file

The path config property under the mergetool and difftool sections should point to the installation path of the KDiff3 tool on your machine. The updated .gitconfig file should look somewhat like this

NOTE: please use the forward slash "/" as the path separator even on Windows machines. Using the backslash "\" will not work!

The above config tells GIT to use the KDiff3 tool as the external diff/merge tool.
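If you would rather not edit the file by hand, the same kind of entries can be written with git config. This is only a sketch: the install path below is an assumption (use wherever KDiff3 lives on your machine), and I write to a scratch file so that your real ~/.gitconfig is not touched. Replace --file "$cfg" with --global to apply it for real.

```shell
set -eu

# Scratch config file; swap '--file "$cfg"' for '--global' to change ~/.gitconfig.
cfg=$(mktemp)

# Assumed install location, with forward slashes as noted above.
kdiff3_path="C:/Program Files/KDiff3/kdiff3.exe"

git config --file "$cfg" diff.tool kdiff3
git config --file "$cfg" merge.tool kdiff3
git config --file "$cfg" difftool.kdiff3.path "$kdiff3_path"
git config --file "$cfg" mergetool.kdiff3.path "$kdiff3_path"

# Print the resulting [diff], [merge], [difftool "kdiff3"] and
# [mergetool "kdiff3"] sections.
cat "$cfg"
```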

All set! Lights, Camera, Action!

Let's edit a few files in a GIT repository, say test1.txt and test2.txt.

Let's view the difftool in action by running git difftool.

GIT will ask your permission to launch KDiff3 for viewing the test1.txt.

Once you exit the KDiff3 view of test1.txt, it will ask your permission to launch KDiff3 for test2.txt

Hitting enter will launch the KDiff3 again for viewing test2.txt

Basically, GIT will launch KDiff3 for all the files that have changed since the last commit.

If you feel annoyed about GIT asking your permission to show KDiff3 for each changed file, use git difftool -y.

This command will launch KDiff3 for each edited file, without any prompt!

To use KDiff3 as the merge tool, use git mergetool.

Hitting enter will launch KDiff3 as the merge tool.

KDiff3 shows a nice GUI to do the merge easily. It shows the original file in the leftmost window called "A" or "Base", the local file in the middle called "B" or "Local" and the remote file in the rightmost window called "C" or "Remote".

As you can see, it's pretty trivial to use KDiff3 as the external diff/merge tool with GIT. Have fun with GIT! Go GIT it!

Thursday, December 9, 2010

How to work on multiple streams of work with GIT over SVN

In the last few posts, I have mentioned several times that working on multiple streams of work using GIT is child's play. In this post, let's look at the actual steps to work on multiple streams of work.

Let's say that Deep is working on the next generation Math package. He aims to write the next generation Factorial program (yea yea! I know, how hard could that be). This program would be the most highly optimized and best performing Factorial program ever!

He is currently on the master branch i.e. the default branch.

Lets start:

Deep starts off with the simplest solution first. He writes the code to find the factorial of zero. Here is what it looks like.

Deep decides that he is at a logical point (the point at which all his tests are passing and the code is in good shape).

He has achieved the unthinkable, the factorial of zero is found! Wow! What a discovery! Let's commit this change to GIT.

He moves on, adds more code to find the factorial value of 1.

Again, a logical point. Time to commit.

Next, he adds code to find the factorial of any given number

That's quite some progress. Let's commit again.

At this point, Deep thinks he has developed the ultimate algorithm to find the factorial of any number! He decides to push his changes to SVN and make them public (so that other people can appreciate the awesome code!)

Pushing changes to SVN is done with git svn dcommit.

The world has one less problem to solve! Deep decides to take up another challenge: he decides to write the code for the Fibonacci series!

He starts off in the simplest possible way. Fibonacci value of numbers less than 2 is 1.

Logical point, need to commit to GIT.

Turning point:

At this point, Deep's boss comes over and tells Deep: you moron, dumb a**, @#$#, don't you know that the factorial of zero is not zero, it's 1! Fix it! And fix it now! Deep is all terrified! He has to fix this goof-up as soon as possible! After the initial hysteria, he says to himself: Aal izz well!

Current git log looks like this:

As you can see there is one commit on the top, which is not yet pushed to SVN. He does not want to push the Fibonacci commit to SVN but still wants to fix the issue. He does not want to lose the Fibonacci change either!

Basically he wants to work on two things, Factorial bug fix and Fibonacci series!

GIT has awesome support for working on multiple streams of work! Let's see how GIT enables Deep to deal with this situation.

How do they do it?

He decides to create a bug_fix local branch. This branch will point to the current code in the SVN repository, so it will not have the Fibonacci commit. It is created with git checkout -b bug_fix, starting from the remote SVN branch.

GIT log will show that this branch does not have the Fibonacci commit

This is because we have created the bug_fix branch to point to the current code in SVN.

Next, Deep fixes his goof-up like this

Committing,

Since the boss wanted this fix checked in ASAP, Deep pushes it to SVN with git svn dcommit.

The GIT log will show that we have pushed only the bug fix to SVN

Boss is happy! The goof-up is fixed!

Back to work! Let's continue coding on the Fibonacci series. Remember? That code is still in the master local branch. Hence, to continue working on the Fibonacci series we need to switch to the master branch with git checkout master.

Before moving any further, let's get the latest from SVN with git svn rebase.

GIT log will show that, we have received the commit that fixed the goof-up, and the Fibonacci commit is placed right on top of it!

End result:

Boss is happy because Deep fixed the goof-up pretty fast
Deep is happy because he did not have to revert, patch or do any kind of circus to work on multiple streams of work!
The world is happy because an optimized Factorial program is available :)
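If you want to rehearse this flow without an SVN server, here is a small offline sketch using plain GIT only. It is a simulation, not the real thing: a ref named remotes/trunk stands in for the SVN branch, git update-ref plays the part of git svn dcommit, and git rebase plays the part of git svn rebase. The file names and contents are made up for the demo.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Deep"
git config user.email "deep@example.com"

# Factorial work that has already been "pushed to SVN".
echo "factorial(0) = 0" > factorial.txt
git add factorial.txt
git commit -q -m "Factorial of any number"
git update-ref refs/remotes/trunk HEAD    # stand-in for: git svn dcommit

# Local-only Fibonacci commit on master.
echo "fib(n) = 1 for n < 2" > fibonacci.txt
git add fibonacci.txt
git commit -q -m "Fibonacci for numbers less than 2"

# bug_fix starts from what SVN has, so it has no Fibonacci commit.
git checkout -q --no-track -b bug_fix remotes/trunk
echo "factorial(0) = 1" > factorial.txt
git commit -q -am "Fix factorial of zero"
git update-ref refs/remotes/trunk HEAD    # stand-in for: git svn dcommit

# Back to master; replay the Fibonacci commit on top of the fix.
git checkout -q master
git rebase -q remotes/trunk               # stand-in for: git svn rebase

git log --format=%s
```

After the rebase, master has the bug fix from bug_fix with the Fibonacci commit sitting right on top of it, and the Fibonacci commit has still not been published.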

That's all, folks! Is this example enough? Do not waste any more time, go GIT it!

Friday, November 26, 2010

How to use GIT with non standard SVN repository layouts

In my previous post I explained the commands necessary to work with GIT on a standard SVN repository.

But life is not always that simple! What if you wanted to work with a repository that has a non-standard SVN layout? Super GIT to the rescue!

What do I mean by non standard SVN layouts? Basically, it means that your repository does not follow trunk/, branches/ and tags/ directory structure.

An example of that would be something like this

Note that the branches and trunk of module1 are located under the http://non-standard-repository.googlecode.com/svn/branches/module1/ URL, and the tags are located under http://non-standard-repository.googlecode.com/svn/tags/module1/

Let's say that, 90% of the time, I will be working on module1's trunk, which is located at http://non-standard-repository.googlecode.com/svn/branches/module1/trunk/

Let's see how we can use GIT over SVN for such a weird SVN repository.

Setting up the GIT repository:

You might be tempted to think that executing the following command should do the trick.

But anyone who has tried this earlier would know that this does not work! GIT checks out branch1, branch2 and trunk as separate folders. That's not what we want. What is the right way to do this?

How do they do it?

Well, you need to split the task of cloning the repository into three steps:

Do a git svn init against the root of the repository i.e. http://non-standard-repository.googlecode.com/svn
Make some edits in the config file under the .git directory. What edits? Have some patience, we will see this shortly
Do a git svn fetch to get the trunk and all the branches

You must have realized by now that the key here is Step - 2.

Let's see them one after the other.

Step - 1:

Let's first init an empty git repository with git svn init, passing the root URL of the SVN repository and the target directory non-standard-git.

Please note, I am using the root URL of the SVN repository to init the git repository. The URL is only till /svn.

At this point you should have a directory called non-standard-git created. If you go inside this directory you should see the empty git repository i.e. the .git directory created. Step - 1 is done.

Step - 2:

Open up the .git/config file in your favorite text editor. It should look something like this

Change it to look something like this

Note that the fetch, branches and tags properties have been modified. The key here is to use the relative path of module1's trunk, branches and tags in the config file. Step - 2 is done.
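To give a feel for the kind of edit involved, here is a sketch that writes such relative paths with git config into a scratch repository. The exact values are my reconstruction from the layout described in this post (trunk under branches/module1/trunk, branches under branches/module1/, tags under tags/module1/), and the ref names on the right hand side are assumptions; newer versions of git svn may use different default ref names.

```shell
set -eu

# Scratch repository standing in for the one created in Step - 1.
repo=$(mktemp -d)
cd "$repo"
git init -q

# [svn-remote "svn"] with paths relative to the root URL.
git config svn-remote.svn.url "http://non-standard-repository.googlecode.com/svn"
git config svn-remote.svn.fetch "branches/module1/trunk:refs/remotes/trunk"
git config svn-remote.svn.branches "branches/module1/*:refs/remotes/*"
git config svn-remote.svn.tags "tags/module1/*:refs/remotes/tags/*"

# Show the resulting section.
git config --get-regexp '^svn-remote\.svn\.'

# Step - 3 needs the real SVN server, so it is only shown as a comment:
#   git svn fetch
```

Each value has the form "path in SVN:ref in GIT", and the "*" is replaced by the name of every branch or tag found under that path.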

Step - 3:

The configuration changes have been made. Now let's fetch the trunk and branches using git svn fetch.

After the fetch is done you should be all set! All your branches, tags and trunk will be pointing to the right location. And you are good to go!

Tip: If you do not want to fetch the entire history right from the very beginning, that can easily be done. Let's say you want to fetch only the revisions from revision 1000 onwards; then use git svn fetch -r 1000:HEAD.

That's it! We can now work with SVN repositories with non-standard layouts. The support provided by GIT for SVN is nothing short of awesome!

Tuesday, November 23, 2010

How to use GIT with SVN repositories

In the previous series of posts, I narrated the real incident that made us move to GIT. By now, it must be obvious to you guys that GIT is an integral part of any project that we execute from now on.

Most of the client projects we execute use Subversion (SVN) as their central repository. There are various reasons because of which they will not move to pure GIT repositories in the near future. But this should not stop us from reaping the benefits of GIT, right?

In this post, I am going to demonstrate the basic set of commands required to use GIT with SVN repositories.

Introduce yourself to GIT:

It's just good manners to introduce ourselves to the people we are going to work with, isn't it? So let's introduce ourselves to GIT by setting user.name and user.email with git config --global.

The above config values tell GIT that:

The user name for all the git repositories on this machine is Deep Shah
The email address for all the git repositories on this machine is deep@gitshah.com

This step is not required, but is considered good practice.

Next, let's bring some color into our dull lives by turning on GIT's color settings.

These configs will show some colors on the git prompt. Don't execute the above commands if you hate colors.

That's about all the config you need. All set! Let's get some code!
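For reference, here is a sketch of the identity and color settings as git config calls. I am assuming color.ui as the color switch; the individual color settings used on your machine may differ. The sketch writes to a scratch file so that it does not touch your real settings. Replace --file "$cfg" with --global to apply it for all repositories on the machine.

```shell
set -eu

# Scratch config file; swap '--file "$cfg"' for '--global' to change ~/.gitconfig.
cfg=$(mktemp)

# Who we are.
git config --file "$cfg" user.name "Deep Shah"
git config --file "$cfg" user.email "deep@gitshah.com"

# Colored output wherever GIT writes to a terminal (assumed setting).
git config --file "$cfg" color.ui auto

# Print every value that was written.
git config --file "$cfg" --list
```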

Getting the code:

In this step, we will get the code from SVN and create a local GIT repository. There are two types of SVN repositories.

Repositories with the standard layout. A standard layout repository is a repository that has trunk, tags and branches folders at its root. As the names suggest, these folders hold the trunk, tags and branches of the repository. Pretty straightforward! Look at this link https://play-with-hg-svn.googlecode.com/svn for an example of a standard layout SVN repository
Repositories that do not use the standard layout. Non-standard repositories could have any weird layout. I have seen repositories having the trunk nested somewhere under the branches! In short, the layout could be absolutely anything. Have a look at this link http://non-standard-repository.googlecode.com/svn/branches/ for an example of a non-standard layout SVN repository.

GIT supports both types of SVN Repositories equally well. But, for the sake of simplicity we will look at standard layout SVN repositories in this post. In the next post we will look at non-standard SVN repositories.

To get the code you will need to clone the repository at https://play-with-hg-svn.googlecode.com/svn. This can be done using the git svn clone command with the -s option, passing the repository URL and the target folder play-with-hg-svn-git.

The above command will create a git repository locally on your machine. The repository will be created under the folder called play-with-hg-svn-git.

If you open that folder you will see, a folder named .git is created. This is the only folder that GIT needs. It does not create any other folder anywhere else.

Comparing this with SVN, SVN keeps the information littered around in countless .svn folders. For each directory/sub-directory in the project there will be a .svn folder. Man that is a lot of .svn folders!

The -s option tells GIT that, the SVN repository follows a standard layout (trunk/, tags/ and branches/).

GIT will download the information from revision 1 to the current or the HEAD revision. The entire revision history is kept on the local machine.

What? What did I just say? The revision information from revision 1 to the current or the HEAD revision will be kept on my local machine? Man! that will take my entire hard disk.

Actually, it won't! GIT keeps all the history in a compressed format. I have tried cloning very big repositories (ones that have around 1 GB of content). The entire .git directory (which holds information about all the revisions) was around 350 MB. Wow! That's incredible! The entire history (of a very big repository) is on my local machine and is packed in under 350 MB!

If you still feel that downloading so much history is just a pure waste of bandwidth, don't worry, GIT has an alternative for you as well. For example, let's say that you are going to need the history from revision 20 onwards only; then add the -r 20:HEAD option to the git svn clone command.

This will start downloading history from revision 20 onwards. For your project, substitute 20 with any other revision number.

The clone command may take some time to finish for bigger repositories. Be patient. Grab a cup of coffee or play a game on XBox while it finishes.

After the clone finishes, you will see that we have got all the code checked out. Got the code, let's start hacking it!

Ignoring files:

All serious projects generate a few binary files. We do not intend to commit those files. Hence, we must ignore them.

To ignore files in a GIT repository we have to create a file called .gitignore directly under the root directory (i.e. in this example under play-with-hg-svn-git). This file accepts glob patterns, similar to Ant path expressions. Any path or file name matching a pattern in this file will be ignored and will not be shown while you commit your changes.

It's a good idea to commit the .gitignore file itself to SVN. This will help other developers who are using GIT on the same repository.
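Here is a tiny sketch of .gitignore at work in a scratch repository. The build/ directory, the *.class pattern and the file names are assumptions made for this demo; use whatever your own project generates.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# A generated binary and a real source file (both made up for the demo).
mkdir build
echo "binary" > build/app.class
echo "source" > App.java

# Ignore the whole build directory and any .class file anywhere.
printf '%s\n' 'build/' '*.class' > .gitignore

# Lists .gitignore and App.java as untracked; nothing under build/ is shown.
status=$(git status --porcelain)
echo "$status"
```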

To generate this file automatically from the SVN ignores, use git svn show-ignore and redirect its output to .gitignore.

Remember, when you are working with GIT, you will always be working on a local branch. By default GIT creates a local branch called master. It might feel that this branch has a special meaning, but it's simply a default name chosen by the GIT folks. It could have been named anything, for example servant, batman, robin or absolutely anything else.

At this point, you are all set. The SVN repository has been cloned, the code is checked out and the ignores have been set up.

Viewing History:

You still do not believe that you have the entire history on your local machine, do you? OK, it's time for a demonstration! Disconnect from the network, turn off your WiFi connection, remove the network cord and type gitk.

This command will show you the graphical representation of all your commits at lightning fast speed. You have no network connection, but still you can view the changes made in the first revision! With SVN this was never possible!

Next, try git log.

This command shows the text representation of your commit history in the less pager. To navigate to the next/previous pages use the keys "<space>/b" respectively. To quit from this view use the key "q"

Local Commits:

The best thing about GIT is the ability to create cheap local branches. These branches reside on your local machine. This feature helps a great deal, because it makes it possible for us to work on multiple streams of work. I will explain in detail how this is done in future posts.

To view what branches you currently have, use git branch -a.

This command lists your local and remote branches.

See the "*" beside the master branch? That means you are currently on the master branch. The branches shown in red are the remote branches. They correspond to the actual SVN branches. We always work on a local branch and never ever on a remote branch. Currently we are working on the master local branch.

Let's make some code changes.

At this point, if you run git diff, it shows some information on the screen which looks cryptic.

Actually it's very simple. Things that are in red (with a "-" prefix) have been removed and things that are in green (with a "+" prefix) have been added to the first.txt file. As simple as that!

The git status command shows you the current status of the master branch.

It gives us some important information. First it says that we are on branch master (as if we didn't know). Next, it says changed but not updated. This means that one file, first.txt, has changed but has not been staged yet.

GIT has a concept of staging the files before committing. For now just remember that to commit the file first.txt you will have to stage it first. You do that with git add first.txt.

This command will stage only one file, first.txt. To stage all files, use git add . instead.

You can do the same thing using the git gui command.

Now we need to commit our changes. But wait we have no network connectivity! How can we commit without having any network connectivity?

Offline/local commits are the single most important advantage of using GIT.

To commit your changes, use git commit with the -m option and the message First commit from GIT.

At this point, if you look at the log (git log), you should see a brand new commit at the top with the comment First commit from GIT.
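The whole edit, diff, status, stage and commit cycle from this section can be tried out offline in a scratch repository. A minimal sketch follows; the file contents and the first commit message are made up for the demo.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"

# Something to start from.
printf 'line one\n' > first.txt
git add first.txt
git commit -q -m "Initial import"

# Make a code change.
printf 'line one\nline two\n' > first.txt

git --no-pager diff     # the added line shows up with a "+" prefix
git status --short      # prints " M first.txt": modified but not staged

# Stage the file, then commit it. No network needed.
git add first.txt
git commit -q -m "First commit from GIT"

subject=$(git log -1 --format=%s)
echo "$subject"
```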

Reverting commits and files:

The strange-looking 40 character long alphanumeric strings in your git log are nothing but git commit ids. Although an id is 40 characters long, a commit can usually be uniquely identified by its first 7 or 8 characters.

After a full day of hard work you find out that the commit you have made will not work and might break some functionality. You want to revert that commit. To revert a commit, use git revert followed by the commit id.

This will open up your favorite text editor. It lets you edit the revert message and shows you which files will be reverted. Saving and quitting the editor reverts your changes.

If you now do a git log, you will see two commits on the top: the one that you created and the one that reverts it.

Reverting a file:

To revert a file that has not been staged, use git checkout first.txt.

Read this command as: check out the last committed version of the file over your changes.

To revert a file that has been staged, you will first have to un-stage it using git reset HEAD first.txt.

Then you can revert first.txt using the checkout command.
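Here is a small offline sketch of all three kinds of revert in a scratch repository. The file contents and commit messages are made up, and I pass --no-edit to git revert so that the editor step is skipped and the demo can run unattended.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "line one" > first.txt
git add first.txt
git commit -q -m "Add first.txt"
echo "line two" >> first.txt
git commit -q -am "Add line two"

# Revert the latest commit; --no-edit keeps the default revert message.
git revert --no-edit HEAD >/dev/null

# Revert an un-staged edit.
echo "scratch" >> first.txt
git checkout -- first.txt

# Revert a staged edit: un-stage it first, then check out.
echo "scratch" >> first.txt
git add first.txt
git reset -q HEAD first.txt
git checkout -- first.txt

# The original commit and the commit that reverts it are both in the log.
git log --format=%s
```

Note that git revert does not delete the bad commit. It adds a new commit that undoes it, which is why both show up in the log.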

Updating from and Committing to SVN:

Time has come! Time has come to commit our changes to SVN. But before that, it's always a good practice to take the latest from SVN.

To get the latest changes from SVN, use git svn rebase.

This command will take the latest changes from SVN and apply your local commits on top of those changes. This means that, all your local commits will appear on top of the latest changes we received from SVN. This is exactly what we want!

All the commits that we have made so far are only in your local repository. They are not visible to other developers. To push your commits to SVN, use git svn dcommit.

Each of your local commits will now be committed to SVN as a separate commit.

There is an obvious advantage with this strategy: you can keep making local commits as often as you want, i.e. at every logical point, without affecting other developers. When you are ready with the feature you were developing, make those commits public by pushing your local commits to SVN. This enables developers to go back in time and see how a feature evolved while it was under development. The version history of every logical point in the feature development is maintained! This was not possible when we used only SVN.

That's all folks! This much information is enough for you to get started with GIT. Initially it might feel like too much work, but trust me on this, it pays off!

The advantages are far too many. Do not waste any more time, go GIT it!

Sunday, November 21, 2010

GIT - Version control done the right way - Part - 2

In the previous post, I narrated how life was before GIT came into the picture. This post is a continuation of the previous post. In this post, I will narrate how moving to GIT over Subversion (SVN) helped us fix the issues we were facing with pure SVN.

So, a short recap:

Problems with SVN:

It's not easy to work on multiple streams of work. SVN is not built for that.
No offline commits. No private commits (commits that only I can see and make them public when I want). There is no way in which I can commit and not make those changes publicly available.

How did we get around those problems using GIT over SVN:

There was no way that our team could have moved to pure GIT. Other teams working on the same projects were using the SVN repository for their commits. We had to work on the SVN repository for sure. Hence, we decided to use GIT over SVN. SVN would still be our central repository. All updates were taken from and commits were made to the same central SVN repository.

GIT has excellent support for SVN repositories. We use the git svn command. Here's how it helped us.

We started off with a small team moving towards using GIT. Out of 120 developers, 10 developers started using GIT. In the first week, boy, we had a tough time convincing people to give GIT a fair chance.

The process of unlearning SVN is a little difficult. One has to be really diligent and patient at least for a week. After that, trust me on this, you will never want to work with anything else.

Moving on, the situation of the continuous integration server was still the same. Narrow commit windows and long build times.

Let's say Deep and Jamie are in the middle of developing Feature - 1. They have been committing to their local GIT repository as often as they want. At every logical point (the point at which all their tests are passing and the code is in good shape) they do a local commit. This commit is only visible to them and not to anyone else.

Let's say the build goes green. They don't need to hastily push their changes to SVN any more. If their Feature - 1 development is not complete, they don't need to push their changes to SVN. Even without pushing to SVN, they are getting all the benefits of a version control system.

For example, they made 7 local commits to get the feature done. This means they have 7 commits in their local GIT repository, but SVN does not know anything about them. They have got the benefit of version control even without making their changes public.

When they decide the feature is ready to go public, they have the following options (based on the state of the build)

Push their changes to SVN, if build is green
Not push their changes to SVN, if build is red

If they get a chance to push to SVN, they are pretty confident that their changes are in a good state and build will go green with their changes.

If they do not get a chance to commit to SVN, they can continue development of Feature - 2 on a different local branch!

Eventually, when the build goes green after n hours (where n > 1), they can quickly switch to the Feature - 1 branch and commit only the Feature - 1 changes. In this situation they are confident that only the Feature - 1 changes will be committed! That is exactly what they wanted!

After pushing the Feature - 1 changes to SVN, they can switch back to the Feature - 2 local branch and continue development from where they left off! Awesome, isn't it!
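Here is a rough sketch of that branch juggling. The assumptions are mine: a throwaway repository made with plain `git init`, hypothetical branch and file names, and Feature - 2 branching off the last state taken from trunk. The actual publish step to SVN appears only as a comment, since it needs a real SVN repository.

```shell
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Deep"              # placeholder identity
git config user.email "deep@example.com" # placeholder identity

# Starting point: the state last taken from the central repository.
echo "// shared code" > Base.java
git add Base.java
git commit -q -m "State of trunk"
trunk="$(git rev-parse HEAD)"

# Feature - 1 is finished on its own branch, but the build is red.
git checkout -q -b feature-1
echo "// feature 1" > Feature1.java
git add Feature1.java
git commit -q -m "Feature - 1 complete"

# Instead of waiting, start Feature - 2 on a different local branch.
git checkout -q -b feature-2 "$trunk"
echo "// feature 2" > Feature2.java
git add Feature2.java
git commit -q -m "Feature - 2: first logical point"

# The build goes green: switch back to Feature - 1.
git checkout -q feature-1
feature1_files="$(git ls-files | xargs)"
echo "files on feature-1: $feature1_files"
# With git svn, this is where one would run: git svn rebase && git svn dcommit

# Back to Feature - 2, exactly where it was left.
git checkout -q feature-2
current_branch="$(git rev-parse --abbrev-ref HEAD)"
```

On the feature-1 branch the Feature - 2 file simply does not exist, so there is nothing to pick by instinct.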

Working with multiple streams of work is pretty natural with GIT.

Over and above everything else, they could commit at every logical point. This is really important!

The advantage of doing this is, one could go back in time and have a look at how the file A.java looked at logical point - 2.
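A small sketch of going back in time, again in a throwaway repository. The tag names are hypothetical and only there to make each logical point easy to refer to; a commit hash would work just as well.

```shell
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Deep"              # placeholder identity
git config user.email "deep@example.com" # placeholder identity

# Three logical points, each one a local commit with a (hypothetical) tag.
for i in 1 2 3; do
  echo "// A.java as of logical point $i" > A.java
  git add A.java
  git commit -q -m "logical point $i"
  git tag "logical-point-$i"
done

# History of the file, one line per logical point.
git --no-pager log --oneline -- A.java

# The file exactly as it looked at logical point - 2.
# The working copy is left untouched.
at_point_2="$(git show logical-point-2:A.java)"
echo "$at_point_2"
```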

When they were using only SVN, this was never possible. They could only see the entire Feature - 1 change set as one big fat commit. They could never go back in time and see how the file A.java had evolved while Feature - 1 was being developed.

So far we have seen only two main advantages of using GIT, even with SVN. This is just the tip of the iceberg. GIT is pretty awesome!

But, the flip side is you need to be patient till you unlearn SVN and feel the power of GIT.

Give GIT a fair chance and you will do things you thought were impossible thus far! Do not wait any more, go GIT it!

Monday, November 15, 2010

GIT - Version control done the right way - Part - 1

A lot of people have asked me why the name of my blog is www.gitshah.com. The fact is, I am a big advocate of GIT.

I am not going to bore you with what GIT can do and what it can't (well, I am yet to find out what it can't do). In this series of two posts, I am going to share with you guys a real life incident that changed the lives of a few developers and changed their perspective on version control forever.

I was working in a techie company, in a team of around 120 smart developers (yes all of them working on one project. And no, we were not building the next generation rocket that will travel faster than the speed of light).

The version control system we were using was Subversion (SVN). We had a continuous integration server set up, which means that as soon as someone checks in any code, a build is triggered and all test cases are run. If the build breaks, people know something is wrong and someone needs to fix it.

Looks like a standard project? Well, you will see. Continuous integration was the most important aspect of this project, imagine 120 smart developers churning out code every minute at least for 7 hours a day (man that is a lot of code! I agree). In this situation one has to be very careful about what goes into the central SVN repository. This is important because, we need the build to remain green as much as possible.

To do that we had a simple rule

NEVER COMMIT ON A BROKEN BUILD, UNLESS YOUR COMMIT WILL FIX THE BUILD.

Well it sounds fair, no commits on a broken (or red) build. If you are fixing the build, then of course you are allowed to commit. Else build would remain broken for eternity!

All this is nice and rosy, but there is one problem. Because the project was so big, a full build took around 1 hour (sometimes even more than an hour). Wow! One hour (what were we building? I think Linux kernel builds must be faster than this) before someone can find out whether his/her changes are good or not.

What's wrong with that?

Well consider this scenario. Deep and Jamie have been working on developing the Feature 1. When working with SVN they are always in a dilemma whether to commit or not to commit.

Why?

Because when you check new code in, everybody else gets it. As soon as someone commits to SVN, it's made public.

Hence, they have two choices:

Check in half-baked, buggy code and drive everyone else crazy
Avoid checking it in until the feature is completely developed

Deep and Jamie prefer to take path 2. They decide not to commit any code till Feature - 1 is developed and the code is in a stable state. Typically, feature development may take 2-4 days. No commits for 2-4 days! No version history for 2-4 days!

From the start of feature development till the end, developers cross many milestones or logical points. At these points, code is relatively stable and does a specific task well. All tests are passing, but the feature is not yet complete. All those logical points are really crucial. But with SVN these logical points are lost in time.

Since they cannot commit to SVN, till the feature is working, they have no choice but to continue coding. After 3 days they are done with the development. Now, they have a big change set (code changes worth 3 days) that they want to commit to SVN. There are two possibilities now:

They can commit to SVN, because the build is green (not broken)
They cannot commit to SVN since the build is red (broken).

If they can commit at this point then, everything is well and good.

But, if they cannot commit at this point then, again they have two choices

Wait for the build to go green (and in the meantime play some game on the XBOX or have some noodles)
Start development of Feature - 2, and commit the change set of Feature - 1 when the build goes green

Deep and Jamie cannot wait for the build to go green. The build takes an hour to go green. They can't lose that many billable hours doing nothing. You see, they are responsible and committed developers. They move on and start the development of Feature - 2.

Let's say that after another hour the build goes green.

Mission - commit Feature - 1 begins!

Remember, they were in a good state an hour ago. All tests were passing and Feature - 1 was complete. But since then, they have done more coding (remember, they had started development of Feature - 2). They want to commit the changes of Feature - 1 but not those of Feature - 2.

What do they do, what do they do? Time is running out. They must commit now. They can't wait any longer to commit Feature - 1.

They decide to selectively check in the Feature - 1 files. They take a call on whether to check in a specific file or not, based on their programmer instinct. They hope and pray that the files they have checked in form a logical change set for Feature - 1. The build is triggered. While their changes are being built, Asif and Sandeep check in their changes. Remember, the build takes an hour, so everyone is back to churning out more code.

Deep and Jamie's prayers were not answered! The build breaks.

The build is broken! The fun starts now! Count the number of hours before the build is green again.

Since Asif and Sandeep had already checked in (they had checked in when the build was green and it was building some changes), one more hour is lost. Naturally, since the build was broken before their changes were integrated, this build is on a death march. The build has been broken for an hour now.

Other developers have already started cursing Deep and Jamie for breaking the build. But wait, Deep and Jamie are thinking their changes can never break the build, their code is flawless! They had tested Feature - 1 so many times. Finally they realize that the files they had checked in did not form the complete change set required for Feature - 1 (remember, they had selectively checked in the files). They forgot to check in a file which was actually required. Damn!

It has already been an hour since they last checked in, and they have done some more coding. To fix the build they have to go back to the original check-in state, check in the required file and hope that the build goes green. It all looks very complex to me!

This is a risky situation: the developer is not sure which files to check in.

Of course, they could have taken a patch of Feature - 1 before starting development of Feature - 2, then reverted all changes before committing, applied the Feature - 1 patch and committed to SVN. Anyone who has done this even once would agree that this never works smoothly.

The fact is, working on multiple streams of work is not natural in SVN.

After their check-in, the build needs to run again. It runs for another hour, after which it goes green. Out of an 8 hour working day, the build was broken for 2 hours!

Extrapolate this situation to 120 developers. As the number of developers increases, the problem gets bigger and bigger. Everyone wants to check in whatever code they have written. When the build goes green, maybe they are not in a state to commit. They might have crossed the logical point. End result: no matter how strong your instincts are, you can never be sure that your check-in has all the changes you intended to commit.

Why does this happen?

With so many developers and such long build cycles, there is a narrow window to check in. There were times when we could not check in for one week! It's an unbelievable situation, madness actually: a week's work that is not checked in, a week's code without any versioning. Since we cannot commit to SVN without making our changes public, it's impossible to maintain the history of a file at every logical point.

No offline or local commits is the single most important problem with SVN.

What are the alternatives?

Some teams started branching out from trunk. The plan was to do the feature development in this new SVN branch and then merge it with trunk at the end of feature development. They solved one problem: they could commit as often as they wanted, without impacting other developers outside their team. But this gave birth to a bigger problem. The problem of Merge. They still had to merge their code (lying in the SVN branch) with the trunk.

Other teams were still working out of trunk. Hence, their SVN branch was getting outdated every minute. At the end of a week, when they tried to merge the code into trunk, they experienced what I call hell on earth.

When two SVN branches diverge and we try to merge them, Subversion tries to figure out what has changed and fails. End result: SVN shows a lot of merge conflicts. These are not really conflicts, but places where SVN failed to figure out what had changed.

I have seen people doing the merges for two straight days till 0200 hrs, after which they think, they are in a good state to check-in, only to realize that they have broken the build.

Well, you might find it unreal and exaggerated, but this has happened and it's a sad situation to be in.

Enough of SVN bitching, in the next post I will show how GIT helped us to get around all these problems.