Monday, November 15, 2010

GIT - Version control done the right way - Part - 1

A lot of people have asked me why the name of my blog is www.gitshah.com. The fact is, I am a big advocate of GIT.

I am not going to bore you with what GIT can do and what it can't (well, I am yet to find out what it can't do). In this series of two posts, I am going to share with you a real-life incident that changed the lives of a few developers and changed their perspective on version control forever.

I was working in a techie company, in a team of around 120 smart developers (yes, all of them working on one project, and no, we were not building the next-generation rocket that will travel faster than the speed of light).

The version control system we were using was Subversion (SVN). We had a continuous integration server set up, which means that as soon as someone checks in any code, a build is triggered and all test cases are run. If the build breaks, people know something is wrong and someone needs to fix it.

Looks like a standard project? Well, you will see. Continuous integration was the most important aspect of this project. Imagine 120 smart developers churning out code every minute for at least 7 hours a day (man, that is a lot of code, I agree). In this situation one has to be very careful about what goes into the central SVN repository, because we need the build to remain green as much as possible.

To do that we had a simple rule:

NEVER COMMIT ON A BROKEN BUILD, UNLESS YOUR COMMIT WILL FIX THE BUILD.  

Well, it sounds fair: no commits on a broken (or red) build. If you are fixing the build, then of course you are allowed to commit. Otherwise the build would remain broken for eternity!
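
If you prefer to read rules as code, here is a minimal sketch of that rule in Python. The function name and its two flags are hypothetical and only illustrate the team convention; this is not a hook taken from our actual setup.

```python
def may_commit(build_is_green: bool, commit_fixes_build: bool) -> bool:
    """Team rule: never commit on a broken build, unless the commit fixes it."""
    # A green build accepts any commit; a red build accepts only a fix.
    return build_is_green or commit_fixes_build


# Green build: anyone may commit.
assert may_commit(build_is_green=True, commit_fixes_build=False)
# Red build and the commit is not a fix: hands off.
assert not may_commit(build_is_green=False, commit_fixes_build=False)
# Red build and the commit is the fix: allowed.
assert may_commit(build_is_green=False, commit_fixes_build=True)
```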

All this is nice and rosy, but there is one problem. Because the project was so big, a full build took around 1 hour (sometimes even more). Wow! One hour (what were we building? I think Linux kernel builds must be faster than this) before someone can find out whether his/her changes are good or not.

What's wrong with that?

Well, consider this scenario. Deep and Jamie have been working on developing Feature 1. When working with SVN, they are always in a dilemma: to commit or not to commit.

Why?

Because when you check new code in, everybody else gets it. As soon as someone commits to SVN, it's made public.

Hence, they have two choices:
  • Check in half-baked, buggy code and drive everyone else crazy
  • Avoid checking it in until the feature is completely developed
Deep and Jamie prefer to take path 2. They decide not to commit any code until Feature 1 is developed and the code is in a stable state. Typically, feature development may take 2-4 days. No commits for 2-4 days! No version history for 2-4 days!

From the start of feature development till the end, developers cross many milestones or logical points.  At these points, code is relatively stable and does a specific task well.  All tests are passing, but the feature is not yet complete.  All those logical points are really crucial.  But with SVN these logical points are lost in time.

Since they cannot commit to SVN until the feature is working, they have no choice but to continue coding. After 3 days they are done with the development. Now they have a big change set (3 days' worth of code changes) that they want to commit to SVN. There are two possibilities now:
  • They can commit to SVN, because the build is green (not broken).
  • They cannot commit to SVN, since the build is red (broken).
If they can commit at this point, then everything is well and good.

But if they cannot commit at this point, then again they have two choices:
  • Wait for the build to go green (and in the meantime play some game on the Xbox or have some noodles)
  • Start development of Feature 2, and commit the change set of Feature 1 when the build goes green
Deep and Jamie cannot wait for the build to go green. The build takes an hour to go green. They can't lose that many billable hours doing nothing. You see, they are responsible and committed developers. They move on and start the development of Feature 2.

Let's say that after another hour the build goes green.

Mission "commit Feature 1" begins!

Remember, they were in a good state an hour ago. All tests were passing and Feature 1 was complete. But since then, they have done more coding (remember, they had started development of Feature 2). They want to commit the changes of Feature 1 but not those of Feature 2.

What do they do, what do they do? Time is running out. They must commit now. They can't wait any longer to commit Feature 1.

They decide to selectively check in the Feature 1 files. They decide whether or not to check in each specific file based on their programmer instinct. They hope and pray that the files they have checked in form a logical change set for Feature 1. A build is triggered. While their changes are being built, Asif and Sandeep check in their changes. Remember, the build takes an hour, so everyone is back to churning out more code.

Deep and Jamie's prayers were not answered! The build breaks.

Build is broken! Fun starts now! Count the number of hours before the build is green again.

Since Asif and Sandeep had already checked in (they checked in while the build was green and still building earlier changes), one more hour is lost. Naturally, since the build was already broken before their changes were integrated, this build is on a death march too. The build has now been broken for an hour.

Other developers have already started cursing Deep and Jamie for breaking the build. But wait, Deep and Jamie think their changes could never break the build; their code is flawless! They had tested Feature 1 so many times. Finally they realize that the files they had checked in did not form the complete change set required for Feature 1 (remember, they had selectively checked in the files). They forgot to check in a file that was actually required. Damn!

It has already been an hour since they last checked in, and they have done some more coding. To fix the build they have to go back to the original check-in state, check in the required file and hope that the build goes green. It all looks very complex to me!

This is a risky situation: the developer is not sure which files to check in.

Of course, they could have taken a patch of Feature 1 before starting development of Feature 2, then reverted all changes before committing, applied the Feature 1 patch and committed to SVN. Anyone who has done this even once would agree that it never works smoothly.
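
For the record, the patch route above can be sketched as a list of commands. This is a dry run in Python that only builds the command strings and never executes them. The function name and the patch file name are hypothetical, and it assumes the commands are run from the root of the working copy.

```python
def feature1_patch_steps(patch_file: str = "feature1.patch") -> list[str]:
    """Return, in order, the shell commands for the patch workaround (not executed)."""
    return [
        # 1. Before starting Feature 2: save the Feature 1 changes as a patch.
        #    Note that `svn diff` only covers files SVN already knows about.
        f"svn diff > {patch_file}",
        # 2. Later, when the build is green: throw away every local modification
        #    to versioned files. This also discards the Feature 2 work unless
        #    it has been saved somewhere else first.
        "svn revert -R .",
        # 3. Re-apply only the Feature 1 changes.
        f"patch -p0 -i {patch_file}",
        # 4. Commit what is now (hopefully) a clean Feature 1 change set.
        'svn commit -m "Feature 1"',
    ]


for step in feature1_patch_steps():
    print(step)
```

Even in this tidy form, step 2 shows where it hurts: the revert cannot tell Feature 1 from Feature 2, so the newer work has to be parked and restored by hand.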

The fact is, working on multiple streams of work is not natural in SVN.

After their check-in, the build needs to run again. It runs for another hour, after which it goes green. Out of an 8-hour working day, the build was broken for 2 hours!
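
The arithmetic behind that number can be sketched in a few lines of Python. This is a simplified model, assuming builds run back to back, each takes exactly one hour, and the status only flips when a build finishes. The function name is made up for illustration.

```python
def red_hours(build_results: list[bool], build_hours: float = 1.0) -> float:
    """Hours the build status stays red over a run of back-to-back builds.

    build_results holds True for a passing build and False for a failing one.
    The status turns red when a build fails and stays red until a later
    build finishes green.
    """
    red = 0.0
    status_is_red = False
    for passed in build_results:
        if status_is_red:
            # The whole duration of this build is spent with a red status.
            red += build_hours
        status_is_red = not passed
    return red


# Build 1: the incomplete Feature 1 check-in (fails).
# Build 2: Asif and Sandeep's changes on top of the broken code (fails).
# Build 3: the missing file is checked in (passes).
broken = red_hours([False, False, True])
print(broken)      # 2.0 hours red
print(broken / 8)  # 0.25, a quarter of an 8-hour working day
```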

Extrapolate this situation to 120 developers. As the number of developers increases, the problem gets bigger and bigger. Everyone wants to check in whatever code they have written. When the build goes green, maybe they are not in a state to commit. They might have gone past the logical point. End result: no matter how strong your instincts are, you can never be sure that your check-in has all the changes you intended to commit.

Why does this happen? 

With so many developers and such long build cycles, there is a narrow window to check in. There were times when we could not check in for one week! It's an unbelievable situation, madness actually: a week's work that is not checked in, a week's code without any versioning. Since we cannot commit to SVN without making our changes public, it's impossible to maintain the history of a file at every logical point.

The lack of offline or local commits is the single most important problem with SVN.

What are the alternatives?

Some teams started branching out from trunk. The plan was to do the feature development in this new SVN branch and then merge it with trunk at the end of feature development. They solved one problem: they could commit as often as they wanted without impacting other developers outside their team. But this gave birth to a bigger problem, the problem of the merge. They still had to merge their code (lying in the SVN branch) with the trunk.

Other teams were still working out of trunk. Hence, the SVN branch was getting more outdated every minute. At the end of a week, when the branching teams tried to merge their code into trunk, they experienced what I call hell on earth.

When two SVN branches diverge and we try to merge them, Subversion tries to figure out what has changed and fails. End result: SVN shows a lot of merge conflicts. These are not really conflicts but places where SVN failed to figure out what had changed.

I have seen people doing merges for two straight days, till 0200 hrs, after which they think they are in a good state to check in, only to realize that they have broken the build.

Well, you might find it unreal and exaggerated, but this has happened and it's a sad situation to be in.

Enough of SVN bitching. In the next post I will show how GIT helped us get around all these problems.
Have some Fun!