Friday, November 26, 2010

How to use GIT with non-standard SVN repository layouts

In my previous post I explained the commands necessary to work with GIT on a standard SVN repository.

But life is not always that simple!  What if you want to work with a repository that has a non-standard SVN layout?  Super GIT to the rescue!

What do I mean by non-standard SVN layouts?  Basically, it means that your repository does not follow the trunk/, branches/ and tags/ directory structure.

An example would be a repository in which the branches and trunk of module1 are located under the http://non-standard-repository.googlecode.com/svn/branches/module1/ URL and its tags are located under the http://non-standard-repository.googlecode.com/svn/tags/module1/ URL.

Let's say that, 90% of the time, I will be working on module1's trunk, which is located at http://non-standard-repository.googlecode.com/svn/branches/module1/trunk/

Let's see how we can use GIT over SVN for such a weird SVN repository.

Setting up the GIT repository:

You might be tempted to think that a single git svn clone of the module1 URL should do the trick.

But anyone who has tried this would know that it does not work!  GIT checks out branch1, branch2 and trunk as separate folders.  That's not what we want.  What is the right way to do this?

How do they do it?

Well, you need to split the task of cloning the repository into three steps:
  1. Run git svn init against the root of the repository, i.e. http://non-standard-repository.googlecode.com/svn
  2. Make some edits in the config file under the .git directory.  What edits?  Have some patience, we will see this shortly
  3. Run git svn fetch to get the trunk and all the branches
You must have realized by now that the key here is Step - 2.

Let's go through them one after the other.

Step - 1:

Let's first init an empty git repository using the following command
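The command itself did not survive in this copy of the post.  Here is a minimal sketch, assuming it was a plain git svn init against the root URL, with non-standard-git (the directory named below) as the target.  The wrapper function init_from_svn_root is a hypothetical name of mine, used only so that nothing runs until you call it.

```shell
# Hypothetical helper (not from the original post): create an empty git
# repository that is wired to the ROOT of the SVN repository.
init_from_svn_root() {
    svn_root="$1"     # e.g. http://non-standard-repository.googlecode.com/svn
    target_dir="$2"   # e.g. non-standard-git
    git svn init "$svn_root" "$target_dir"
}

# Usage, with the URL and directory name used in this post:
# init_from_svn_root http://non-standard-repository.googlecode.com/svn non-standard-git
```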

Please note, I am using the root URL of the SVN repository to init the git repository.  The URL stops at /svn.

At this point you should have a directory called non-standard-git created.  If you go inside this directory you should see the empty git repository i.e. the .git directory created.  Step - 1 is done.

Step - 2:

Open up the .git/config file in your favorite text editor.  It should look something like this
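The original listing is missing here.  As a rough guide, a freshly initialised repository usually has a config along these lines; the [core] values depend on your platform and git version, so treat this as an approximation rather than the exact file from the post.

```ini
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[svn-remote "svn"]
	url = http://non-standard-repository.googlecode.com/svn
	fetch = :refs/remotes/git-svn
```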

Change it to look something like this
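The edited listing is missing too.  A plausible reconstruction, assuming the module1 paths described earlier in this post, is shown below.  The ref names on the right-hand side of each mapping are my assumption.

```ini
[svn-remote "svn"]
	url = http://non-standard-repository.googlecode.com/svn
	; trunk of module1, relative to the root URL above
	fetch = branches/module1/trunk:refs/remotes/trunk
	; every directory under branches/module1/ becomes a remote branch
	branches = branches/module1/*:refs/remotes/*
	; every directory under tags/module1/ becomes a remote tag ref
	tags = tags/module1/*:refs/remotes/tags/*
```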
Note that the fetch, branches and tags properties have been modified.  The key here is to use the relative path of module1's trunk, branches and tags in the config file.  Step - 2 is done.

Step - 3:

The configuration changes have been made, so now let's fetch the trunk and branches.  Use the following command
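The command is missing from this copy.  Here is a sketch, assuming it was a plain git svn fetch run inside the repository created in Step - 1.  The function name fetch_from_svn is hypothetical.

```shell
# Hypothetical helper (not from the original post): fetch trunk, branches
# and tags as configured in .git/config of the given repository.
fetch_from_svn() {
    repo_dir="$1"   # e.g. non-standard-git
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$repo_dir" && git svn fetch )
}

# Usage:
# fetch_from_svn non-standard-git
```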

After the fetch is done you should be all set!  All your branches, tags and trunk will be pointing to the right location.  And you are good to go!

Tip: If you do not want to fetch the entire history right from the very beginning, that can easily be done.  Let's say you want to fetch all the revisions from revision 1000 onwards, then use the following command
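Again the command is missing.  Here is a sketch, assuming it used the -r option of git svn fetch with a START:HEAD range.  The function name fetch_since_revision is hypothetical; run it inside the git repository.

```shell
# Hypothetical helper (not from the original post): fetch history starting
# at a given SVN revision instead of revision 1.
fetch_since_revision() {
    start_rev="$1"   # e.g. 1000
    git svn fetch -r "${start_rev}:HEAD"
}

# Usage:
# fetch_since_revision 1000
```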

That's it!  We can now work with SVN repositories with non-standard layouts.  The support provided by GIT for SVN is nothing short of awesome!

Tuesday, November 23, 2010

How to use GIT with SVN repositories

In the previous series of posts, I narrated the real incident that made us move to GIT.  By now, it must be obvious to you guys that GIT is an integral part of any project that we execute from now on.

Most of the client projects we execute use Subversion (SVN) as their central repository.  There are various reasons why they will not move to pure GIT repositories in the near future.  But this should not stop us from reaping the benefits of GIT, right?

In this post, I am going to demonstrate the basic set of commands required to use GIT with SVN repositories.

Introduce yourself to GIT:

It's just good manners to introduce ourselves to the people we are going to work with, isn't it?  So let's introduce ourselves to GIT
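The commands did not survive in this copy.  Here is a sketch using the name and email mentioned in this post.  The --global flag is my assumption, based on the statement that the values apply to all git repositories on this machine.

```shell
# Record who you are for every git repository on this machine.
git config --global user.name "Deep Shah"
git config --global user.email "deep@gitshah.com"

# Read the values back to confirm they were stored.
git config --global user.name
git config --global user.email
```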

The above config values tell GIT that:
  • The user name for all the git repositories on this machine is Deep Shah
  • The email address for all the git repositories on this machine is deep@gitshah.com
This step is not required, but is considered good practice.

Next, let's bring some color into our dull lives.
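The exact settings used in the original post are not preserved.  The sketch below uses four standard color options, which is one common way to do it.

```shell
# Ask git to colorize its output whenever it is writing to a terminal.
git config --global color.branch auto
git config --global color.diff auto
git config --global color.status auto
git config --global color.interactive auto
```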

These configs will show some colors on the git prompt.  Don't execute the above commands, if you hate colors.

That's about all the configs you need.  All set!  Let's get some code!

Getting the code:

In this step, we will get the code from SVN and create a local GIT repository.  There are two types of SVN repositories.
  • Repositories with the standard layout.  A standard layout repository is one that has trunk, tags and branches folders at its root.  As the names suggest, these folders hold the trunk, tags and branches of the repository.  Pretty straightforward!  Look at https://play-with-hg-svn.googlecode.com/svn for an example of a standard layout SVN repository.
  • Repositories that do not use the standard layout.  Non-standard repositories could have any weird layout.  I have seen repositories with the trunk nested somewhere under the branches!  In short, the layout could be absolutely anything.  Have a look at http://non-standard-repository.googlecode.com/svn/branches/ for an example of a non-standard layout SVN repository.
GIT supports both types of SVN repositories equally well.  But, for the sake of simplicity, we will look at standard layout SVN repositories in this post.  In the next post we will look at non-standard SVN repositories.

To get the code you will need to clone the repository at https://play-with-hg-svn.googlecode.com/svn.  This can be done using the following command.
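The command is missing from this copy.  Here is a sketch, assuming it was git svn clone with the -s option and the play-with-hg-svn-git target folder, both of which are described in the text that follows.  The wrapper function clone_standard_layout is a hypothetical name, used so that nothing touches the network until you call it.

```shell
# Hypothetical helper (not from the original post): clone an SVN repository
# that follows the standard trunk/, tags/ and branches/ layout.
clone_standard_layout() {
    svn_url="$1"      # e.g. https://play-with-hg-svn.googlecode.com/svn
    target_dir="$2"   # e.g. play-with-hg-svn-git
    git svn clone -s "$svn_url" "$target_dir"
}

# Usage:
# clone_standard_layout https://play-with-hg-svn.googlecode.com/svn play-with-hg-svn-git
```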

The above command will create a git repository locally on your machine.  The repository will be created under the folder called play-with-hg-svn-git.

If you open that folder you will see that a folder named .git has been created.  This is the only folder that GIT needs.  It does not create any other folder anywhere else.

Comparing this with SVN, SVN keeps the information littered around in countless .svn folders.  For each directory/sub-directory in the project there will be a .svn folder.  Man that is a lot of .svn folders!

The -s option tells GIT that, the SVN repository follows a standard layout (trunk/, tags/ and branches/).

GIT will download the information from revision 1 to the current or the HEAD revision.  The entire revision history is kept on the local machine.

What? What did I just say?  The revision information from revision 1 to the current or the HEAD revision will be kept on my local machine?  Man! that will take my entire hard disk.  

Actually, it won't!  GIT keeps all the history in a compressed format.  I have tried cloning very big repositories (ones that have around 1 GB of content).  The entire .git directory (which holds information about all the revisions) was around 350 MB.  Wow!  That's incredible!  The entire history (of a very big repository) is on my local machine and is packed in under 350 MB!

If you still feel that downloading so much history is a pure waste of bandwidth, don't worry, GIT has an alternative for you as well.  For example, let's say that you are going to need the history from revision 20 onwards only, then use the following command
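The command is missing here as well.  Here is a sketch, assuming the same clone as before with a -r START:HEAD range added.  The function name clone_since_revision is hypothetical.

```shell
# Hypothetical helper (not from the original post): clone a standard layout
# SVN repository, but only download history from start_rev onwards.
clone_since_revision() {
    start_rev="$1"    # e.g. 20
    svn_url="$2"      # e.g. https://play-with-hg-svn.googlecode.com/svn
    target_dir="$3"   # e.g. play-with-hg-svn-git
    git svn clone -s -r "${start_rev}:HEAD" "$svn_url" "$target_dir"
}

# Usage:
# clone_since_revision 20 https://play-with-hg-svn.googlecode.com/svn play-with-hg-svn-git
```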

The above command will start downloading history from revision 20 onwards.  For your project substitute 20 with any other revision number.

The clone command may take some time to finish for bigger repositories.  Be patient.  Grab a cup of coffee or play a game on the Xbox while it finishes.


After the clone finishes, you will see that we have got all the code checked out.  Got the code, let's start hacking on it!

Ignoring files:

All serious projects generate a few binary files.  We do not intend to commit those files.  Hence, we must ignore them.

To ignore files in a GIT repository we have to create a file called .gitignore directly under the root directory (i.e. in this example under play-with-hg-svn-git).  This file accepts Ant-like path patterns.  Any path or file name you put in this file will be ignored and will not be shown when you commit your changes.


It's a good idea to commit the .gitignore file itself to SVN.  This will help other developers who are using GIT on the same repository.

To generate this file automatically using the SVN ignores, use the following command
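The command is missing from this copy.  Here is a sketch, assuming it was git svn show-ignore with its output redirected into .gitignore.  The function name generate_gitignore is hypothetical; run it from the root of the cloned repository.

```shell
# Hypothetical helper (not from the original post): convert the svn:ignore
# properties of the SVN repository into a .gitignore file.
generate_gitignore() {
    # Note: this overwrites any existing .gitignore in the current directory.
    git svn show-ignore > .gitignore
}

# Usage (from inside play-with-hg-svn-git):
# generate_gitignore
```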

Remember, when you are working with GIT, you will always be working on a local branch.  By default GIT creates a local branch called master.  It might feel like this branch has a special meaning, but it's simply a default name chosen by the GIT folks.  It could have been named servant, batman, robin or absolutely anything else.

At this point, you are all set.  The SVN repository has been cloned, code is checked out and ignores have been setup.

Viewing History:

You still do not believe that you have the entire history on your local machine, do you?  OK, it's time for a demonstration!  Disconnect from the network, turn off your WiFi connection, remove the network cable and type in the following command
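The command is missing from this copy.  Since the next paragraph talks about a graphical view of all commits, my best guess is gitk, the history browser that ships with git.  Treat the choice of tool and the --all flag as assumptions; the function name show_history_graph is hypothetical.

```shell
# Hypothetical helper (not from the original post): open the graphical
# history browser showing every branch, not just the current one.
show_history_graph() {
    gitk --all
}

# Usage (from inside the cloned repository, needs a desktop session):
# show_history_graph
```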

This command will show you the graphical representation of all your commits at lightning fast speed.  You have no network connection, but still you can view the changes made in the first revision!  With SVN this was never possible!

Next, try this command 
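The command is missing here; from the description that follows it was presumably git log.  In your cloned repository you would just type git log.  The sketch below builds a throwaway repository first (my addition) so that it can be run anywhere.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "hello" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Show the commit history as text, newest commit first.
git log
```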

This command shows a text representation of your commit history through the less pager.  Press "<space>" to go to the next page and "b" to go to the previous page.  To quit from this view press "q".

Local Commits:

The best thing about GIT is the ability to create cheap local branches.  These branches reside on your local machine.  This feature helps a great deal, because it makes it possible for us to work on multiple streams of work.  I will explain in detail how this is done in future posts.

To view what branches you have currently use the following command
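The command is missing from this copy; it was presumably git branch -a, which lists local and remote branches together.  The sketch below uses a throwaway repository and imitates a git svn remote branch with git update-ref.  Both the scratch repository and the fake refs/remotes/trunk ref are my additions for illustration.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "hello" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Imitate the remote ref that git svn would create for the SVN trunk.
git update-ref refs/remotes/trunk HEAD

# List local branches and remote branches; "*" marks the current one.
git branch -a
```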

This command should list your local branches along with the remote branches.


See the "*" beside the master branch?  That means you are currently on the master branch.  The branches shown in red are the remote branches.  They correspond to the actual SVN branches.  We always work on a local branch and never ever on a remote branch.  Currently we are working on the master local branch.

Let's make some code changes.

At this point if you type the following command
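The command is missing from this copy; the description below matches git diff.  The sketch uses a throwaway repository and a first.txt file (the file name comes from this post, the contents are made up).

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "one" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Make a local change, then ask git what differs from the last commit.
echo "two" >> first.txt
git diff
```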

This shows some information on the screen which looks cryptic.

Actually it's very simple.  Things that are in red (with a "-" prefix) have been removed and things that are in green (with a "+" prefix) have been added to the first.txt file.  As simple as that!
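To see which files have changed, the usual command is git status.  The original listing is missing, so this is my assumption based on the description that follows.  The sketch uses a throwaway repository.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "one" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Change the file without staging it, then look at the status.
echo "two" >> first.txt
git status
```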

This command shows you the current status of the master branch.



It gives us some important information.  First it says that we are on branch master (as if we didn't know).  Next, it says changed but not updated.  This means that one file, first.txt, has changed but has not been staged yet.

GIT has a concept of staging the files before committing.  For now just remember that to commit the file first.txt you will have to stage it first.  Here's how you could do it
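The command is missing from this copy; staging a single file is normally done with git add followed by the file name.  The sketch uses a throwaway repository.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "one" > first.txt
git add first.txt
git commit -q -m "Add first.txt"
echo "two" >> first.txt

# Stage just this one file for the next commit.
git add first.txt
git status --short
```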

Above command will stage only one file first.txt.  To stage all files, use the following command.
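This command is missing too.  One common way to stage everything under the current directory is git add with a dot; whether the original post used this exact form is an assumption.  The second file name is made up.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "one" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# One modified file and one brand new file.
echo "two" >> first.txt
echo "new" > second.txt

# Stage every change under the current directory in one go.
git add .
git status --short
```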


You can do the same thing using the git gui command.

Now we need to commit our changes.  But wait we have no network connectivity!  How can we commit without having any network connectivity?

Offline/local commits are the single most important advantage of using GIT.

To commit your changes use the following command
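The command is missing from this copy.  Here is a sketch, assuming git commit with the -m option and the message quoted in the next paragraph.  It uses a throwaway repository.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "one" > first.txt
git add first.txt

# Commit the staged changes locally; no network connection is needed.
git commit -m "First commit from GIT"
```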

At this point, if you look at the log (git log), you should see a brand new commit at the top with the comment First commit from GIT.



Reverting commits and files:

The strange looking 40 character long alphanumeric strings in your git log are nothing but git commit ids.  Although an id is 40 characters long, a commit can usually be identified uniquely by its first 7 or 8 characters.

After a full day of hard work you find out that the commit you have made will not work and might break some functionality.  You want to revert that commit.  To revert a commit, use the following command
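The command is missing from this copy; reverting is normally done with git revert followed by the commit id.  The sketch uses a throwaway repository, and it passes --no-edit (my addition) so that it runs without opening the editor described below.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "good" > first.txt
git add first.txt
git commit -q -m "Add first.txt"
echo "bad" > first.txt
git commit -q -a -m "Break first.txt"

# Look up the short id of the commit we regret, then revert it.
bad_commit=$(git rev-parse --short HEAD)
git revert --no-edit "$bad_commit"

# The history now holds the bad commit and a new commit that undoes it.
git log --oneline
```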



This will open up your favorite text editor.  It lets you edit the revert message and shows you which files will be reverted.  Saving and quitting the editor reverts your changes.

If you now do a git log, you will see two commits at the top: the one that you created and the one that reverts it.

Reverting a file:

To revert a file that has not been staged, use the following command
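The command is missing from this copy; the description below matches git checkout applied to a file.  The sketch uses a throwaway repository.  The "--" separator is my addition, it simply tells git that what follows is a path and not a branch name.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "original" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Make a change we do not want to keep.
echo "scratch work" > first.txt

# Throw the change away by restoring the last committed version.
git checkout -- first.txt
cat first.txt
```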

Read this command as, checkout the last committed version of the file over your changes

To revert a file that has been staged, you will first have to un-stage it using the following command
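The command is missing from this copy.  A common way to un-stage a file is git reset HEAD followed by the file name, so that is what this sketch assumes.  It uses a throwaway repository.

```shell
# Scratch repository, only so this example is safe to run anywhere.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Deep Shah"
git config user.email "deep@gitshah.com"
echo "original" > first.txt
git add first.txt
git commit -q -m "Add first.txt"

# Change the file and stage the change.
echo "scratch work" > first.txt
git add first.txt

# Un-stage it again; the edit itself stays in the working copy.
git reset -q HEAD first.txt
git status --short
```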

Then you can revert the first.txt using the checkout command

Updating from and Committing to SVN:

The time has come!  The time has come to commit our changes to SVN.  But before that, it's always good practice to take the latest from SVN.

To get the latest changes from SVN use this
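The command is missing from this copy; the behaviour described below matches git svn rebase.  The wrapper function update_from_svn is a hypothetical name, used so that nothing contacts the SVN server until you call it.

```shell
# Hypothetical helper (not from the original post): fetch new SVN revisions
# and replay the local commits on top of them.
update_from_svn() {
    git svn rebase
}

# Usage (from inside the cloned repository, with a clean working copy):
# update_from_svn
```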

This command will take the latest changes from SVN and apply your local commits on top of those changes.  This means that, all your local commits will appear on top of the latest changes we received from SVN.  This is exactly what we want!

All the commits that we have made so far are only in your local repository.  They are not visible to other developers.  To push your commits to SVN do this
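The command is missing from this copy; pushing local commits to SVN is done with git svn dcommit.  The wrapper function push_to_svn is a hypothetical name.  It passes any extra arguments through, so you can try --dry-run first to see what would be sent.

```shell
# Hypothetical helper (not from the original post): send each local commit
# to SVN as a separate SVN revision.
push_to_svn() {
    git svn dcommit "$@"
}

# Usage (from inside the cloned repository):
# push_to_svn --dry-run   # only list what would be committed
# push_to_svn             # actually commit to SVN
```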

Each of your local commits, will now be committed to SVN as a separate commit.

There is an obvious advantage with this strategy: you can keep making local commits as often as you want, i.e. at every logical point, without affecting other developers.  When you are ready with the feature you were developing, make those commits public by pushing the local commits to SVN.  This enables developers to go back in time and see how a feature evolved while it was under development.  The version history of every logical point in the feature development is maintained!  This was not possible when we used only SVN.

That's all folks!  This much information is enough for you to get started with GIT.  Initially it might feel like too much work, but trust me on this, it pays off!

The advantages are far too many.  Do not waste any more time, go GIT it!

Sunday, November 21, 2010

GIT - Version control done the right way - Part - 2

In the previous post, I narrated how life was before GIT came into the picture.  This post is a continuation of the previous post.  In this post, I will narrate how moving to GIT over Subversion (SVN) helped us fix the issues we were facing with pure SVN.

So, a short recap.

Problems with SVN:
  1. It's not easy to work on multiple streams of work.  SVN is not built for that.
  2. No offline commits.  No private commits (commits that only I can see and make them public when I want).  There is no way in which I can commit and not make those changes publicly available.

How did we get around those problems using GIT over SVN:

There was no way that our team could have moved to pure GIT.  Other teams working on the same project were using the SVN repository for their commits.  We had to work on the SVN repository for sure.  Hence, we decided to use GIT over SVN.  SVN would still be our central repository.  All updates were taken from and all commits were made to the same central SVN repository.

GIT has excellent support for SVN repositories.  We use the git svn command.  Here's how it helped us.

We started off with a small team moving towards using GIT.  Out of 120 developers, 10 started using GIT.  In the first week, boy, we had a tough time convincing people to give GIT a fair chance.

The process of unlearning SVN is a little difficult.  One has to be really diligent and patient at least for a week.  After that, trust me on this, you will never want to work with anything else.

Moving on, the situation of the continuous integration server was still the same.  Narrow commit windows and long build times.  

Let's say Deep and Jamie are in the middle of developing Feature - 1.  They have been committing to their local GIT repository as often as they want.  At every logical point (the point at which all their tests are passing and the code is in good shape) they do a local commit.  This commit is only visible to them and not to anyone else.

Let's say the build goes green.  They don't need to hastily push their changes to SVN any more.  If their Feature - 1 development is not complete, they don't need to push their changes to SVN.  Even without pushing to SVN, they are getting all the benefits of a version control system.

For e.g. they made 7 local commits for getting the feature done.  This means they have 7 commits in their local GIT repository, but SVN does not know anything about them.  They have got the benefit of version control even without making their changes public.  

When they decide the feature is ready to go public they have the following options (based on the state of the build)
  • Push their changes to SVN, if build is green
  • Not push their changes to SVN, if build is red
If they get a chance to push to SVN, they are pretty confident that their changes are in a good state and build will go green with their changes.
  
If they do not get a chance to commit to SVN, they can continue development of Feature - 2 on a different local branch!

Eventually, when the build goes green after n hours (where n > 1), they can quickly switch to the Feature - 1 branch and commit only the Feature - 1 changes.  In this situation they are confident that only the Feature - 1 changes will be committed!  That is exactly what they wanted!

After pushing the Feature - 1 changes to SVN they can switch back to the Feature - 2 local branch and continue development from where they left off!  Awesome, isn't it!

Working with multiple streams of work is pretty natural with GIT.

Over and above everything else, they could commit at every logical point.  This is really important!  

The advantage of doing this is, one could go back in time and have a look at how the file A.java looked at logical point - 2.  

When they were using only SVN, this was never possible.  They could only see the entire Feature - 1 change set as one big fat commit.  They could never go back in time and see how the file A.java had evolved while Feature - 1 was being developed.

We have seen only two main advantages of using GIT with SVN so far.  This is just the tip of the iceberg.  GIT is pretty awesome!

But, the flip side is you need to be patient till you unlearn SVN and feel the power of GIT.

Give GIT a fair chance and you will do things you thought were impossible thus far!  Do not wait any more, go GIT it!

Monday, November 15, 2010

GIT - Version control done the right way - Part - 1

A lot of people have asked me why the name of my blog is www.gitshah.com.  The fact is, I am a big advocate of GIT.

I am not going to bore you with what GIT can do and what it can't (well, I am yet to find out what it can't do).  In this series of two posts, I am going to share with you guys a real life incident that changed the lives of a few developers and changed their perspective on version control forever.

I was working in a techie company, in a team of around 120 smart developers (yes all of them working on one project.  And no, we were not building the next generation rocket that will travel faster than the speed of light).

The version control system we were using was Subversion (SVN).  We had a continuous integration server set up, which means that as soon as someone checks in any code, a build is triggered and all test cases are run.  If the build breaks, people know something is wrong and someone needs to fix it.

Looks like a standard project?  Well, you will see.  Continuous integration was the most important aspect of this project.  Imagine 120 smart developers churning out code every minute for at least 7 hours a day (man, that is a lot of code!  I agree).  In this situation one has to be very careful about what goes into the central SVN repository.  This is important because we need the build to remain green as much as possible.

To do that we had a simple rule

NEVER COMMIT ON A BROKEN BUILD, UNLESS YOUR COMMIT WILL FIX THE BUILD.  

Well, it sounds fair: no commits on a broken (or red) build.  If you are fixing the build, then of course you are allowed to commit.  Otherwise the build would remain broken for eternity!

All this is nice and rosy, but there is one problem.  Because the project was so big, a full build took around 1 hour (sometimes even more than an hour).  Wow!  One hour (what were we building?  I think Linux kernel builds must be faster than this) before someone can find out whether their changes are good or not.

What's wrong with that?

Well consider this scenario.  Deep and Jamie have been working on developing the Feature 1.  When working with SVN they are always in a dilemma whether to commit or not to commit.

Why?

Because when you check new code in, everybody else gets it.  As soon as someone commits to SVN, it's made public.

Hence, they have two choices:
  • Check in half-baked, buggy code and drive everyone else crazy
  • Avoid checking it in until the feature is completely developed
Deep and Jamie prefer to take path - 2.  They decide not to commit any code till Feature - 1 is developed and the code is in a stable state.  Typically, feature development may take 2-4 days.  No commits for 2-4 days!  No version history for 2-4 days!

From the start of feature development till the end, developers cross many milestones or logical points.  At these points, code is relatively stable and does a specific task well.  All tests are passing, but the feature is not yet complete.  All those logical points are really crucial.  But with SVN these logical points are lost in time.

Since they cannot commit to SVN, till the feature is working, they have no choice but to continue coding.  After 3 days they are done with the development.  Now, they have a big change set (code changes worth 3 days) that they want to commit to SVN.  There are two possibilities now:
  • They can commit to SVN, because the build is green (not broken)
  • They cannot commit to SVN since the build is red (broken).
If they can commit at this point then, everything is well and good. 

But, if they cannot commit at this point then, again they have two choices
  • They can wait for the build to go green (and in the mean time can play some game on XBOX or have some noodles)
  • Start development of Feature - 2.  Commit the change set of Feature - 1 when the build goes green.
Deep and Jamie cannot wait for the build to go green.  The build takes an hour to go green.  They can't lose that many billable hours doing nothing.  You see, they are responsible and committed developers.  They move on and start the development of Feature - 2.

Let's say that after another hour the build goes green.

Mission - commit Feature - 1 begins! 

Remember, they were in a good state an hour ago.  All tests were passing and Feature - 1 was complete.  But since then, they have done more coding (remember, they had started development of Feature - 2).  They want to commit the changes of Feature - 1 but not those of Feature - 2.

What do they do, what do they do?  Time is running out.  They must commit now.  They can't wait any longer to commit Feature - 1.

They decide to selectively check in the Feature - 1 files.  They take a call on whether to check in a specific file or not based on their programmer instinct.  They hope and pray that the files they have checked in form a logical change set for Feature - 1.  The build is triggered.  While their changes are being built, Asif and Sandeep check in their changes.  Remember, the build takes an hour, so everyone is back to churning out more code.

Deep and Jamie's prayers were not answered!  The build breaks.

The build is broken!  The fun starts now!  Count the number of hours before the build is green again.

Since Asif and Sandeep had already checked in (they had checked in when the build was green and was still building some changes), one more hour is lost.  Naturally, since the build was broken before their changes were integrated, this build is on a death march too.  The build has been broken for an hour now.

Other developers have already started cursing Deep and Jamie for breaking the build.  But wait, Deep and Jamie are thinking that their changes can never break the build, their code is flawless!  They had tested Feature - 1 so many times.  Finally they realize that the files they had checked in did not form the complete change set required for Feature - 1 (remember, they had selectively checked in the files).  They forgot to check in a file which was actually required.  Damn!

It has already been an hour since they last checked in, and they have done some more coding.  To fix the build they have to go back to the original check-in state, check in the required file and hope that the build goes green.  It all looks very complex to me!

This is a risky situation: the developer is not sure about which files to check in.

They could, of course, have taken a patch of Feature - 1 before starting development of Feature - 2, reverted all changes before committing, applied the Feature - 1 patch and committed to SVN.  Anyone who has done this even once would agree that it never works smoothly.

The fact is, working on multiple streams of work is not natural in SVN.

After their check-in, the build needs to run again.  It runs for another hour, after which it goes green.  Out of an 8 hour working day the build was broken for 2 hours!

Extrapolate this situation to 120 developers.  As the number of developers increases, the problem gets bigger and bigger.  Everyone wants to check in whatever code they have written.  When the build goes green, maybe they are not in a state to commit.  They might have crossed the logical point.  End result: no matter how strong your instincts are, you can never be sure that your check-in has all the changes you intended to commit.

Why does this happen? 

With so many developers and such long build cycles, there is a narrow window to check in.  There were times when we could not check in for one week!  It's an unbelievable situation, madness actually: a week's work that is not checked in, a week's code without any versioning.  Since we cannot commit to SVN without making our changes public, it's impossible to maintain the history of a file at every logical point.

The lack of offline or local commits is the single most important problem with SVN.

What are the alternatives?

Some teams started branching out from trunk.  The plan was to do the feature development in this new SVN branch and then merge it with trunk at the end of feature development.  They solved one problem: they could commit as often as they wanted without impacting other developers outside their team.  But this gave birth to a bigger problem, the problem of merging.  They still had to merge their code (lying in the SVN branch) with the trunk.

Other teams were still working out of trunk.  Hence, their SVN branch was getting outdated every minute.  At the end of a week, when they tried to merge the code into trunk, they experienced what I call hell on earth.

When two SVN branches diverge and we try to merge them, Subversion tries to figure out what has changed and fails.  End result, SVN shows a lot of merge conflicts.  These are not really conflicts but places where SVN failed to figure out what had changed.

I have seen people doing the merges for two straight days till 0200 hrs, after which they think, they are in a good state to check-in, only to realize that they have broken the build.

Well, you might find it unreal and exaggerated, but this has happened and it's a sad situation to be in.

Enough SVN bitching.  In the next post I will show how GIT helped us get around all these problems.
Have some Fun!