Deep Shah's Blog: January 2011

Friday, January 28, 2011

How to rewrite history with GIT - Part - 2

This is the second of two posts showing how easily we can rewrite history using GIT. In the previous post we saw

How to change the message of one of the previous commits
Removing a commit
Merging or squashing two or more commits.

In this post we are going to look at

How to re-order the commits
Editing the previous commit to include/remove files.

Let's not waste any more time. Let's look at how it is done!

How do they do it?

As in the previous post, let's start by making some mistakes: a few erroneous local commits.

The current situation is that I have synced up with my remote or the public repository and there are no local commits.

git log shows that I have pushed all my commits to SVN. In my example I am using GIT over SVN, but the rewriting history feature can be used with a pure GIT setup as well.

Let's say I had to write a script to insert dummy records in the user table.

Committing the new file

After committing this file, I remembered that the script to create the USER table itself is not yet checked in. Let's create the create user table script and check it in.

Committing the create user script.

Here's a problem: we have the insert user script committed before the create user table script.

The table should exist before we can insert data into it, right? We will have to reorder the two commits. But before we fix this issue, let's make some more mistakes.

Let's say we want to create the role table.

Committing the file.

Note that I have not assigned any primary key to the role table. This again is a problem: every table should have a primary key. Let's do one final commit and insert a ROLE_ADMIN in the role table.

Committing the file.

I realize that we have made enough mistakes; it's time to correct them.

Looking at what we have so far.

We have made 4 local commits which have not been pushed to the remote repository. Before we push them, we would like to make the following changes

The insert user script commit (fded92599c04f70c17a40695d1dfd2c54fa5efe2) should come after the create user table script commit (7f4288ea82a69001f3872630ab05672a54730558).
There is no primary key assigned to the role table in the create role table commit (d1f9d2df72ff3470d971a6d7f00b9d35caf59ee6).

To fix the second point, we could make another commit to alter the role table and add a primary key. But let's edit the commit and make the "id" column the primary key in the create table statement itself. This way the create role script will be complete and bug free.

Let's use the command git rebase -i remotes/trunk (which we learned in the previous post) to fix our mistakes.

Please note that we have not yet pushed those commits to the remote repository. History should be rewritten only in this case. If we have already pushed our changes to the remote repository, this feature should be avoided, because others may already have based their work on those commits!

The fixup:

This command tells git to do an interactive rebase. The remotes/trunk argument tells git to rebase onto the commit that is currently the HEAD of the remote repository, i.e. the commit that was last pushed. In this example it's the commit 88e33ddc9c7b139bb89cbd206e838fe453978e2f - SOMEPROJECT-1| User is able to register on the site - Deep. With this explanation it's obvious that we could also rewrite the above command as git rebase -i 88e33ddc9c7b139bb89cbd206e838fe453978e2f.
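To see why a ref name and a commit id are interchangeable here, the following is a minimal sketch in a throwaway repository. The branch name trunk-standin is an assumption that plays the role of remotes/trunk; it is not part of the original setup.

```shell
# Scratch repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "register" > register.txt
git add register.txt
git commit -q -m "SOMEPROJECT-1| User is able to register on the site - Deep"

# Hypothetical stand-in for remotes/trunk: a name for the last pushed commit.
git branch trunk-standin

ref_hash=$(git rev-parse trunk-standin)
head_hash=$(git rev-parse HEAD)
echo "trunk-standin resolves to $ref_hash"

# The name and the id identify the same commit, so these two commands
# would start the same interactive rebase:
#   git rebase -i trunk-standin
#   git rebase -i "$ref_hash"
```

git rev-parse simply prints the commit id a name points to, which is a quick way to check what a rebase target really is before rewriting anything.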

This command will open up your favorite text editor. The data in the text editor should look something like this

We have seen this message in the previous post. Short recap of the options we have:

p, pick --> Use the commit as is. We used this in the previous post.
r, reword --> Reword the commit message. We used this in the previous post.
e, edit --> Edit the commit. We want to edit the create role table script to add the primary key, hence we will edit the commit SOMEPROJECT-2| The create role table script to create the role table - Deep (d1f9d2d).
s, squash --> It practically means to squash! It merges the commit into the previous commit. We saw that in the previous post.
f, fixup --> Just like squash, but discards the current commit's log message.
It also tells us that if we remove any commit line, that commit will be removed.

To reorder the commits we simply need to change the order of the lines in this text.
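Reordering by swapping lines can be tried safely in a scratch repository. This is only a sketch under a few assumptions: the tag base stands in for remotes/trunk, the file names and commit messages are made up, and GIT_SEQUENCE_EDITOR is used to supply the edited list from a script instead of typing it in the editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
git tag base                          # stands in for remotes/trunk

# Two commits in the wrong order: the insert comes before the create.
echo "INSERT INTO users VALUES (1, 'batman');" > insertUser.sql
git add insertUser.sql && git commit -q -m "insert user"
echo "CREATE TABLE users (id INT, name VARCHAR(50));" > createUserTable.sql
git add createUserTable.sql && git commit -q -m "create user table"

insert=$(git rev-parse --short HEAD~1)
create=$(git rev-parse --short HEAD)

# The list we would otherwise edit by hand: the same two picks, swapped.
todo=$(mktemp)
printf 'pick %s create user table\npick %s insert user\n' "$create" "$insert" > "$todo"

# cp overwrites the list git offers for editing with our prepared one.
GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i base

# Newest first: "insert user" now sits on top of "create user table".
git log --format=%s base..HEAD
```

The swap works without conflicts here because the two commits touch different files; commits that change the same lines may stop the rebase and ask for a manual resolution.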

Summarizing what we want to do:

Edit the commit d1f9d2d SOMEPROJECT-2| The create role table script to create the role table - Deep
Reorder the commits fded925 SOMEPROJECT-1| The create user script to create the batman user - Deep and 7f4288e SOMEPROJECT-1| The create user table script to create the user table - Deep
Pick the commit 8745c7b SOMEPROJECT-2| The insert roles script to insert the role values - Deep

Let's edit the commit lines so that they look like this

Note that we have changed the order of the create user table and insert user script commits, and we are editing the create role script commit.

Save and quit the text editor. After this, GIT will stop at the create role table commit and show some output like this

It clearly tells us that we can now amend the commit using the git commit --amend command, and when we are satisfied we can run git rebase --continue. Let's see what git log shows us

It shows that

Currently the topmost commit, i.e. the HEAD commit, is SOMEPROJECT-2| The create role table script to create the role table - Deep.
The commit SOMEPROJECT-2| The insert roles script to insert the role values - Deep is missing.

Where did the SOMEPROJECT-2| The insert roles script to insert the role values - Deep commit go?

Relax, do not worry! We are in the middle of a rebase, and we told GIT that we want to edit the SOMEPROJECT-2| The create role table script to create the role table - Deep commit. We wanted to add the primary key to the role table. Hence, while doing the interactive rebase, git has stopped at the desired commit and given us a chance to edit/amend it.

We can make the required changes for adding the primary key to the role table and then amend the commit. After this, if we continue our rebase, we will get back the SOMEPROJECT-2| The insert roles script to insert the role values - Deep commit! How awesome is that!

Let's edit the createRoleTable.sql file

Let's amend the commit.

While amending the commit, git will open up your favorite text editor and let you edit the commit message. We do not want to edit the message, so let's just save and quit the text editor.

If you do a git show now, you will see that createRoleTable.sql has been successfully updated in the commit SOMEPROJECT-2| The create role table script to create the role table - Deep

Let's continue the rebase and get back the last commit.

This command finishes the interactive rebase and we have achieved the desired result. Doing a git log now will show that we have successfully reordered the commits as well.
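The stop, amend and continue cycle described above can be replayed in a scratch repository. This is a sketch, not the exact session from this post: the tag base stands in for remotes/trunk, the SQL and the commit messages are invented, and the edited list is supplied through GIT_SEQUENCE_EDITOR rather than typed in an editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
git tag base                          # stands in for remotes/trunk

echo "CREATE TABLE role (id INT, name VARCHAR(50));" > createRoleTable.sql
git add createRoleTable.sql && git commit -q -m "create role table"
echo "INSERT INTO role VALUES (1, 'ROLE_ADMIN');" > insertRoles.sql
git add insertRoles.sql && git commit -q -m "insert roles"

role=$(git rev-parse --short HEAD~1)
roles=$(git rev-parse --short HEAD)

# Mark the first commit with "edit" so the rebase pauses right after it.
todo=$(mktemp)
printf 'edit %s create role table\npick %s insert roles\n' "$role" "$roles" > "$todo"
GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i base

# Paused: only "create role table" is in the log, "insert roles" is pending.
git log --format=%s base..HEAD

# Add the primary key and fold the change into the paused commit.
echo "CREATE TABLE role (id INT PRIMARY KEY, name VARCHAR(50));" > createRoleTable.sql
git add createRoleTable.sql
git commit -q --amend --no-edit       # --no-edit keeps the existing message

# Replay the remaining commit; GIT_EDITOR=true guards against any editor prompt.
GIT_EDITOR=true git rebase --continue
git log --format=%s base..HEAD
```

Using --no-edit is the scripted equivalent of saving and quitting the editor without touching the message.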

Rewriting history is one of the best features offered by GIT! The elegance with which git achieves this is amazing!

Still not using GIT? Do not waste any time. Go GIT it!

Wednesday, January 19, 2011

How to rewrite history with GIT - Part - 1

One of the truly awesome features of GIT is rewriting history. This is one of the best and most useful features that GIT has to offer. Other distributed version control systems (read Mercurial (HG)) do not offer this feature out of the box (additional extensions are needed to achieve this in Mercurial). Over and above that, with Mercurial the process of rewriting history is pretty painful. But when it comes to GIT, the support for rewriting history is built right into the core. The elegance with which GIT performs this task is amazing!

What does rewriting history mean?

Rewriting history could mean many things.

Changing the message on one of the previous commits
Removing the commit completely
Merging two or more commits in one commit.
Re-ordering the commits
Editing a previous commit to include/remove files

All of these are extremely useful features. When used correctly, they could save you from a lot of trouble!

Why should I use it?

For people who are used to central version control systems (like SVN, CVS etc.), rewriting history is unheard of. Initially it might feel that you do not need this feature. What is the use of it? Take my word on this, it's a really, really helpful feature. It gives you the power to correct your mistakes even after you have committed! Isn't that awesome!

Like it already?

Show me how to do it. How do they do it?

In this post we will look at how to do the first three points. In the next post I will cover the remaining two points.

Let's start. The current situation of my GIT repository is that I have synced up with my remote or the public repository and there are no local commits.

git log shows that I have pushed all my commits to SVN. In my example I am using GIT over SVN, but the rewriting history feature can be used with a pure GIT setup as well.

OK, let's start by making some local commits.

I hate to remember those ugly admin passwords. Let's add a password.properties file that holds the administrator password.

Let's commit this file

Never do this! Never ever commit an administrator password to your repository. This is a mistake (yes, I realize that!); we will correct it later.

Let's go on for now and make more local commits. Let's say there is a long pending bug we wanted to fix. Let's fix it!

Let's commit our bug fix.

Again, note that commit messages like "Fixing bug" are hardly helpful. Please, please, please always write an informative commit message. This not only helps others, it will help you as well. An informative commit message gives an idea of why the commit was made and what to expect in it. It's just common sense!

We have made another mistake (by not providing an informative commit message), but we will fix it later.

Moving on, making more changes.

Committing the refactored file

Writing a test case to prove that the performance of the app has actually improved.

Committing

Let's stop at this point and have a look at what we have done so far.

Basically we have made 4 local commits which have not been pushed to the remote repository. We suddenly realize that we have made a few blunders!

Committing the administrator password to the repository is definitely not a good idea --> We need to get rid of this commit
Fixing Bug --> What is this commit for? We need to reword the commit message
Refactoring --> The refactored file and its test case should be one single commit. We should not have separate commits for them. We need to merge these two commits.

GIT gives us a second chance! It's not too late yet. We can fix our mistakes.

Please note that we have not yet pushed those commits to the remote repository. History should be rewritten only in this case. If we have already pushed our changes to the remote repository, this feature should be avoided, because others may already have based their work on those commits!

The fixup:

There are multiple ways of doing this, but I will show the easiest. To rewrite history use the following command: git rebase -i remotes/trunk

This command tells git to do an interactive rebase. The "remotes/trunk" argument tells git to rebase onto the commit that is currently the HEAD of the remote repository, i.e. the commit that was last pushed. In this example it's the commit "88e33ddc9c7b139bb89cbd206e838fe453978e2f" (SOMEPROJECT-1| User is able to register on the site - Deep). With this explanation it's obvious that we could also rewrite the above command as git rebase -i 88e33ddc9c7b139bb89cbd206e838fe453978e2f.

This command will open up your favorite text editor. The data in the text editor should look something like this.

It shows the list of commits that will be rebased. It shows a pretty informative comment below that. It tells us to use:

p, pick --> To pick the commit or use the commit as is.
r, reword --> To reword the commit message
e, edit --> To edit the commit. We will look at this in the next post
s, squash --> It practically means to squash! It merges the commit into the previous commit
f, fixup --> Just like squash, but discards the current commit's log message.
It also tells us that if we remove any commit line, that commit will be removed.
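The difference between squash and fixup is easiest to see in a scratch repository. The following sketch folds a follow-up commit into its predecessor with fixup; the tag base, the file name and the commit messages are all invented for the demo, and the list is supplied through GIT_SEQUENCE_EDITOR instead of an editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
git tag base                          # stands in for remotes/trunk

echo "line one" > notes.txt
git add notes.txt && git commit -q -m "add notes"
echo "line two" >> notes.txt
git add notes.txt && git commit -q -m "oops, forgot a line"

first=$(git rev-parse --short HEAD~1)
second=$(git rev-parse --short HEAD)

# fixup keeps the changes of the second commit but throws its message away.
todo=$(mktemp)
printf 'pick %s add notes\nfixup %s oops, forgot a line\n' "$first" "$second" > "$todo"
GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i base

# One commit remains, with the first message and both lines of content.
git log --format=%s base..HEAD
cat notes.txt
```

With squash in place of fixup, git would instead open the editor with both messages so they can be combined by hand.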

Pretty neat! We want to do the following

Remove the commit be6628d SOMEPROJECT-2| Adding the administrator password to the repository - Deep
Reword the commit b63e0c3 Fixing bug
Merge the commits 62c2f81 SOMEPROJECT-3| Refactoring the file to improve performance - Deep and 95a929a SOMEPROJECT-3| Test case to prove that performance has actually improved - Deep

Let's edit the commit lines so that they look like this

Note that we have removed the commit line for "be6628d SOMEPROJECT-2| Adding the administrator password to the repository - Deep". We have informed GIT that we want to reword a commit and squash two commits.

Save and quit from the text editor

After that, GIT will open up your favorite text editor. This time it should show something like this

Let's edit the commit message like this

Save and quit the editor. GIT will open up the text editor again. This time it will look like this

Look at the beautiful message GIT has given us. It tells us that this is a combination of 2 commits and gives us the two messages. If we want, we can edit the messages. But for the sake of this example let's keep the messages as is. Save and quit the editor. It prints some messages like this

Now, let's take a look at what our local commits look like after the rebase.

From the log it's evident that we have successfully

Removed the commit be6628d SOMEPROJECT-2| Adding the administrator password to the repository - Deep
Reworded the commit b63e0c3 Fixing bug to SOMEPROJECT-2| Fixing the programing error because of which a deadlocak situation could arrise - Deep
Merged the commits 62c2f81 SOMEPROJECT-3| Refactoring the file to improve performance - Deep and 95a929a SOMEPROJECT-3| Test case to prove that performance has actually improved - Deep. The new merged commit id is 555cb8f
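All three fixes (remove, reword, squash) can be rehearsed in one pass on a scratch repository. This is a sketch under assumptions: the tag base stands in for remotes/trunk, the file names and messages are invented, the edited list is supplied through GIT_SEQUENCE_EDITOR, and a sed command set as GIT_EDITOR plays the part of typing the new message by hand.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
git tag base                          # stands in for remotes/trunk

echo "admin.password=secret" > password.properties
git add password.properties && git commit -q -m "add admin password"
echo "fix" > Bug.txt
git add Bug.txt && git commit -q -m "Fixing bug"
echo "refactored" > Service.txt
git add Service.txt && git commit -q -m "refactor for performance"
echo "test" > ServiceTest.txt
git add ServiceTest.txt && git commit -q -m "performance test case"

bug=$(git rev-parse --short HEAD~2)
refactor=$(git rev-parse --short HEAD~1)
testcase=$(git rev-parse --short HEAD)

# The password commit is left out of the list, which removes it.
todo=$(mktemp)
printf 'reword %s Fixing bug\npick %s refactor\nsquash %s test case\n' \
  "$bug" "$refactor" "$testcase" > "$todo"

# sed rewrites the "Fixing bug" message and leaves the squash message untouched.
GIT_SEQUENCE_EDITOR="cp '$todo'" \
GIT_EDITOR="sed -i.bak 's/^Fixing bug\$/Fix deadlock in request handling/'" \
git rebase -i base

git log --format=%s base..HEAD
```

After the rebase only two commits sit on top of base, and the squashed commit carries both original messages in its body.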

Needless to say, GIT has awesome support for rewriting history!

Still not using GIT? Do not waste any time. Go GIT it!

Monday, January 10, 2011

The Ivory Tower Architect

In one of my recent projects, the client was afraid of making some architectural changes. The proposed changes were necessary and were for the betterment of the application. We were proposing the use of some industry standard libraries, frameworks and tools, which would make the code more maintainable.

In spite of the benefits of the proposed architectural changes, the client was not really keen on making them.

Why?

This question kept bothering me for some time. Digging a little deeper into the past, I found that the client had had a bad experience with one of their architects. They had hired an architect and worked with him for a year, but nothing fruitful came out of the engagement. Sounds familiar?

Time and again I have faced this sort of situation. One of my previous posts was along similar lines: Be a Developer not an architect!

Googling a little, I found a commonly used term for such architects: the Ivory Tower Architect.

What are the symptoms of Ivory Tower Architects:

Notice the usage of the word "symptoms".

According to Wiktionary, the meaning of "symptom" is

"Anything that indicates, or is characteristic of, the presence of something else, especially of something undesirable."

Based on the above definition, you must have understood that being an Ivory Tower Architect is almost like a disease. If it gets hold of someone, it does not leave them very easily.

Back to the original question. The symptoms of Ivory Tower Architects:

Loves attending meetings all the time. He thinks his job is to attend meetings. If there is nothing to discuss, he will invent new reasons to organize a meeting. Usually these meetings are very long; in fact he believes in "the longer the better".
Loves talking down to the developers. He thinks developers are some inferior species.
Loves to show off his mastery of software design. When I say "the mastery of software design", I mean mastery in drawing pretty pictures. Whether the design actually works in practice is not relevant to him (usually, the design won't work).
Loves to churn out documents. This person is like a printing press, continuously creating and updating documents. He thinks his productivity is measured by the number of mails, xls and doc files created or updated in a day. Not to mention the number of Minutes of Meetings he (or someone on his behalf) publishes every day.
Forces the development team to implement the architecture he has designed.
Pretends that he is listening to your suggestions, but actually he is thinking about ways to reject them. No matter how relevant your suggestion is, he will never accept it. To do so would undermine his very existence.
He will never ever work directly on the code himself. He thinks that coding should only be done by mortal developers!
Does not believe in the KISS (Keep It Simple, Stupid!) principle.

If you know someone who has these symptoms, please do not forget to send them a get well soon card (this is the best you can do)! In case you can gather enough courage, tell them to stop goofing around and start doing real work!

Monday, January 3, 2011

How to find the real culprit using GIT bisect

Recently, I had a situation in one of my projects: a certain feature which was working some time back no longer worked on the HEAD revision. I was pretty sure that it had been working at a certain point in the past.

I tried for an hour to find what had gone wrong and why it was no longer working. But my efforts were in vain.

What I knew for sure was that at revision HEAD - 20 the feature was working. So many files must have changed between revision HEAD - 20 and HEAD. How to find the real culprit was the question.

To find where the real problem was, or when the problem was first introduced, I decided to use a cool feature of GIT called bisect.

What does bisect do? Well, it bisects! That's what it does, and you know what, it does this pretty well!

Bisect uses a form of binary search to find the real culprit, i.e. the first commit at which a given feature was not working. Essentially this is the commit that introduced the bug!

Bisect works on a simple principle, all you have to do is

Identify the good commit (i.e. the commit at which the feature was working)
Identify the bad commit (i.e. the commit at which the feature was not working)
Keep bisecting them till you find the real culprit.

For example, let's say that on revision HEAD the feature is not working and on revision HEAD - 20 the feature is working. That means the culprit lies between these two revisions.

In this example the good commit is revision HEAD - 20, and the bad commit is revision HEAD. We start off by bisecting these two commits: GIT checks out revision HEAD - 10, and we quickly test whether the feature is working in this commit or not. If the feature is working in this commit, the culprit lies between the HEAD - 10 and HEAD revisions. If the feature is not working in this revision either, the culprit lies between the HEAD - 20 and HEAD - 10 revisions.

With this new information, our good and bad commits have changed to either revisions HEAD - 10 and HEAD or HEAD - 20 and HEAD - 10, depending on where the culprit lies. We now start bisecting the new [good, bad] commit set.

This process continues till we are left with one commit, i.e. the real culprit!
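The halving described above is plain binary search, and it can be sketched without git at all. In this toy version revisions are numbered 0 (the known good one) to 20 (the known bad one), and is_bad is a hypothetical stand-in for running your test on a checked-out revision; the position of the bug (13) is made up.

```shell
# Hypothetical test: in this toy history, revision 13 and everything after it is broken.
first_broken=13
is_bad() { [ "$1" -ge "$first_broken" ]; }

good=0     # index of the known good revision (HEAD - 20 in the example)
bad=20     # index of the known bad revision (HEAD)
steps=0

# Keep halving until good and bad are neighbours; bad is then the culprit.
while [ $((bad - good)) -gt 1 ]; do
  mid=$(( (good + bad) / 2 ))
  steps=$((steps + 1))
  if is_bad "$mid"; then
    bad=$mid      # culprit is at mid or earlier
  else
    good=$mid     # culprit is after mid
  fi
done

echo "first bad revision: $bad (found after $steps tests)"
```

Each test discards roughly half of the remaining revisions, which is why the number of tests grows so slowly as the range gets bigger.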

Does it sound like a lot of work? With GIT you do not need to worry, relax and take a deep breath! GIT makes it really easy! Let's look at how.

How do they do it:

Let's say that the revision HEAD, i.e. the BAD revision, is 78158f935cc5aa949a3e60068297befe43b26354, and the revision at which the feature was working for sure, i.e. the GOOD revision, is 8446e9349a8217c7428b6386fa78b1f88565684d. There are 10 other commits in between these commits.

Let's start bisecting the commits to find the real culprit. This is done using the following command: git bisect start

Now GIT knows that you want to start bisecting the commits to find the guilty commit.

Next, you need to tell GIT about the GOOD and the BAD commits. This is done using the following command: git bisect good 8446e9349a8217c7428b6386fa78b1f88565684d

This command tells GIT the commit id of the GOOD commit. Likewise, GIT needs to know the commit id of the BAD commit: git bisect bad 78158f935cc5aa949a3e60068297befe43b26354

At this point GIT knows the good and the bad commits and it shows output like

It's basically informing us that 4 revisions are left to test after this one and that we are merely about 2 steps away from finding the real culprit!

GIT has checked out the commit 762eec7656a4bb742cb889b686645897bdd984af, which is roughly somewhere in the middle of good and bad commits.
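The opening moves of a bisect session can be tried on a scratch repository. This is a sketch with invented history: twelve commits, with a hypothetical bug that first appears in commit 8, so that (as in this post) there are 10 commits between the good and the bad one.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Twelve commits; the made-up bug is present from commit 8 onwards.
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  if [ "$i" -ge 8 ]; then echo "BUG $i" > feature.txt; else echo "ok $i" > feature.txt; fi
  git add feature.txt && git commit -q -m "commit $i"
done

good=$(git rev-parse HEAD~11)   # commit 1, known to work
bad=$(git rev-parse HEAD)       # commit 12, known to be broken

git bisect start
git bisect good "$good"
git bisect bad "$bad"

# Git has now checked out a commit somewhere in the middle of the range.
git log -1 --format=%s
```

From here on, each git bisect good or git bisect bad answer makes git check out the next commit to test.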

To visualize what it means, type the following command: git bisect visualize

This command shows you exactly what has happened. It shows you the GOOD and the BAD commits marked in gray. It also shows you which commit has been checked out, with a tiny yellow circle.

At this point you can quickly run your test to find out whether the checked out version is good or bad.

If the feature is working at this commit, then it's a good commit.
If the feature is not working at this commit, then it's a bad commit.

Let's say in this case the feature is not working in this commit. Hence it's a bad commit.

We need to inform GIT that this commit is a bad commit. This effectively means that the culprit commit lies between the GOOD commit, i.e. 8446e9349a8217c7428b6386fa78b1f88565684d, and the BAD commit, i.e. 762eec7656a4bb742cb889b686645897bdd984af. We have eliminated 5 commits in 1 iteration.

To inform GIT that this is a bad commit, run the following command: git bisect bad

Running visualize will clearly show you that the range which contains the guilty commit has narrowed down to just 5 commits

Again, we repeat the same process of finding whether the feature is working or not in this particular commit. For this case, let's say that the feature is working, i.e. this is a GOOD commit. Let's inform GIT that this is a GOOD commit.

Let's read the output a little carefully this time!

It says that there are 0 revisions left to test after this one. What? What does that mean? Let's visualize the current situation.

Hmm, now did you figure out what the message means?

Well, we are at an interesting stage now. GIT knows which is the GOOD commit and which is the BAD commit, but the important thing to notice here is that there is just one commit between the GOOD and the BAD commits. This means that as soon as we inform GIT whether the feature is working in this commit or not, it will point us to the guilty commit. That's why GIT tells us "0 revisions left to test after this..."

Let's say in this case the currently checked out commit is a GOOD commit. Let's inform GIT and see what happens.

Voilà! We have it! GIT shows us the "first bad commit", or the culprit commit, since when the feature has been broken.
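The whole hunt can also be replayed end to end on a scratch repository. This sketch uses an invented history of twelve commits with a made-up bug that first appears in commit 8. Instead of answering good or bad by hand at every step, it hands the check to git bisect run, an automation this post does not otherwise use: the command's exit status 0 counts as good and 1 as bad.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Twelve commits; the made-up bug is present from commit 8 onwards.
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  if [ "$i" -ge 8 ]; then echo "BUG $i" > feature.txt; else echo "ok $i" > feature.txt; fi
  git add feature.txt && git commit -q -m "commit $i"
done

start_branch=$(git symbolic-ref --short HEAD)
good=$(git rev-parse HEAD~11)   # commit 1, known to work
bad=$(git rev-parse HEAD)       # commit 12, known to be broken

git bisect start
git bisect good "$good"
git bisect bad "$bad"

# The check exits 0 (good) when feature.txt has no BUG marker, 1 (bad) otherwise.
git bisect run sh -c '! grep -q BUG feature.txt'

# While the session is open, refs/bisect/bad points at the first bad commit.
culprit=$(git rev-parse refs/bisect/bad)
git show --stat --format=%s "$culprit"

# End the session and return to the branch we started from.
git bisect reset
```

Marking commits by hand, as in this post, follows exactly the same path; the run form simply repeats the test and the answer until git can name the first bad commit.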

To view the content of the commit, we can use git show followed by the commit id.

I wasted one full hour trying to figure out what had gone wrong and why the hell the feature was not working! And it took GIT merely 3, maybe 5, minutes to pinpoint the guilty commit!

Imagine a much bigger GOOD to BAD commit range. Let's say there are 100 or more commits between the GOOD and BAD commits. Since each test halves the range, about 7 tests are enough for 100 commits, so GIT can save you some serious time!

Please note that at this point, if you check your current branch (for example with git branch),

it will show you that you are on a temporary branch. To go back to the branch where you started from and to inform GIT that you have finished bisecting, use the following command: git bisect reset

That's all folks! Take my advice, Go GIT it!