Tuesday, December 28, 2010

The thrill of being a developer - Fixing the problem of weird URLs

There are days when, as a developer, I feel bored, bored to death, angry, happy, sad, and many other things.  But there are a few days in the life of a developer when you feel truly thrilled!

Recently, one such thrilling day occurred in my life.

In one of our projects, we were facing a weird and interesting issue in production.  The application server would get weird requests from the browser.  The beginning of each requested URL would be valid, but the end would contain stray HTML tags!  Since no such resource existed on the server, the app server would log the request and return a 404 error page.

For example, one such log message was:

The request that the browser sent was:

The above URL is meant to fetch the jquery.maskedinput.min.js file.

Generally, when an issue is reported by QA or the client, I already have some idea of what could have caused it.  But this issue was special.  When I first saw these errors, I was completely blank.  I had no idea whatsoever why something like this would ever happen!

We were using a pretty common Java stack: Spring MVC and Hibernate, with the app deployed on a WebLogic server.  I had used these frameworks in many other projects, but had never faced an issue like this.

The most frightening part was that I had no idea what the end user impact was.  Was the user logged off?  Was he shown an error?  Did he see a garbled page?  No idea at all!

Coming back to the issue.

Yes, believe it or not, the above request is to fetch the jQuery masked input JavaScript file! Notice that the beginning of the URL is valid, but what gets added at the end is really not expected.

From where did

text get appended to the URL?

Digging a little deeper, I found something totally bizarre.  Turns out that the text

is present in the same page after some 4096 bytes!

How is it possible that something that comes a few thousand bytes later gets appended to the request URL of jquery.maskedinput.min.js?

After two and a half days of hardcore googling, trying hundreds of different things with the script tag and the textarea tag in different permutations and combinations, and countless hours of analyzing the server logs over and over again, I finally found a credible explanation for the weird URLs issue we were facing.

It happens because of a bug in IE8’s Lookahead Downloader.  The problem has nothing to do with the application!

What are you talking about?  Explain it to me in detail:

The following is an extract from http://blogs.msdn.com/b/ieinternals/archive/2009/07/27/bugs-in-the-ie8-lookahead-downloader.aspx, which describes the bug in full detail.

Lookahead Downloader is used to quickly scan the page as it comes in, looking for the URLs of resources which will be needed later in the rendering of the page (specifically, JavaScript files). The lookahead downloader runs ahead of the main parser and is much simpler-- its sole job is to hunt for those resource urls and get requests into the network request queue as quickly as possible.

The problem here is that there are a number of tags which will cause the parser and lookahead downloader to restart scanning of the page from the beginning. One such tag is the META HTTP-EQUIV Content-Type tag which contains a CHARSET directive. Since the CHARSET specified in this tag defines what encoding is used for the page, the parser must restart to ensure that it is parsing the bytes of the page in the encoding intended by the author. Unfortunately, IE8 has a bug where the restart of the parser may cause incorrect behaviour in the Lookahead downloader, depending on certain timing and network conditions.

The incorrect behavior occurs if your page contains a JavaScript URL which spans exactly the 4096th byte of the HTTP response. If such a URL is present, under certain timing conditions the lookahead downloader will attempt to download a malformed URL consisting of the part of the URL preceding the 4096th byte combined with whatever text follows the 8192nd byte, up to the next quotation mark.  Web developers encountering this problem will find that their logs contain requests for bogus URLs with long strings of URLEncoded HTML at the end.
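To check whether a given page is even a candidate for this bug, one could look for script URLs that straddle the 4096-byte mark. The following is only a rough sketch under some assumptions: the function name and the regular expression are made up for illustration, offsets are counted from the start of the response body (the extract does not say whether headers count), and "spans" is read as "the URL has bytes on both sides of the 4096-byte mark".

```python
import re

BOUNDARY = 4096  # byte offset named in the extract above

# Hypothetical pattern: captures the src value of a <script> tag in raw bytes.
SCRIPT_SRC = re.compile(
    rb'<script\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)

def urls_spanning_boundary(body: bytes, boundary: int = BOUNDARY):
    """Return script URLs that have bytes on both sides of `boundary`."""
    hits = []
    for match in SCRIPT_SRC.finditer(body):
        start, end = match.span(1)  # URL occupies offsets start..end-1
        # At least one URL byte lies within the first `boundary` bytes,
        # and at least one lies beyond them.
        if start < boundary < end:
            hits.append(match.group(1).decode("latin-1"))
    return hits
```

Running this against the raw bytes of a page as served would list the script URLs at risk; an empty list suggests the page layout is not triggering the condition, at least under the assumptions above.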

Impact on the end user:

Generally this has no direct impact on the visitor's experience, because when the parser actually reaches a tag that requires a sub-download, if the speculative downloader has not already requested the proper resource, the main parser will at that time request download of the proper resource.  Hence,
  • The visitor will not notice any problems like script errors, etc
  • The visitor will have a slightly slower experience when rendering the page because the speculative requests all "miss"
  • IIS/Apache logs will note requests for non-existent or incorrect resources
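Since the bogus requests end in URL-encoded HTML, they can be picked out of a list of 404 paths with a simple heuristic. This is a minimal sketch, not a definitive detector: the function name and the sample paths are invented for illustration (they are not taken from our logs), and the only signal used is an angle bracket appearing in the decoded path.

```python
from urllib.parse import unquote

def looks_like_lookahead_artifact(request_path: str) -> bool:
    """Heuristic: the decoded path contains HTML angle brackets."""
    decoded = unquote(request_path)
    return "<" in decoded or ">" in decoded

# Hypothetical sample paths, for illustration only.
sample_paths = [
    "/js/jquery.maskedinput.min.js",
    "/js/jquery.mask%3C/td%3E%3Ctd%20class=",
]
suspicious = [p for p in sample_paths if looks_like_lookahead_artifact(p)]
```

Legitimate resource paths rarely contain angle brackets, so false positives should be uncommon, but this check cannot tell the IE8 bug apart from other sources of malformed requests.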

The fix:

The fix is simple.  We needed to apply the IE8 Cumulative Update (KB980182) patch on the client machines that had this problem.

When I read this post, I was just stunned!  Problems like these truly blow your mind!

Today, when I think about the problem, I realize that it's because of problems like these that I love being a developer!

True developers solve real problems!
Have some Fun!