The issue was *not* reproducible on any other developer's machine. This got me thinking, there must be some issue with the mappings which is causing NHibernate to throw the exception. The developer must have made some local changes to the mapping files, that is causing the exception.
Investigation:
I tried various things to figure out what was causing the problem. Here is the list of things that I tried.
Check mapping files for modifications:
I checked and rechecked the mapping files for any local modifications, but, No, there was no local modification. I even checked-out all the mapping files just to be sure that there was no modification. But still no luck, the issue was reproducible on her machine and not on any other machine.
Tried to reproduce the issue on my machine by connecting to other developers Database:
After the first failed attempt, next thought that crossed my mind was, may be, there was something wrong with entity data which is causing the exception. I decided to run the same test on my machine by connecting to other developers database.
Surprisingly, I was still not able to reproduce the issue. This was totally weird. I was 99% positive that, issue would get reproduced once I connect to the order developers database.
I was running out of options. How do proceed further? What should be the next step?
Stick to basics - Debug with Elimination:
I was back to the drawing board. Decided to use Elimination to narrow down the issue. The question that was bothering me was, Whats so special about the entity we are trying to load?
The entity we were trying to load was called EmpDept, it represents the Employee Department information (actual names of the entities have been changed). So what was so special about this table?
This table was a legacy table, it did not have a primitive primary key, it had a composite key made of UserID and DeptID.
So what? A lot of legacy databases have composite keys, NHibernate works well when it comes to mapping the composite keys, in fact, NHibernate has no problems loading the same entity on my machine!
I decided to, removed all other mapped properties and just kept the mappings for the composite key and try to fetch the entity. This time, it did fetch the entity without any problems on the other developers machine!
We were starting to get somewhere now!
I started adding back the mappings to find out which one was causing the issue. Finally, we did find the minimal possible mapping that would reproduce the issue.
The Cause:
We deduced that, the issue occurs when the following conditions are satisfied
- Entity is mapped with composite primary key
- Entity has a one-to-many relationship with other entity
- The one-to-many relationship is not based on the composite primary key, instead its with property-ref
- And the data type of property-ref is different from the id property
Let's look at the minimal mapping file that reproduced the issue
What do we have here? We have a very simple mapping between two entities.
- Entity EmpDept is mapped to a table called EmpDepts
- EmpDept uses a composite primary key to class EmpDeptIdentifier
- The composite primary key consists of UserID and DeptID
- EmpDept entity as another property called Email
- There is a one-to-many relationship between EmpDept and Task entity i.e. One EmpDept can have Many Tasks
- Note that the relationship is not based on the composite primary key
- The relationship is based on a non identifier property called Email
- The data type of the Email property is string while the data type of the composite primary key is EmpDeptIdentifier
So far we had figured out, what conditions were required for NHibernate to throw the exception. However the real mystery was, Why was the exception occurring only on one developers machine? Why not on other developers machines?
The Enlightenment:
Our curiosity levels were at all time high! We decided to debug the NHibernate code to figure out why was exact same code working perfectly fine on one machine and failing on other.
After some intense NHibernate code debugging we found the NHibernate code that was throwing the exception.
Notice the line that throws the InvalidCastException in the above code? This is where the exception occurs.
The above method is called to build a string that will be used for logging. We traced back the call to this method, it was getting called from the MessageHelper.InfoString method, which in-turn was getting called from the LoadContexts.LocateLoadingCollection method.
We now knew the place from where the exception was thrown, but we still didn't know why it was thrown only on one machine.
Looking through the NHibernate code we realized that, the method MessageHelper.InfoString is only called if NHibernate is running with DEBUG log level.
That ladies and gentlemen, was the "Aha Moment!" for us. This was the missing piece of the puzzle. The developer on whose machine the test was failing was running NHibernate at log level DEBUG, while I was running the test at log level INFO. Because of this the method MessageHelper.InfoString was never invoked on my machine and thus no exception!
We Googled a little and found that there is an official bug reported for this scenario.
Eventually it turned out that, what we initially though as a big issue was really a non issue, since NHibernate will not be configured in DEBUG mode 99% of the time (because it logs hell lot of information). The issue we were facing is very unlikely in real systems. Although it was not such a big issue after all, this experience taught us quite a few things about NHibernate!