Sunday, October 7, 2012

Using NHibernate to fetch Hierarchical information with SqlServer - Part - 1

Recently in one of my project, we had a very interesting situation.  The relationship we were trying to model was pretty common and straight forward.  We wanted to map relationship between Employees and Managers.  The business rule was,

One Manager can have many subordinates (employees working under him) and one employee can have many managers  

I know! poor employee has to deal with more than one manager :)

As you might have guessed already, The mapping was pretty straight forward.  Employee table has a many-to-many relation with itself.  One manager could be a subordiante to someone else and one subordinate could be a manager for some other employees.

The domain class looked like

The fluent mapping looked like:

The many-to-many table that holds the employee-manager relationship is called employeemanager. The data in these tables looks like
Shows all the employees in the system
Many to many table which lists employees and their managers
Employees and their managers
As you can see.  The employee BatMan is right up in the ladder.  He has SpiderMan and HeMan as his subordinates.

HeMan and SpiderMan are the managers for SuperMan.  SuperMan is in-turn the manager of IronMan.

As you can see with only 5 employees we have a 3 level hierarchy build between our employees.

The query to get the direct subordinates of any given manager is simple enough and it looks like:
The output of this query would look like
Subordinates of BatMan
This query only returns me the direct subordinates of BatMan.  However, if you think about this for a moment, even SuperMan and IronMan are in someways subordinates of BatMan.  They are not the direct subordinates of BatMan but they are actually below BatMan.

The business requirement stated that, when a user logs in he should see all his subordinates - Direct or Indirect.

What kind of query would produce a result like that?

How do they do it?

What we really want is some kind of recursion so that we could go to any level to get all the subordinates of a given employee.  Different databases offer different ways in which such a query could be written. 

With SQL Server we could use a concept called Common Table Expression (CTE).  Using CTE we could build recursive queries.  The CTE query to get all subordinates of a given employee would look like

The above expression defines the CTE. I know it looks complex at the first go. But once you understand it its not that complex.

So lets break the query into parts and then decode it: What are the parts of the query:
This query looks very similar to the query to get all the direct subordinates of BatMan. Only difference is we have added a column called Level which will have value of 1 for all its rows. Notice that we are wrapping this query with the CTE called EmployeeHierarchy. After we have defined the CTE we can then make use of the CTE in subsequent query. Lets look at the next part of query
Next Part:
Now here is where the magic takes place. This query tries to get all the subordinates of managers selected from the CTE query. This query refers the EmployeeHierarchy defined in the previous query. In this query we are incrementing the level value by 1. This means this query will get me the Level 2 of subordinates of BatMan. Since the CTE is refered in its own defination we eventually get a recursive query.

If we try to run the following query:

All subordinates of BatMan are shown
As you can see we did get, SuperMan and IronMan listed as 2nd and 3rd level subordinates of BatMan!

Since there were only 5 users we could manage the data very well.

What would happen if we had some sort of cycle in our data.  For e.g. what would happen if data is setup in such a way that IronMan could also be a manager of HeMan?
If we run the CTE this time here's what you would get
Error caused due to Cyclic references in the DB
The problem is, we have a cycle in our data.  Because of this CTE is never able to end its loop.  And it throws an error after 100 recursive calls (You can change the value of number of recursion CTE should perform, by default its 100).  Increasing the number of recursion CTE should perform is not going to help us, it will still fail.

The only way to fix it is to not navigate the cyclic path over and over again.  For doing so, we introduce another field in our CTE called HierarchyPath.  The updated CTE looks like

The output of above query would look like
CTE with HierarchyPath.  Showing how we avoided cyclic recursion
The only change in this particular CTE is we have introduced a column called HierarchyPath.  This path will represent a breadcrumb of how an employee is related to a given manager.  To avoid navigating the cycle again and again, we added the where condition which checks whether the HerarchyPath build already has the subordinate listed before the given manager.

That's all for this post.  In the next post we will look at how to integrate the CTE with NHibernate.
Have some Fun!