Building distributed systems is hard. Justifying your design to your colleagues who are operating on another wavelength is harder.
A couple of years ago I was contracting to a small software house. I was part of a team working on a government project involving WCF, K2 Blackpearl and a WinForms application. The WinForms application was a sort-of thick client housing a fair amount of business logic across a number of forms. Some forms were large and complex, others were simple and trivial. I say “sort-of” because the larger portion of the business logic lived in the web services.
The architecture took a very basic n-tier approach: your standard application calling into a web service, which in turn passed the request down the layers to the Business/Workflow and DAL. The idea behind the application was that the user would see a list of tasks on their main form. They could open a task, and it would be exclusively locked to them. The user could then perform actions based on the current step in the workflow, and progress the task through the stages of the system. Opening a task required about 3-4 web service calls. Due to the reliance on the K2 workflow system, these calls could take up to a couple of seconds each to process! Obviously, to improve perceived performance, they were run asynchronously where they could be.
The biggest problem with this approach was with the design of the main form. Because it displayed point-in-time data, it was always stale. A refresh would be required to update to a more recent point-in-time, but this could take up to 30 seconds at a time to complete.
This could lead to a problem. If user A on computer A opens task 1, user B on computer B has no way of knowing that the task is now locked exclusively to user A without explicitly refreshing the data in the main form. If user B tries to open this task, a web service call is made and the task is checked for a lock, which involves touching a number of systems and going through the K2 workflow process. No big deal, they can move on to the next task. But what if that one is exclusively locked to someone else too? There begins the cycle of wasted resources because of stale data. Somewhat inefficient, no?
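To make that cycle concrete, here is a rough sketch in Python. It is not the project's code: `LockService`, `try_open` and `open_first_available` are hypothetical names I made up, and the service is an in-memory stand-in for the real web service and K2 lock check. It only counts how many round trips are burned when a client works from a stale list.

```python
class LockService:
    """Hypothetical in-memory stand-in for the web service + K2 lock check."""

    def __init__(self):
        self.locks = {}   # task_id -> user holding the exclusive lock
        self.calls = 0    # every call here was a slow round trip in real life

    def try_open(self, task_id, user):
        self.calls += 1
        # The first caller gets the lock; everyone else is refused.
        owner = self.locks.setdefault(task_id, user)
        return owner == user


def open_first_available(service, stale_task_list, user):
    """Walk a stale task list until a task can actually be locked."""
    wasted = 0
    for task_id in stale_task_list:
        if service.try_open(task_id, user):
            return task_id, wasted
        wasted += 1   # a full round trip just to learn the list was out of date
    return None, wasted


service = LockService()
snapshot = [1, 2, 3]   # the point-in-time list both users loaded

# User A opens tasks 1 and 2; user B's main form knows nothing about it.
for task_id in (1, 2):
    service.try_open(task_id, "user_a")

opened, wasted = open_first_available(service, snapshot, "user_b")
print(opened, wasted)  # 3 2
```

Two wasted calls doesn't sound like much, but in the real system each one could take seconds and touched several back-end systems.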
This approach may work for a small number of simultaneous users, where the system state doesn’t change very rapidly, but the requirements were for the system to handle 200 users at any one time. So what is the solution?
Well, my colleague and I, both students of Udi Dahan’s Advanced Distributed Systems Design course, suggested a publish/subscribe, event-driven architecture. Rather than wasting resources by querying the services based on stale data, we would keep the UI as up-to-date as possible by publishing changes to each client (subscriber). This means that when user A on computer A opens task 1, user B on computer B is notified, and the corresponding task is removed from user B’s work list. This way, the tasks user B sees in his or her list are tasks that aren’t exclusively checked out to another user. It was a perfect scenario for the pattern.
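A minimal sketch of the idea, again in Python and again with made-up names (`TaskEventBus`, `WorkListClient`, and the `task_locked` / `task_released` event names are all assumptions for illustration). The bus here runs in-process; in a real deployment it would be a message broker or service bus sitting between the server and the clients, and the server would publish the event once the lock was actually taken.

```python
from collections import defaultdict


class TaskEventBus:
    """Hypothetical in-process stand-in for a real message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # event type -> handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Push the change to every subscriber instead of waiting to be polled.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


class WorkListClient:
    """One user's main form: a task list kept current by events."""

    def __init__(self, user, bus, task_ids):
        self.user = user
        self.visible_tasks = set(task_ids)
        self._bus = bus
        bus.subscribe("task_locked", self._on_task_locked)
        bus.subscribe("task_released", self._on_task_released)

    def _on_task_locked(self, event):
        # Hide tasks that someone else has checked out.
        if event["locked_by"] != self.user:
            self.visible_tasks.discard(event["task_id"])

    def _on_task_released(self, event):
        self.visible_tasks.add(event["task_id"])

    def open_task(self, task_id):
        # Simplification: the client publishes directly; in practice the
        # server would publish after the lock succeeds.
        self._bus.publish("task_locked",
                          {"task_id": task_id, "locked_by": self.user})


bus = TaskEventBus()
user_a = WorkListClient("user_a", bus, [1, 2, 3])
user_b = WorkListClient("user_b", bus, [1, 2, 3])

user_a.open_task(1)
print(sorted(user_b.visible_tasks))  # [2, 3]
```

User B never has to ask whether task 1 is still available; it simply disappears from the list as soon as user A takes it.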
Unfortunately, it took all of 5 minutes for our suggestion to be shot down. Why? Because it wasn’t the “Microsoft way”. An excuse we hear all too often.