
Microsoft Totally One-Ups Me After Eight Years

12/03/10

  02:34:19 pm, by Nimble, 1047 words
Categories: Announcements [A], Programming

Eight or nine years ago, I embarked on a small project to make non-blocking concurrent programming easier.

There was a fair bit of code out there on semaphores and mutexes and getting threads to just wait, also known as blocking. The threads in this state do not take up CPU, but they do take up resources. Thread stacks can get to be significant.

Many multi-threading examples at the time had you extend a thread class with the information you wanted.

I wanted to be able to give myself a limited thread pool, for one. Closely related to this, I wanted the contents to be divorced from any particular thread.

I started with a 'BasicTask' nine Februaries ago - very similar to the Task in Microsoft's System.Threading.Tasks - and had it launch very small 'TaskStage' derivatives.

This worked well for simple processes, but it quickly ran into trouble when we started to get into loops, branches and error handling. Since every task stage was its own class, all it could do was suggest to the task where to go next, and when exception handling is added to the mix, it quickly turns to a mess of angel-hair pasta.

From this initial frustration was born the 'FlowedTask' which I have been using for the eight years since. The concept is fairly simple: all those little task stages are now methods, and all the logic flow between the stages is now expressed as instruction objects. The instructions handle the "spaghetti factor" by being able to access matching instruction numbers, such as where the corresponding 'try' and 'finally' are, or where 'until' should go when its conditions succeed or fail.
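
To make the "matching instruction numbers" idea concrete, here is a rough sketch of what such an instruction list and runner could look like. The names here (Instruction, FlowRunner, OpCode and so on) are hypothetical, invented for this illustration; they are not the actual FlowedTask internals.

using System;
using System.Collections.Generic;

// Hypothetical sketch only - not the real FlowedTask internals.
// Each structural instruction records the index of its partner, so the
// runner can jump from a 'while' past its 'end while', or from an
// 'end while' back to the loop test, without re-parsing any nesting.
enum OpCode { DirectMethod, Stage, Try, Finally, EndFinally, While, EndWhile }

class Instruction
{
    public OpCode Op;
    public Action Method;        // task-stage method to run, if any
    public Func<bool> Condition; // loop condition, if any
    public int PartnerIndex;     // index of the matching instruction
}

class FlowRunner
{
    private readonly List<Instruction> instructions = new List<Instruction>();
    private int current;

    private void Step()
    {
        var ins = instructions[current];
        switch (ins.Op)
        {
            case OpCode.While:
                // Condition failed: skip straight past the matching EndWhile.
                current = ins.Condition() ? current + 1 : ins.PartnerIndex + 1;
                break;
            case OpCode.EndWhile:
                current = ins.PartnerIndex; // jump back to re-test the loop
                break;
            case OpCode.DirectMethod:
                ins.Method();
                current++;
                break;
            // Try/Finally/Stage handling elided; 'finally' blocks are found
            // the same way, through their recorded partner index.
            default:
                current++;
                break;
        }
    }
}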

The code was originally put together for Delphi, but worked just fine in C#. You inherit your task from (T)FlowedTask and override InitializeFlow, like this:

protected override void InitializeFlow()
{
    DoWhile(ContinueRunning);
        DoStageOrDirectMethod(AwaitItemAvailable, "Wait for item availability");
        DoStageOrDirectMethod(PreProcessBatch, "Pre-process batch");
        DoTry();
            DoWhile(ItemAvailable);
                DoTry();
                    DoDirectMethod(DequeueItem, "Dequeue item");
                    DoStageOrDirectMethod(PreProcessItem, "Pre-process item");
                    DoTry();
                        DoStage(ProcessItem, "Process item");
                    DoFinally();
                        DoStageOrDirectMethod(PostProcessItem, "Post-process item");
                    DoEndFinally();
                DoExcept();
                    DoExceptionStage(ProcessException, "Process exception");
                DoEndExcept();
            DoEndWhile();
        DoFinally();
            DoTry();
                DoStageOrDirectMethod(PostProcessBatch, "Post-process batch");
            DoExcept();
                DoExceptionStage(PostProcessException, "Post-process exception");
            DoEndExcept();
        DoEndFinally();
    DoEndWhile();
}

DoDirectMethod would call that particular method and continue on in the same thread. DoStage, on the other hand, would let you set up your own conditions to continue.

For example, if you were sharing the printer between multiple tasks, you would add a channel thusly: Latch.AddChannel(new ListenerChannel(sender)).

When the task reached that instruction, it would drop the thread back to the thread pool. The 'latch' would wait to be poked by the stage-ended signal (you don't want your next method running if the printer is all of a sudden available before you finish the current method) and the printer-available signal, and launch the task on a thread pool thread.

I put DoStageOrDirectMethod together as an acknowledgement that it is more efficient to continue on in the same thread if the response to your asynchronous requests is immediate. So if the printer is completely unoccupied, in this case, or a task's to-do queue is still full, then we just continue on to the next instruction in the same thread.
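
A loose sketch of that decision, using hypothetical names (SignalLatch, ContinueWhenReady) rather than the real Latch API: when nothing is outstanding, the continuation runs inline on the calling thread; otherwise it is parked and the last signal relaunches it on a thread pool thread.

using System;
using System.Threading;

// Hedged sketch only - hypothetical names, not the actual FlowedTask latch.
// It illustrates the DoStageOrDirectMethod shortcut: continue on the same
// thread when every awaited signal is already in, otherwise drop out and
// let the final signal relaunch the continuation on a pool thread.
class SignalLatch
{
    private readonly object gate = new object();
    private int pending;
    private Action continuation;

    public void ExpectSignal()
    {
        lock (gate) pending++;
    }

    public void Signal()
    {
        Action toRun = null;
        lock (gate)
        {
            pending--;
            if (pending == 0 && continuation != null)
            {
                toRun = continuation;
                continuation = null;
            }
        }
        if (toRun != null)
            ThreadPool.QueueUserWorkItem(_ => toRun()); // relaunch on the pool
    }

    // Returns true if we could keep going on the calling thread.
    public bool ContinueWhenReady(Action next)
    {
        lock (gate)
        {
            if (pending > 0)
            {
                continuation = next; // "stage" path: drop out, wait to be poked
                return false;
            }
        }
        next();                      // "direct method" path: nothing to wait for
        return true;
    }
}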

(Remember that trick when I start discussing Microsoft's own excellent implementation.)

This technique has served me well for all those eight years in anything multitasking-related. I have used it to good effect for background data fetching, calculations, logging, queries, and even added parallel enhancements for communications (e.g. you could make many requests at a time and the next stage would launch on any reply).

I have been totally one-upped, though, by the very cool code-rewriting async / await system Microsoft has put together for C# 5.0.

I was surprised indeed to find out that the grand "theme" of C# 5.0 was actually concurrency.

Now why is what Microsoft has done particularly cool?

For one, instead of rewriting your code into tiny chunks of methods and extracting the logic into separate instructions as you have to do when using my flowed tasks, you can get the same sort of concurrency with very small additions to regular-looking code.

From Anders Hejlsberg's presentation:

async Task<XElement> GetRssAsync(string url) {
    var client = new WebClient();
    var task = client.DownloadStringTaskAsync(url);
    var text = await task;
    var xml = XElement.Parse(text);
    return xml;
}

That "await task" looks like it should block, but what it does is "drops out", just like my old stage code. Interestingly, Anders said in his presentation that this need not even happen on a separate thread; it can be alternately processed on the same thread.

The GetRssAsync method gets rewritten behind the scenes to this:

Task<XElement> GetRssAsync(string url) {
    var $builder = AsyncMethodBuilder<XElement>.Create();
    var $state = 0;
    TaskAwaiter<string> $a1;
    Action $resume = delegate {
        try {
            if ($state == 1) goto L1;
            var client = new WebClient();
            var task = client.DownloadStringTaskAsync(url);
            $state = 1;
            $a1 = task.GetAwaiter();
            if ($a1.BeginAwait($resume)) return;
        L1: var text = $a1.EndAwait();
            var xml = XElement.Parse(text);
            $builder.SetResult(xml);
        }
        catch (Exception $ex) { $builder.SetException($ex); }
    };
    $resume();
    return $builder.Task;
}

That Action $resume = delegate { ... } works by the code being turned into a mini state machine (so it does not need a separate "instruction runner" like mine does), and the method gets rewritten automatically.

The Task<X> is a promise of results to come. The BeginAwait comes back to the $resume delegate when done, much the same way as I would relaunch my flowed task with the instruction state remembering where you were last. The EndAwait gives you back your result.

I believe the $a1.BeginAwait($resume) can return false if the awaiter is already done. This gives it a DoStageOrDirectMethod behaviour where it does not drop out if the result is already back.
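
As a rough sketch of that behaviour (my own illustration; this is not the actual CTP awaiter code), an awaiter with the same shortcut might look like the following, where BeginAwait only registers the callback and "drops out" when the task has not yet finished:

// My own illustration, not the actual CTP TaskAwaiter.
// Needs: using System; using System.Threading.Tasks;
class SketchAwaiter<T> {
    private readonly Task<T> task;
    public SketchAwaiter(Task<T> task) { this.task = task; }

    // Returns true if we really had to suspend; false means the result is
    // already in, so the caller keeps going synchronously - the same idea
    // as DoStageOrDirectMethod.
    public bool BeginAwait(Action resume) {
        if (task.IsCompleted)
            return false;

        task.ContinueWith(_ => resume());
        return true;
    }

    public T EndAwait() {
        return task.Result; // does not block: only called after completion
    }
}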

This is nice stuff. You do not have to deal with IAsyncResult. You do not have to deal with the awaiters directly. You do not have to make a class to hold your results. You do not have to make a special listener to find out when another task is finished.
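
For contrast, here is roughly what the same sort of download looks like when you do have to deal with IAsyncResult directly (my own hedged sketch using the classic begin/end pattern, with error handling left out):

// My own sketch of the older IAsyncResult begin/end style, for contrast.
// Needs: using System; using System.IO; using System.Net; using System.Xml.Linq;
void GetRssOldStyle(string url, Action<XElement> onDone) {
    var request = WebRequest.Create(url);
    request.BeginGetResponse(asyncResult => {
        // Everything after the request has to live inside this callback,
        // and results have to be handed out through delegates or fields.
        using (var response = request.EndGetResponse(asyncResult))
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            onDone(XElement.Parse(reader.ReadToEnd()));
        }
    }, null);
}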

Now there may be some fun issues when it comes to debugging - I do not know; I will probably have to wait until C# 5.0 is actually shipping before I put it to work - but really, this gives you everything you want.

So colour me impressed!

Now I will wistfully wish I had somehow gotten in on the ground floor with this, having done my own version for eight years!

All good ideas are destined for inevitability, though, it seems, in particular with the folks on the language and framework development teams at Microsoft these days :)
