puffpio [dot] com

David Pio  //  software developer, technology geek, gadget whore, motorsports junkie, video game nerd, partner engineer at Facebook

Nov 10 / 12:29am

What I want for asynchronous WCF clients and C#5 async/await

I have a problem with Add Service Reference (and thus with svcutil). It generates proxy interfaces and proxy datatypes. If my WCF service uses custom datatypes, the service reference creates a proxy datatype for each of them. This gives me headaches because I think I have a type Foo when in fact I have a proxied version, and now I have to convert between them. Also, if I add two service references that use the same datatype, guess what? It generates two different proxied datatypes. And if I change anything about the service or the datatypes, I need to regenerate the service reference. What a pain.

Instead, I do something like this:

 [ServiceContract]
 public interface IContract
 {
   [OperationContract]
   int Add(int a, int b);
 }
 
 public class MyClient : ClientBase<IContract>, IContract
 {
   public int Add(int a, int b) { return base.Channel.Add(a, b); }
 }
 

Nice and clean, huh? The WCF client uses the same interface and the same datatypes as the service, so there is no need to regenerate service references. It's tightly coupled, and I also get compile-time errors if I change the service interface. The only problem is that doing it this way does not generate asynchronous client methods.
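One stopgap for the missing asynchronous methods is to wrap the synchronous call in a Task by hand. The sketch below is only an illustration: LocalClient is a hypothetical in-process stand-in for MyClient, and the WCF attributes are left off so it compiles without System.ServiceModel. The wrapper still occupies a thread-pool thread for the duration of the call, so it is a convenience, not true asynchronous IO.

```csharp
using System;
using System.Threading.Tasks;

// Same shape as the contract above; the WCF attributes are omitted
// only so this sketch compiles without System.ServiceModel.
public interface IContract
{
    int Add(int a, int b);
}

// Hypothetical stand-in for MyClient, which would forward to base.Channel.Add.
public class LocalClient : IContract
{
    public int Add(int a, int b) { return a + b; }
}

public static class ContractExtensions
{
    // Hand-written stopgap: runs the blocking call on a thread-pool thread
    // and hands back a Task<int> the caller can wait on or continue from.
    public static Task<int> AddAsync(this IContract client, int a, int b)
    {
        return Task.Factory.StartNew(() => client.Add(a, b));
    }
}

public static class Program
{
    public static void Main()
    {
        IContract client = new LocalClient();
        Task<int> sum = client.AddAsync(2, 3);
        Console.WriteLine(sum.Result); // Result blocks until the call completes
    }
}
```

Every operation needs its own hand-written wrapper this way, which is exactly the busywork I would like generated for me.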

The Visual Studio Async CTP gives a sneak preview into the future of the language. It adds two new keywords, async and await, which make it dead simple to convert regular code into asynchronous code. I won't go into detail on how it works, but in addition to the language/compiler changes it also includes a bunch of extension methods that add this new style of asynchronous access to preexisting classes that already have asynchronous methods. For example, Stream contains a synchronous Write() as well as the asynchronous BeginWrite() and EndWrite(). The CTP adds WriteAsync(), which wraps BeginWrite() and EndWrite() for use with the new async/await syntax. What I envision, and what I hope happens, is that they also do some work on WCF client generation (hopefully not using svcutil, but extending ClientBase). I want there to be something like this:

[ServiceContract]
public interface IContract
{
   [OperationContract]
   int Add(int a, int b);
}

public class MyAsyncClient : AsyncClientBase<IContract>, IAsyncContract<IContract>
{
   public async Task<int> AddAsync(int a, int b) { return await base.AsyncChannel.AddAsync(a, b); }
}

That's my pipe dream. I want to not have to generate proxy interfaces and datatypes, to retain compile-time errors when the interface changes, and to not use svcutil. IAsyncContract<T> would be constrained to take an interface that is marked as a ServiceContract. It would then take all the methods in the interface marked as an OperationContract and create a new interface from them, changing their return types from what they were originally (e.g. int, string, List<double>) to Tasks (e.g. Task<int>, Task<string>, Task<List<double>>).
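To make the return-type mapping concrete, here is a rough sketch that uses reflection to print the signatures such a derived interface would contain. Everything in it is an assumption for illustration: the Echo and Ping operations, the AsyncSignatures helper, and the convention that void maps to a plain Task. It also skips the OperationContract filter so that it compiles without System.ServiceModel.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

public interface IContract
{
    int Add(int a, int b);
    string Echo(string text); // hypothetical extra operation
    void Ping();              // hypothetical extra operation
}

public static class AsyncSignatures
{
    // Maps a synchronous return type to its Task-based counterpart:
    // void -> Task, T -> Task<T>.
    public static Type ToTaskType(Type returnType)
    {
        return returnType == typeof(void)
            ? typeof(Task)
            : typeof(Task<>).MakeGenericType(returnType);
    }

    // Describes the async method that would correspond to one operation.
    public static string Describe(MethodInfo method)
    {
        Type taskType = ToTaskType(method.ReturnType);
        string returns = taskType.IsGenericType
            ? "Task<" + method.ReturnType.Name + ">"
            : "Task";
        string parameters = string.Join(", ",
            method.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name));
        return returns + " " + method.Name + "Async(" + parameters + ")";
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints one line per operation, using CLR type names (Int32, String).
        foreach (MethodInfo method in typeof(IContract).GetMethods())
            Console.WriteLine(AsyncSignatures.Describe(method));
    }
}
```

This only describes the interface at runtime. It does not give me a type I can compile against, which is the part I actually want.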

Right now I think it's impossible for an interface like IAsyncContract to exist and do what I would like, but maybe some kind of code generation at compile time could net similar results. Thoughts?

Filed under  //  .net 4   async   asynchronous programming   await   c#   c#5   wcf

Sep 17 / 3:23pm

.Net 4 Task Parallel Library and Asynchronous Amazon SimpleDB Access

At work we use Amazon SimpleDB as our distributed, redundant database, and Amazon supplies an SDK for accessing the service. Unfortunately, this SDK exposes all of its calls synchronously, with no asynchronous versions of any web service call. As such, my initial implementation of data access was something like:

 IEnumerable<MyData> data1;
 IEnumerable<MyData> data2;
 IEnumerable<MyData> data3;
 IEnumerable<MyData> data4;
 
 data1 = GetData1FromSDB();
 data2 = GetData2FromSDB();
 data3 = GetData3FromSDB();
 data4 = GetData4FromSDB();
 
 return DoSomethingCool(data1, data2, data3, data4); 

Now that looks readable and straightforward, but it does not scale. Eventually we got to the point where each of the calls to SDB was retrieving a significant amount of data, enough to cause site slowness when looking at large datasets. After reading about the handy dandy Task Parallel Library, I modified the implementation to this:

 Parallel.Invoke(
    () => { data1 = GetData1FromSDB(); },
    () => { data2 = GetData2FromSDB(); },
    () => { data3 = GetData3FromSDB(); },
    () => { data4 = GetData4FromSDB(); }
 );
 

Great! Immediate speedup and all was good... for the moment. The data access was parallelized, but the interesting thing about Parallel.Invoke is that, by default, it tends to run only about as many actions at once as your system has processor cores. It was designed for compute-bound work, not IO-bound work. I needed a way to go beyond the number of processor cores in the system. I played with passing in ParallelOptions with different parameters to try and mix it up, but settled on this:

 var tasks = new Task[] {
    Task.Factory.StartNew(() => { data1 = GetData1FromSDB(); }),
    Task.Factory.StartNew(() => { data2 = GetData2FromSDB(); }),
    Task.Factory.StartNew(() => { data3 = GetData3FromSDB(); }),
    Task.Factory.StartNew(() => { data4 = GetData4FromSDB(); })
 };
 Task.WaitAll(tasks);
 

This implementation queues up all the tasks on the ThreadPool and goes to town. It's not limited by the number of processor cores, but by the size of the ThreadPool. Nice. But this implementation and the one above share another problem: even while a task is blocked waiting for data from SDB, it still holds onto a thread in the ThreadPool. What I really needed was an asynchronous version of the SDB SDK, but I didn't want to rewrite their entire SDK to get it. After some more research into the Task Parallel Library and its compatibility with the Asynchronous Programming Model, I came up with this implementation:

 var tasks = new Action[] {
    () => { data1 = GetData1FromSDB(); },
    () => { data2 = GetData2FromSDB(); },
    () => { data3 = GetData3FromSDB(); },
    () => { data4 = GetData4FromSDB(); }
 }.Select(a => Task.Factory.FromAsync(a.BeginInvoke, a.EndInvoke, null)).ToArray();
 Task.WaitAll(tasks);
 

See what I did there? I took each data accessor and wrapped it in an Action via a lambda expression. Actions (and Funcs) have built-in BeginInvoke and EndInvoke methods that execute them asynchronously. Nice. Now I just needed to batch them up and wait for them all to finish before continuing. I wrapped them in Tasks using Task.Factory.FromAsync and then called Task.WaitAll to wait until they finished. I then abstracted the functionality into an extension method:

 public static void InvokeAsyncAndWaitAll(this IEnumerable<Action> actions) {
    Task.WaitAll(actions.Select(a => Task.Factory.FromAsync(a.BeginInvoke, a.EndInvoke, null)).ToArray());
 }
 

I can create similar extension methods for each type of Action and Func as well (the Func version would need to return some type of collection). The nice thing about this is that all the data access is kicked off asynchronously and runs in parallel. I have no control over when a thread sleeps during the data access, though: a delegate's BeginInvoke still runs the call on a ThreadPool thread, so that thread stays blocked while the SDB request is in flight. Additionally, using the Task Parallel Library keeps the code readable and definitely more maintainable than using some type of synchronized or locked counter to check when all the data accesses have completed.
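As a rough idea of what the Func version could look like, here is a minimal sketch that returns the results as an array in the same order as the input. Note the swap: it uses Task.Factory.StartNew rather than FromAsync over BeginInvoke/EndInvoke, since delegate BeginInvoke is only supported on the full .NET Framework. The FuncExtensions class name and the sample lambdas are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FuncExtensions
{
    // Starts every function on the thread pool, waits for all of them,
    // and returns the results in the same order as the input sequence.
    public static T[] InvokeAsyncAndWaitAll<T>(this IEnumerable<Func<T>> funcs)
    {
        Task<T>[] tasks = funcs
            .Select(f => Task.Factory.StartNew(f))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-ins for the GetDataNFromSDB() calls.
        int[] results = new Func<int>[]
        {
            () => 1,
            () => 2,
            () => 3
        }.InvokeAsyncAndWaitAll();

        Console.WriteLine(string.Join(", ", results));
    }
}
```

Returning an array keeps each result lined up with the function that produced it, so the caller does not need the captured-variable assignments from the Action version.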