Vitaliy Liptchinsky's .NET Framework blog

Wednesday, September 15, 2010

Introduction into dataflow programming

What is dataflow programming all about? In classical imperative programming a program is basically a set of operations working with mutable state, which effectively hides the data paths. Dataflow programming is more like a series of workers/robots on an assembly line, who execute only when input material arrives. The imperative style can introduce non-determinism under concurrent execution (multithreading) without proper synchronization. In dataflow programming, execution depends on the actual data, or, to be precise, on data availability. Dataflow programming yields completely deterministic programs.

Let’s introduce the concept of a dataflow variable, one of the main concepts of dataflow programming. A dataflow variable can be in one of two states: bound (a value has been assigned) or unbound (no value has been assigned yet). Whenever a thread tries to read the value of an unbound dataflow variable, it blocks until some other thread binds the variable. A dataflow variable can be bound only once; subsequent attempts to bind it will fail. Dataflow variables can also be used to build blocking queues and streams, and the actor model can be implemented on top of such blocking queues.

You can get more information on dataflow programming from this Wikipedia article. There is also a nice article in the Groovy GPars guide.

Overview of the article

This article presents basic implementations of a dataflow variable in both C# and F#. It also demonstrates examples of dataflow programming in C# using futures. Dataflow programming works best in languages that follow declarative model principles. C# is an imperative language, so programming in a dataflow style requires developers to be self-disciplined. Surprisingly, F#, although considered a functional programming language and therefore one that follows the declarative paradigm, also lets developers program in an imperative way (via the mutable keyword). Adding dataflow variables to C# and F# does not automatically make them dataflow programming languages, because the necessary syntactic sugar and language support are still missing.

Clojure is one of the most popular modern languages that enable dataflow programming; it supports it through promises. It is also possible to do dataflow programming in other popular languages like Groovy, Scala, and Ruby using open-source libraries such as GPars for Groovy, but none of these languages provide syntactic support for dataflow variables. As a genuine dataflow programming language I would single out Oz, which treats all variables as dataflow variables: a reader trying to read an unbound/uninitialized variable is blocked until the variable is bound/initialized. On the one hand this saves us from the famous NullReferenceException; on the other hand it can introduce program hangs.

First I will present the implementations in C# and F#, and later I will dig into the thread synchronization details.

Dataflow variables in C#

Let’s start with a simple example of how to use a dataflow variable in C#.

C# is not very extensible when it comes to operator overloading (as you will see later in the F# implementation), which is why we use a Bind method here. Whether to use operators or plain properties/functions when working with dataflow variables is really a matter of taste, but to me operators look more natural. What I love about C# is implicit conversion operators.
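Here is a minimal sketch of what such a class might look like in C# (the class and member names here are assumptions for illustration; the original listing may differ). The variable binds once, blocks readers until bound, and exposes an implicit conversion to the underlying type:

```csharp
using System;
using System.Threading;

// Sketch of a dataflow variable: bind-once semantics, blocking reads.
public class DataflowVariable<T>
{
    private readonly object sync = new object();
    private volatile bool isBound;   // the isInitialized-style flag the C# version needs
    private T value;

    // Bind may succeed only once; later attempts throw.
    public void Bind(T newValue)
    {
        lock (sync)
        {
            if (isBound)
                throw new InvalidOperationException("Dataflow variable is already bound.");
            value = newValue;
            isBound = true;
            Monitor.PulseAll(sync);  // wake up all blocked readers
        }
    }

    // Readers block until the variable is bound.
    public T Value
    {
        get
        {
            if (isBound) return value;       // fast path: volatile read, no lock taken
            lock (sync)
            {
                while (!isBound)
                    Monitor.Wait(sync);      // releases the lock while waiting
                return value;
            }
        }
    }

    // Implicit conversion lets a bound variable be used wherever T is expected.
    public static implicit operator T(DataflowVariable<T> var) => var.Value;
}
```

Usage: a reader thread can simply write `int x = myVar;` and it will block until some other thread calls `myVar.Bind(42)`.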

Dataflow variables in F#

static member (!!) (var: DataflowVariable<'T>) : 'T = var.Value

You may have noticed the [<VolatileField>] attribute. According to the pretty terse documentation, this attribute effectively replaces the volatile keyword in C#, although I haven’t performed thorough testing to verify it. What? F# has no keyword for volatile fields? And that is how it should be. Volatile fields belong to the domain of imperative programming, and F#, being first of all a functional programming language (an implementation of the declarative model), tries to avoid shared state (remember the mutable keyword?). F# does not support overloading of implicit conversion operators, which is why we need some kind of dereferencing prefix operator (!!). The F# implementation is more elegant, because we expose the Option type here and thus do not have to deal with an isInitialized field as in the C# implementation.

Implementation details and some thoughts on thread synchronization

For synchronization in both implementations I used volatile fields in conjunction with a simple Monitor.Wait/Monitor.Pulse pattern. More information on Monitor.Pulse/Monitor.Wait can be found in this very nice article by Joe Albahari. The volatile fields here prevent instruction reordering and ensure CPU cache synchronization. As an alternative to a volatile field we could use the Thread.VolatileRead method (we do not also need Thread.VolatileWrite, because the actual write is done within the lock statement, which prevents reordering and flushes and invalidates the CPU cache; besides, Thread.VolatileWrite only flushes the CPU cache and does not invalidate it). Basically, the static VolatileRead and VolatileWrite methods of the Thread class read/write a variable while enforcing (technically, a superset of) the guarantees made by the volatile keyword.
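The Wait/Pulse pattern mentioned above can be sketched in isolation like this (the `Signal` class name is my own for illustration). The key details are that `Wait` releases the lock while blocked and that the condition is always re-checked in a loop:

```csharp
using System;
using System.Threading;

// The Monitor.Wait/Monitor.Pulse signaling pattern: a reader blocks until
// a writer sets the flag and pulses the monitor.
class Signal
{
    private readonly object gate = new object();
    private volatile bool ready;   // volatile: no stale reads outside the lock

    public void Set()
    {
        lock (gate)
        {
            ready = true;
            Monitor.PulseAll(gate);   // wake every thread blocked in Wait
        }
    }

    public void WaitUntilSet()
    {
        lock (gate)
        {
            // Always Wait in a loop: the condition must be re-checked after
            // every wakeup, since Pulse only signals, it does not hand over state.
            while (!ready)
                Monitor.Wait(gate);   // releases the lock while blocked
        }
    }
}
```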

Dataflow programming examples in C# and F#

In C# I will demonstrate a simple example of dataflow programming with the Parallel Extensions library (futures and continuations). Using Task.Factory.ContinueWhenAll one can achieve results similar to dataflow variables, but dataflow variables give developers much more flexibility.
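A minimal sketch of the idea: two futures are computed independently, and a continuation fires only when both results are available, mirroring the "execute when data arrives" behavior of dataflow variables:

```csharp
using System;
using System.Threading.Tasks;

class FuturesDemo
{
    static void Main()
    {
        // Two independent "futures" computed in parallel.
        Task<int> a = Task.Factory.StartNew(() => 20);
        Task<int> b = Task.Factory.StartNew(() => 22);

        // The continuation runs only once both inputs are bound --
        // the dataflow idea expressed with the Parallel Extensions API.
        Task<int> sum = Task.Factory.ContinueWhenAll(
            new[] { a, b },
            tasks => tasks[0].Result + tasks[1].Result);

        Console.WriteLine(sum.Result);  // 42
    }
}
```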

Conclusion

This article described a basic implementation of dataflow variables in the C# and F# programming languages and basic examples of dataflow programming using continuations/futures. Please consider this article a starting point on a journey into the world of dataflow programming.

Thursday, November 5, 2009

Recently I've been playing with pretty complex .NET components written by Vitaliy Liptchinsky, and at some point I came to a conclusion:

The best .NET debugger in the world is... IronPython console.

Why?

Have you ever tried to continuously experiment, set up new relationships, and create complex objects within Visual Studio? Isn't it boring and cumbersome to re-compile and re-run the VS project after each modification?

Thursday, March 12, 2009

Never use delayed computations in a WCF service contract implementation! The reason: the IErrorHandler component is never invoked in this case. If there is any kind of exception inside a delayed computation, it becomes increasingly hard to find the reason even with WCF tracing.

Let's have a detailed look at how WCF works (this is a very simplified version):

in --> WCF serializer --> some other stuff :) --> try { result = CallYourCustomCode() } catch { CallToErrorHandler() ... } --> WCF serializer (result) --> out

So, if you have delayed computations created with the help of the C# yield keyword, CallYourCustomCode returns not the actual result but a kind of reference to your implementation. This reference is resolved and executed during serialization (!). So any exception during serialization will close the WCF channel, bypass IErrorHandler, and produce a senseless exception at the WCF client.
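The underlying mechanism can be shown without WCF at all (a minimal sketch; the method names are mine). An iterator built with yield does not run its body when called, only when enumerated, so the exception escapes at enumeration time, exactly where WCF's serializer would hit it:

```csharp
using System;
using System.Collections.Generic;

class DeferredDemo
{
    // Because of yield, calling this method runs no code at all;
    // the body executes lazily, item by item, during enumeration.
    public static IEnumerable<int> GetItems()
    {
        yield return 1;
        // Thrown only when the *second* item is pulled -- e.g. by a serializer.
        throw new InvalidOperationException("boom");
    }

    static void Main()
    {
        IEnumerable<int> items = GetItems();   // no exception here:
                                               // IErrorHandler would see "success"
        try
        {
            foreach (var item in items) { }    // exception surfaces here instead
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("thrown at enumeration, not at the call site");
        }
    }
}
```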

Wednesday, November 26, 2008

What transactional repositories do we know at the moment? Here is a list: SQL Server, MSMQ, the file system, and the registry (in Windows Vista/Windows Server 2008). Is that enough? Does it cover all possible needs of enterprises?

At CodeProject I've described a custom implementation of a transactional repository based on the Enterprise Library Caching Application Block. The transactional repository implementation described in that article provides the basic principles required for implementing any custom transactional repository that can easily participate in ambient and explicit transactions in .NET.

Friday, November 14, 2008

I've seen a lot of discussions on the web regarding volatile fields. I've performed my own small investigation of this subject, and here are some thoughts on it:

The two main purposes of C# volatile fields are the following:

1. Introduce memory barriers for all access operations to these fields. To improve performance, CPUs store frequently accessed objects in the CPU cache. In multi-threaded applications this can cause problems. For instance, imagine a situation where one thread constantly reads some boolean value (the read thread) and another is responsible for updating that field (the write thread). Now, if the OS decides to run these two threads on different CPUs, it is possible that the write thread will change the value of the field in CPU1's cache while the read thread keeps reading the old value from CPU2's cache; in other words, it will not see the change until CPU1's cache is flushed. The situation can be even worse if two threads update the value. A volatile field introduces memory barriers, which means the CPU will always read from and write to main memory rather than relying on the CPU cache.

Nowadays CPU architectures such as x86 and x64 have cache coherency, which means that any change in one processor's cache is propagated to the other CPUs' caches. In turn, this means that the JIT compiler for the x86 and x64 platforms makes little difference between volatile and non-volatile fields (except as stated in item #2). Also, multicore CPUs usually have several levels of cache: the first level is private to each core, while higher levels may be shared between cores. But architectures with a weak memory model, such as Itanium, do not provide the same coherency guarantees, and therefore the volatile keyword and memory barriers play a significant role when designing a multi-threaded application. Therefore, I'd recommend always using volatile and memory barriers even for x86 and x64 CPUs, because otherwise you introduce CPU architecture affinity into your application.
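The read-thread/write-thread scenario above can be sketched as follows (the `Worker` class is my own illustration). Without volatile, the JIT is free to hoist the flag into a register, and the reading thread may spin forever after another thread sets it:

```csharp
using System.Threading;

class Worker
{
    // volatile: every read goes through a memory barrier, so the spinning
    // thread is guaranteed to eventually observe the write from Stop().
    private volatile bool stopRequested;

    // Called from the "write thread".
    public void Stop() => stopRequested = true;

    // Called on the "read thread": spins until the flag is observed as set.
    public void Run()
    {
        while (!stopRequested)
        {
            // do a unit of work
        }
    }
}
```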

Note: you can also introduce memory barriers by using Thread.VolatileRead/Thread.VolatileWrite (these two methods effectively replace the volatile keyword), Thread.MemoryBarrier, or even the C# lock keyword, etc.

Below two CPU architectures are shown: Itanium and AMD (Direct Connect Architecture). As we can see, in AMD's Direct Connect Architecture all processors are connected to each other, so we have memory coherence. In the Itanium architecture the CPUs are not connected to each other and communicate with RAM through the system bus.

If you plan to change myField from a separate thread, this is a significant difference, isn't it?

It is usually recommended to use the lock statement (Monitor.Enter/Monitor.Exit), but if you only change a single field within the block, then a volatile field will perform significantly better than the Monitor class.

Friday, October 10, 2008

The .NET Framework BCL contains a very nice thread pool implementation (the System.Threading.ThreadPool class). But this class is not suitable for the following scenarios:

1. Long-running operations. For long-running operations it is usually recommended to use the Thread class instead.
2. The ThreadPool is per process. This means that running out of available ThreadPool threads can happen pretty often. What if you have very important and urgent work items and do not want to take that risk? Especially when your application hosts a number of app domains (like IIS or SQL Server), you can run out of threads in the thread pool...
3. The ThreadPool does not support IAsyncResult. The BeginInvoke methods of all delegates internally pass control to the ThreadPool, but the ThreadPool itself does not support IAsyncResult.

This is just an initial version of CustomThreadPool, and I plan to extend it in the future.

Generally, there are a number of strict recommendations for when you should use the ThreadPool and when you should use the Thread class directly or MulticastDelegate.BeginInvoke. Ideally I plan to create a thread pool suitable for the scenarios where the Thread class and BeginInvoke apply. The main problem of System.Threading.ThreadPool is that it is per process. So if you have a set of very important tasks to do and also host a set of third-party assemblies in the application, there is always a probability that your important tasks will be delayed. With CustomThreadPool you have a separate thread pool for each application domain. OK, I know: there is a maximum number of threads allowed per process, and if there are too many threads, context switching can be awful...

Wednesday, October 8, 2008

So, first of all, in order to implement an enterprise service using WCF, we need the idea of a generic service that can handle any incoming message. This post describes how to handle generic messages in a WCF service. This means that we can serialize any custom serializable object and send it as a message to the WCF service, and the service will accept it and try to process it.

But what's next? The question is how the WCF service would know how to process the incoming message... It could be, for instance, a business-logic entity or a notification about problems in a remote component... And according to the title, our WCF service needs to be generic. The idea below describes a quite simple WCF service that can handle various incoming messages without any recompilation. The flow is the following:

1. Retrieve the contents of the body element.
2. Based on some criteria (for instance, the name of the root node plus the namespace), choose an XSLT transformation appropriate for the given message. The XSLT transformation file can be stored as a local file or in a database.
3. Using the XSLT transformation, transform the incoming XML to XAML (or even to C# code!).
4. Compile the resulting XAML.
5. Execute the compiled code. The resulting code can do anything: from executing database queries to sending e-mail to an administrator.
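Steps 1–3 of the flow can be sketched like this (a hypothetical helper; the `GenericHandler` and `Transform` names, and the root-element-name file convention, are my own assumptions). Compilation and execution of the result (steps 4–5) are omitted:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class GenericHandler
{
    // Steps 1-3: read the message body, pick a stylesheet by root element
    // name, and run the transformation.
    public static string Transform(string messageXml, string xsltDirectory)
    {
        // Step 1: load the body contents.
        var doc = new XmlDocument();
        doc.LoadXml(messageXml);

        // Step 2: choose the transformation by root node name (real criteria
        // could also include the namespace, or come from a database).
        string xsltPath = Path.Combine(
            xsltDirectory, doc.DocumentElement.LocalName + ".xslt");

        var transform = new XslCompiledTransform();
        transform.Load(xsltPath);

        // Step 3: transform the incoming XML into XAML or C# source text.
        using (var reader = new XmlNodeReader(doc))
        using (var writer = new StringWriter())
        {
            transform.Transform(reader, null, writer);
            return writer.ToString();
        }
    }
}
```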

Drawbacks of this approach:

1. You are not able to debug the resulting code.
2. It requires a nice tool that would produce the XSLT transformation based on a given message content and handler (C# code).

Benefits:

1. As new messages are introduced in the enterprise, you do not have to modify/redeploy the service. All you need is to provide the WCF service with additional scripts.
2. It is very flexible, because you do not depend on any message types, so you can theoretically receive and process anything.

P.S.

Yes, I know: instead of XSLT transformations we could deploy handler assemblies for each event and load these assemblies dynamically.

Probably this approach will never get a chance to be implemented... I just wanted to share an interesting idea...

Thursday, June 12, 2008

I believe this is quite a common question for all developers who use databases in their applications, and I've seen a lot of mistakes regarding this choice. Is it better to create one static connection and use it throughout the code, or to create a new connection object for each database call?

The answers are:

In the case of a database server (a stand-alone database like SQL Server Express/Standard/Enterprise or Oracle) it is always better to create and dispose of new connection objects, because almost all database drivers (ADO.NET, ODBC, Oracle) support connection pooling, so you won't gain anything by keeping one connection alive. A static connection can even decrease performance, because in a multithreaded application a single connection object cannot be used simultaneously. Static connections also decrease the scalability of applications. Connection pooling usually performs better than custom code that tries to re-use created connections. There are, of course, exceptions: if you are going to execute a number of SQL statements sequentially, it would be a mistake to create a new connection for each statement!
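A sketch of the create-and-dispose pattern (the repository class, table, and connection string are illustrative assumptions, not from the original post). Closing a pooled SqlConnection only returns the physical connection to the ADO.NET pool, so opening one per call is cheap:

```csharp
using System.Data.SqlClient;

class OrderRepository
{
    private readonly string connectionString;

    public OrderRepository(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public int CountOrders()
    {
        // Create-and-dispose per call: Dispose returns the underlying
        // physical connection to the ADO.NET connection pool.
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}
```

Note that because each call owns its own connection, this class is safe to use from multiple threads, which a single shared static connection would not be.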

In the case of embedded databases (like SQL Server CE) it is better to use a static connection, because such databases do not have connection pooling and re-creating a connection usually costs a lot.

Wednesday, January 2, 2008

Hi! I just want to present a quite interesting approach to ASP.NET utilization.

This approach shows how to render HTML files using ASP.NET pages and server controls. For instance, suppose we have some read-only data that we want to present to the user with a high level of interactivity and the ability to print. Consider HTML documents shown in an embedded browser (an ActiveX control): there we already have pretty good printing capability, and we can also provide users with rich interactivity using JavaScript. If we could generate the HTML using compiled .aspx pages, that would be best, because we can edit and create web forms in Visual Studio (and use all the powerful ASP.NET controls like DataGrid), and then all we have to do is produce the HTML using the generated ASP.NET page handlers.

Friday, September 21, 2007

To precompile a method without invoking it, use the System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod method.

A possible situation where this method is extremely useful: you need to precompile a huge assembly in a separate thread during application delays and you can't use ngen.exe.

Users always complain about the lazy loading of screens in .NET applications. This lazy loading is caused by JIT compilation... For instance, suppose you have a fat (rich) client application with a log-in screen. During the log-in process you can pre-compile heavy assemblies in a separate thread.
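A minimal sketch of the technique (the class and method names are illustrative): obtain a MethodInfo via reflection and hand its handle to PrepareMethod, which forces JIT compilation without running the method:

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

class Warmup
{
    public static int HeavyMethod(int x) => x * 2;

    static void Main()
    {
        // JIT-compile HeavyMethod without calling it -- this is what you
        // would do from a background thread while the log-in screen is shown.
        MethodInfo method = typeof(Warmup).GetMethod(nameof(HeavyMethod));
        RuntimeHelpers.PrepareMethod(method.MethodHandle);

        // By the time the user reaches this screen, there is no JIT pause.
        Console.WriteLine(HeavyMethod(21));
    }
}
```

For generic methods, the PrepareMethod overload taking an array of RuntimeTypeHandle instantiations would be needed instead.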