Sunday, July 15, 2012

Design Guidelines: Purely Functional Data Structures

In the previous post, we discussed how private static methods bring our methods closer to the idea of pure functions. Since static methods cannot use instance-level data, they avoid that whole class of side effects and are easier to call from multiple threads. As long as they also stay away from shared static state, we don't need a synchronization block to serialize the execution flow. This doesn't mean that every private method should be static, but we should definitely be thinking about it. Obviously there are situations when we need the instance context.

One such example is raising an instance-level event [remember the sender in event args]. Unless we pass the instance itself (this) to the method, this doesn't seem possible, and by doing that we are again opening ourselves to side effects. Based on this discussion, I think that a thoughtful combination of static and instance-level private methods should be carefully defined for any design, and following the exercise from the last post should help us limit the instance-level ones.

This combination would also ensure that we keep the length of our private methods to the minimum. Their role can be described as minimum-wage laborers hired for the little chores. It doesn't seem appropriate to put heavy loads on their shoulders. We need to define exactly how they coordinate their work. We can't hand the responsibility of all laborers to one of them and expect harmony. You might have seen people asking how to test private methods, and there are ways to do that (accessors in MSTest). It is great to think about testing heavy logic, but the starting point is wrong: this heavy logic doesn't deserve to be in a private method in the first place.

A purely functional data structure is a data structure that can only be accessed by pure functions. It is an instance of an immutable type. [Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures]

The same book defines immutable data as data that cannot be modified after it is created, while immutable types are those types whose instances are immutable. Conversely, mutable data is data that can be modified after it is created, and types whose instances are mutable are called mutable types.

[Brian Goetz] has an amazing article on developerWorks discussing the benefits of immutable data structures. It's a really old article and it is written in the context of Java, but it is a really interesting read. On the Microsoft front, Eric Lippert has discussed this.

readonly Collections & ReadOnlyCollection
Turning a type immutable means that its members are marked readonly. As we know, a readonly member cannot be modified after the constructor finishes execution. After that we cannot change the value of a value-type member, and we cannot change the object referenced by a reference-type member, whether it is a scalar object or a collection. For collection-based members, this guarantees that the same collection instance is used throughout the lifetime of the object, but there is no guarantee that it continues to hold the same items. Items can still be added to or removed from a readonly collection member without any complaint from the compiler, which defeats the immutable behavior we were after.

Let's assume that in order to attract more customers, our coffee house is adding the functionality to introduce discount offers. We can introduce a DiscountOffer type as follows:
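
A minimal sketch of such a type could look like the following. Apart from OfferProducts, which is named later in this post, the member names and the use of string for products are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every field is readonly, yet the product list is still mutable.
public class DiscountOffer
{
    private readonly string _title;
    private readonly DateTime _startDate;
    private readonly DateTime _endDate;
    private readonly bool _isMembersOnly;
    private readonly List<string> _offerProducts;

    public DiscountOffer(string title, DateTime startDate, DateTime endDate,
                         bool isMembersOnly, List<string> products)
    {
        _title = title;
        _startDate = startDate;
        _endDate = endDate;
        _isMembersOnly = isMembersOnly;
        _offerProducts = products; // the reference is fixed from here on, the contents are not
    }

    public string Title { get { return _title; } }
    public DateTime StartDate { get { return _startDate; } }
    public DateTime EndDate { get { return _endDate; } }
    public bool IsMembersOnly { get { return _isMembersOnly; } }

    // Exposes the mutable list itself, so any consumer can call Add or Remove on it.
    public List<string> OfferProducts { get { return _offerProducts; } }
}
```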

This seems to be a perfectly good immutable type with its members specified as readonly. An instance would be shared by all the consumers of this type. They can see whether the offer is for members only. They can show the title of the offer, including the details about its duration. Additionally, they can see which products are included in the offer. Here is the catch: the reference to the collection cannot be changed, but any consumer can still add or remove items from the collection, which would be weird for a discount offer. Once introduced, we don't want any client to be updating any part of the offer.

We can prevent this by exposing the products through a ReadOnlyCollection, where the constructor of the type can be updated as follows:
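
One way the updated constructor might look is sketched below. The type is trimmed to the members needed here, and all names other than OfferProducts are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical sketch: the product list is exposed through a read-only wrapper.
public class DiscountOffer
{
    private readonly string _title;
    private readonly bool _isMembersOnly;
    private readonly ReadOnlyCollection<string> _offerProducts;

    public DiscountOffer(string title, bool isMembersOnly, List<string> products)
    {
        _title = title;
        _isMembersOnly = isMembersOnly;
        // ReadOnlyCollection<T> wraps the list it is given; it does not copy it.
        _offerProducts = new ReadOnlyCollection<string>(products);
    }

    public string Title { get { return _title; } }
    public bool IsMembersOnly { get { return _isMembersOnly; } }

    // Consumers can enumerate and index, but the wrapper offers no public Add or Remove.
    public ReadOnlyCollection<string> OfferProducts { get { return _offerProducts; } }
}
```

Even when a consumer casts OfferProducts to ICollection<string> and calls Add, the wrapper throws NotSupportedException.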

Since the products list is a reference type, the ReadOnlyCollection only wraps it: any later update to the list by whoever still holds it is reflected in the OfferProducts member. If you don't want that, copy the list before assigning it to the ReadOnlyCollection member (a deep copy if the items themselves are mutable).
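
The difference between wrapping the caller's list and wrapping a copy can be seen in a small sketch. The helper names are hypothetical; the copy shown is a shallow one, which is enough for strings because they are immutable.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyViewDemo
{
    // Wraps the caller's list directly: later changes to that list show through the wrapper.
    public static ReadOnlyCollection<string> WrapDirectly(List<string> products)
    {
        return new ReadOnlyCollection<string>(products);
    }

    // Copies the items first, so the wrapper no longer shares storage with the caller's list.
    public static ReadOnlyCollection<string> WrapCopy(List<string> products)
    {
        return new ReadOnlyCollection<string>(new List<string>(products));
    }
}
```

If the caller adds an item to its list afterwards, the collection from WrapDirectly grows with it while the one from WrapCopy keeps its original items.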

Popsicle Immutability [Immutable after freezing]
These are data structures which are modifiable when they are created but can later be turned non-modifiable. These types provide a special operation to do so. They can offer better performance once turned non-modifiable.

WPF has a number of Freezable types including brushes, pens, transformations, geometries, and animations. Most of them seem to be part of the graphics sub-system.
The freezing part of freezable types is interesting. Let me make it easier for you by stating the following:

"Not every freezable object can be frozen at any given time. We must check whether it can be frozen before causing it to freeze."

Some part of our code might be in the middle of updating the object's properties when some other part of the code wants it frozen. We might also have it data bound to some UI control. If we froze it then, it wouldn't be able to reflect updates from user interaction with the UI, resulting in an exception. WPF has resolved this issue by providing the CanFreeze property. In WPF, Freezables keep raising the Changed event whenever their state changes. Once an object is frozen, there is no need to subscribe to that event, as the object won't be changing its state afterwards.

Objects of these types are thread safe after they are frozen. For our custom popsically immutable types, we must make sure ourselves that the objects can be safely shared across different threads. The WPF type system has made this easier for us. Freezable is a DispatcherObject, so it has an associated Dispatcher until it is frozen, and attempting to access it from a different thread results in an exception. When we freeze it, it disassociates itself from the Dispatcher by setting it to null. Now it can be shared across threads without worrying about updates to its state.
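
The check-then-freeze pattern can be sketched with a brush. This assumes a project that references the WPF assemblies (PresentationCore and WindowsBase); the helper class and method names are hypothetical.

```csharp
using System.Windows.Media; // WPF: requires PresentationCore and WindowsBase

public static class BrushFreezing
{
    // Builds a brush, configures it while it is still modifiable, then freezes it if allowed.
    public static SolidColorBrush CreateFrozenBrush()
    {
        var brush = new SolidColorBrush(Colors.SaddleBrown);
        brush.Opacity = 0.8; // allowed: the brush is not frozen yet

        if (brush.CanFreeze)
        {
            brush.Freeze(); // from here on, setting a property throws InvalidOperationException
        }

        return brush;
    }
}
```

A caller can inspect IsFrozen on the returned brush before deciding to share it with another thread.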

As with the Clone() method from the ICloneable interface, types should clearly document the expectations of their Freeze() method. In the ICloneable case, Clone() might return a deep copy or a shallow copy of the object, so we always have to read the type's documentation for the expected behavior. This is actually an issue with polymorphism, which is taken care of by good documentation. For your types, you can freeze the object deeply or otherwise, but it should be clearly documented. WPF keeps it deeply frozen. That is why it has a CanFreeze property: if even one of the sub-objects cannot be frozen, the object itself cannot be frozen.

You might want to name your method differently (e.g. TurnImmutable()) for your type, but the concept should be the same. In the WPF case, the object frees itself from the dispatcher (it sets the property to null). In your case, you might want to stop using synchronization blocks once the object turns immutable. This means there is no serialized thread access for the instances, hence fewer waits for threads, which can improve the performance of your application.

Let's introduce a new type to bring this concept into our project. Its sub-types would be popsically immutable [I know there is no word like popsically (here I take advantage of being non-native :))]
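
A minimal sketch of such a base type is shown below. The method names TurnImmutable, CanBeTurnedImmutable and CanBeTurnedImmutableCore come from this post; the type name PopsicleImmutable, the IsImmutable property, TurnImmutableCore and the second check inside TurnImmutable are assumptions.

```csharp
using System;

// Hypothetical base type for popsicle immutability (not thread safe in this form).
public class PopsicleImmutable
{
    private bool _isImmutable;

    public bool IsImmutable { get { return _isImmutable; } }

    // A method rather than a property, because the virtual check might be expensive.
    public bool CanBeTurnedImmutable()
    {
        if (_isImmutable)
        {
            return false;
        }
        return CanBeTurnedImmutableCore();
    }

    // Unlike Freezable.Freeze(), a second call is reported instead of being swallowed.
    public void TurnImmutable()
    {
        if (_isImmutable)
        {
            throw new InvalidOperationException("The object is already immutable.");
        }
        if (!CanBeTurnedImmutableCore())
        {
            throw new InvalidOperationException("The object cannot be turned immutable right now.");
        }
        TurnImmutableCore();
        _isImmutable = true;
    }

    // Sub-types override these; checking and turning are kept as separate responsibilities.
    protected virtual bool CanBeTurnedImmutableCore() { return true; }
    protected virtual void TurnImmutableCore() { }
}
```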

This type differs from WPF's Freezable in a number of ways. Let's discuss why we are introducing it differently.

The TurnImmutable() method checks whether the object has already been turned immutable, and if so, it throws an InvalidOperationException. Freezable.Freeze(), on the contrary, just swallows the call. This seems like a deception. If a function is not able to perform its duty, then it should throw an exception. The caller must check whether the object is already frozen / immutable before attempting to turn it so.

CanBeTurnedImmutable is equivalent to the Freezable.CanFreeze property. So why have we introduced it as a method instead of a property? This follows Microsoft's guidance on choosing between properties and methods: the method calls the virtual CanBeTurnedImmutableCore method, which can be overridden by a child class and might be an expensive operation.

Both Freezable.Freeze() and Freezable.CanFreeze use Freezable.FreezeCore() with a flag parameter [isChecking] to identify whether the call is just for checking or for actually turning the instance immutable. This flag-based approach to driving method behavior should be avoided. It clearly means that the method has more than a single responsibility, which is never recommended. We have avoided it here.

This implementation also has some limitations. They are as follows:

This does not support change notification. Freezable supports change notifications through events. It keeps raising the events until the object turns unmodifiable, after which no such event is possible. Since this is not a necessary condition for immutability, let's leave it at that. You can provide the event if you need it.

The type is not thread safe. What!!! Don't shoot me for that. Let's do it brute force.

Where _lockObj can simply be as follows:
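
A brute-force sketch of the locked methods, together with the _lockObj field they rely on, might look like this. As before, the type name and the members other than TurnImmutable, CanBeTurnedImmutable, CanBeTurnedImmutableCore and _lockObj are assumptions.

```csharp
using System;

// Hypothetical thread-safe variant: every access to the flag goes through one lock.
public class PopsicleImmutable
{
    // A private object used only for locking, so no outside code can lock on it.
    private readonly object _lockObj = new object();
    private bool _isImmutable;

    public bool IsImmutable
    {
        get { lock (_lockObj) { return _isImmutable; } }
    }

    public bool CanBeTurnedImmutable()
    {
        lock (_lockObj)
        {
            if (_isImmutable)
            {
                return false;
            }
            return CanBeTurnedImmutableCore();
        }
    }

    public void TurnImmutable()
    {
        lock (_lockObj)
        {
            if (_isImmutable)
            {
                throw new InvalidOperationException("The object is already immutable.");
            }
            TurnImmutableCore();
            _isImmutable = true;
        }
    }

    // Overrides run while the lock is held, so they should stay short.
    protected virtual bool CanBeTurnedImmutableCore() { return true; }
    protected virtual void TurnImmutableCore() { }
}
```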

You might ask whether we really need a lock statement in the CanBeTurnedImmutable() method. Without it, a thread can be in the middle of the method [after having passed the if statement] when another thread executes TurnImmutable(). The first thread still gets true from the method, but when it calls TurnImmutable() it is slapped with an exception.

There might be other cases when multi-threaded execution takes place as follows:
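
The check-then-act call site in question can be sketched as follows. FreezableStandIn is a hypothetical, compact stand-in for the type above, and Race is an illustrative helper that lets several callers run the same if statement concurrently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical compact stand-in for the popsicle-immutable type.
public class FreezableStandIn
{
    private readonly object _lockObj = new object();
    private bool _isImmutable;

    public bool CanBeTurnedImmutable()
    {
        lock (_lockObj) { return !_isImmutable; }
    }

    public void TurnImmutable()
    {
        lock (_lockObj)
        {
            if (_isImmutable) throw new InvalidOperationException("Already immutable.");
            _isImmutable = true;
        }
    }
}

public static class CheckThenActDemo
{
    // Returns how many callers actually performed the transition.
    public static int Race(FreezableStandIn target, int callers)
    {
        int succeeded = 0;
        Parallel.For(0, callers, i =>
        {
            if (target.CanBeTurnedImmutable())      // check
            {
                try
                {
                    target.TurnImmutable();         // act: another thread may have got here first
                    Interlocked.Increment(ref succeeded);
                }
                catch (InvalidOperationException)
                {
                    // This caller passed the check but lost the race.
                }
            }
        });
        return succeeded;
    }
}
```

However many callers pass the check, only one of them completes the transition; the rest either see false from the check or receive the exception.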

After the first thread passes the if statement, but before it enters the TurnImmutable() call, another thread also passes through the if statement. They both call TurnImmutable(). One thread succeeds while the other gets the exception. Seeing this, you might realize why Freezable swallows the call in this case. Well, this is a trade-off. Make your decision based on your requirements, but you should understand the gains and pains of both approaches.

Here we have introduced the ...Core() methods as virtual. You might want to turn them into abstract methods.

In order to turn your data structure into a popsically immutable one, you need to inherit from the type defined above and provide definitions of the virtual / abstract methods. You also need to check every attempt to modify the object and throw an exception if the object has already been turned immutable.
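
A sub-type might be sketched like this. The base type is re-declared in compact form so that the example stands on its own; SeasonalMenu and its Title property are hypothetical.

```csharp
using System;

// Compact, hypothetical re-declaration of the base type (no locking, for brevity).
public class PopsicleImmutable
{
    private bool _isImmutable;

    public bool IsImmutable { get { return _isImmutable; } }

    public bool CanBeTurnedImmutable()
    {
        return !_isImmutable && CanBeTurnedImmutableCore();
    }

    public void TurnImmutable()
    {
        if (_isImmutable || !CanBeTurnedImmutableCore())
        {
            throw new InvalidOperationException("The object cannot be turned immutable.");
        }
        _isImmutable = true;
    }

    protected virtual bool CanBeTurnedImmutableCore() { return true; }
}

public class SeasonalMenu : PopsicleImmutable
{
    private string _title;

    public string Title
    {
        get { return _title; }
        set
        {
            // Every mutator guards itself once the object has been turned immutable.
            if (IsImmutable)
            {
                throw new InvalidOperationException("The menu is immutable.");
            }
            _title = value;
        }
    }

    // Illustrative rule: a menu without a title is not ready to be frozen.
    protected override bool CanBeTurnedImmutableCore()
    {
        return !string.IsNullOrEmpty(_title);
    }
}
```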

Observational Immutability
This is the type of immutability where the object is immutable for the outside world. None of the public fields and properties can be set through an instance of the type. They can only be modified from within. Basically, the setter of the property is not public. It can be private or protected, based on the placement of the class in the inheritance hierarchy (if there is any).

Observational immutability reduces the possibilities of updating the state of an object. Now the state can only be updated by the outside world through the public behaviors (methods) of the object. In our Coffee App example, we can introduce a feature to keep details about customers. We can assign a CoffeeCard to a customer. If a new card is assigned to a customer, the previous card simply gets overwritten.
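
A sketch of what this might look like is given below. Customer, CoffeeCard, Card and AssignCard are the names used in this post; the Name and Number members are assumptions.

```csharp
// Hypothetical card type, immutable once created.
public class CoffeeCard
{
    private readonly string _number;

    public CoffeeCard(string number) { _number = number; }

    public string Number { get { return _number; } }
}

public class Customer
{
    private readonly string _name;

    public Customer(string name) { _name = name; }

    public string Name { get { return _name; } }

    // Readable by anyone, writable only from inside the type.
    public CoffeeCard Card { get; private set; }

    public void AssignCard(CoffeeCard card)
    {
        Card = card; // replaces any previously assigned card
        // Other special things that depend on Card would go here.
    }
}
```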

In this example, the setter for Card is kept private, so it cannot be updated by a consumer of a Customer object. From within the class, however, it can be overwritten. It is not only the object's interface (public methods) that can cause an update. The card can also be updated in response to an event from some other object that the Customer has subscribed to. The point is that all these updates are made from within the definition of the type itself.

You might argue about whether this should even be called immutable, as the object can clearly update its state after being created.

It must be remembered that observationally immutable types are not thread safe unless special synchronization measures are taken. Two threads might enter the AssignCard method at the same time. The final value of the card then depends on which thread assigns it last. Also, the "Other special things" depend on the value of Card at the instant when that statement is executed.

Immutable Objects are Value Objects
As Martin Fowler suggests, Value Objects are objects that follow value semantics instead of reference semantics. Two such objects are equal if all of their fields are equal. Immutable types can also be implemented as value objects. They seem to allow manipulation of the object, but every manipulation results in a new object. One such example is string. A string seems to allow manipulations, but any such manipulation results in the creation of a new string object. The same is true for all Value Objects.

Value Objects support structural equality. Two value objects are equal if all their members are equal. In C#, value objects are often defined using classes. That is why we need to overload the equality operator (==) and override the Equals method so that equality is checked based on the members' values. This is because the identity of an immutable type is its state and not its reference, e.g. if firstInteger and secondInteger are both assigned the value "2", then they are equal. Reference objects, on the other hand, are only equal if they refer to the object created by the same new operation, with no consideration whatsoever of the state of its members.
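
As a sketch of these overloads, consider a hypothetical Price type for the coffee house. The type and its members are made up for illustration.

```csharp
using System;

// Hypothetical value object: equality is based on state, and "changes" return new instances.
public sealed class Price : IEquatable<Price>
{
    private readonly decimal _amount;
    private readonly string _currency;

    public Price(decimal amount, string currency)
    {
        _amount = amount;
        _currency = currency;
    }

    public decimal Amount { get { return _amount; } }
    public string Currency { get { return _currency; } }

    // Like string operations, this leaves the original untouched.
    public Price Add(decimal extra)
    {
        return new Price(_amount + extra, _currency);
    }

    public bool Equals(Price other)
    {
        if (ReferenceEquals(other, null)) return false;
        return _amount == other._amount && _currency == other._currency;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Price);
    }

    // Objects that compare equal must produce the same hash code.
    public override int GetHashCode()
    {
        unchecked
        {
            return (_amount.GetHashCode() * 397) ^ (_currency == null ? 0 : _currency.GetHashCode());
        }
    }

    public static bool operator ==(Price left, Price right)
    {
        if (ReferenceEquals(left, null)) return ReferenceEquals(right, null);
        return left.Equals(right);
    }

    public static bool operator !=(Price left, Price right)
    {
        return !(left == right);
    }
}
```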

Value Objects are also side effect free types. Hence they are the ideal candidates for functional and multi-threaded code.

People often get confused when they are first learning an object-oriented language because of context-specific terminology. Value / reference types and passing by value / by reference are two such pairs of terms, each belonging to a different context.

"It is possible to pass a value type object by reference and vice versa. "

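The statement above can be illustrated with a small sketch. Counter and the helper methods are hypothetical names.

```csharp
// Hypothetical reference type used for the demonstration.
public class Counter
{
    public int Value;
}

public static class PassingDemo
{
    // Value type passed by value: the method works on a copy.
    public static void IncrementByValue(int number) { number++; }

    // Value type passed by reference: the caller's variable is updated.
    public static void IncrementByRef(ref int number) { number++; }

    // Reference type passed by value: reassigning the parameter does not affect the caller.
    public static void ReplaceByValue(Counter counter) { counter = new Counter { Value = 99 }; }

    // Reference type passed by reference: the caller's variable now refers to the new object.
    public static void ReplaceByRef(ref Counter counter) { counter = new Counter { Value = 99 }; }
}
```

Calling ReplaceByValue leaves the caller's variable referring to the original object, while ReplaceByRef makes it refer to the new one.
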
Examples of passing by value and by reference in WPF are StaticResourceExtension and DynamicResourceExtension. StaticResourceExtension uses the resource as if passed by value, which is why, when the original reference starts referring to some other resource, it still keeps referencing the older object. DynamicResourceExtension, on the contrary, uses the resource by reference, so if the reference gets updated to refer to a new object, the extension starts using the newer instance. Here I have deliberately written "reference REFERS". Writing "reference points" paints the picture of a reference as a pointer, which creates dirty desires to use it like one, and that is not possible. Abstinence is the key :) [See Eric Lippert]