Version numbers in the traditional sense have been obsolete ever since DVCS arrived on the scene. They’re still useful for human consumption, but I prefer having the full Git (or Hg) changeset hash available on the assembly as well, so I can literally hit CTRL+SHIFT+G in GitExtensions (or the equivalent in TortoiseHg etc.) and paste in the changeset hash to go directly to it.

Setting this up is fairly simple. First, ensure Git.exe is in your PATH. It generally will be if you installed Git with the default/recommended settings. Second, install the NuGet package for MSBuild Community Tasks. Make sure you install the older version available here, as there is a bug in the newer versions where the GitVersion task is broken.

Create a VersionAssemblyInfo.cs in your solution root, and create a “Link” to this file in all your projects.

Then in each project file you just need to add a new import, typically underneath the “Microsoft.CSharp.targets” import.

<Import Project="$(SolutionDir)\VersionAssemblyInfo.targets" />

Now rebuild your project(s). If you check the disassembly with a tool like JustDecompile or Reflector and look at the assembly attributes, you should see that the (little known and rarely used) AssemblyInformationalVersionAttribute is there and contains both the human-readable major.minor.build.revision and the Git changeset hash.

You may now choose to replace the various areas of your application that display version information with this attribute instead.
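As a sketch of what that replacement might look like (the helper class here is my own invention, not part of any framework):

```csharp
using System;
using System.Reflection;

static class VersionInfo
{
    // Reads the AssemblyInformationalVersionAttribute, which (with the setup
    // above) contains both the human-readable version and the changeset hash.
    public static string GetInformationalVersion(Assembly assembly)
    {
        var attribute = (AssemblyInformationalVersionAttribute)Attribute.GetCustomAttribute(
            assembly, typeof(AssemblyInformationalVersionAttribute));
        return attribute != null ? attribute.InformationalVersion : "(no informational version)";
    }
}
```

You could then display, say, `VersionInfo.GetInformationalVersion(Assembly.GetExecutingAssembly())` in your About dialog or diagnostics page.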

37signals’ Campfire product is a brilliant example of poor web application design. I would much rather use IRC and forfeit the tiny number of Campfire features that are actually useful, such as image in-lining and arguably YouTube clip in-lining (though the latter is rarely used for actual work). I can’t remember the last time we used the log history/transcripts feature.

Some of the problems and annoyances we have with Campfire are:

Switching between rooms requires a full page refresh and is very slow.

Despite the illusion created by the fake tabs, you can only be in one room at a time. Unlike an IRC client, these fake tabs give no indication that room activity has occurred, so you must periodically switch between rooms to check for activity. If you genuinely need to be in multiple rooms, you’ll have to open more tabs in your actual web browser.

Timestamps for chat messages seem to be grouped together (good in theory) but they only appear after some random amount of time (bad implementation in practice).

There is no built-in notification/pop-up support. So you have to install hacks like the Kindling for Chrome extension.

A third of the screen real estate is occupied by pointless stuff that you rarely use. Yes having a user list is important, but having it visible at all times is not.

All of the “Campfire clients” for Windows are rubbish.

But the biggest annoyance of all is when you blindly start typing a message and the web application is too dumb to auto-focus the cursor on the message entry text box. You can’t even Tab-key your way to it predictably; the only constant-time way to reach it is to focus it with the mouse. Bear in mind that this is a feature that every IM client since about 1999 has had baked in.

Is it time for 37signals to drink some of their own Kool-Aid and actually “rework” their Campfire service?

In the Registry Editor, navigate to: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\SKUs\ and create a new Key folder called .NETFramework,Version=v4.0,Profile=Mono

Now navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319\SKUs\ and create the same Key folder again here. (This step is only necessary on x64 machines, but since virtually all software developers use those, you’ll probably need to do it!)

Now load up VS2012 and a project. Go to the Project Properties (Alt+Enter while the project is selected in Solution Explorer).

Choose the Application tab on the left and look for the “Target framework” drop-down list.

On this list you should see an entry called “Mono 2.10.9 Profile”.

Select it, and Visual Studio should then convert your project to target the Mono framework. You’ll notice that it will re-reference all your various System assemblies and if you study the filenames they will point to the Mono ones that were created during Step #3.

Note: I was scratching my head at first as I kept getting an error from Visual Studio saying:

This application requires one of the following versions of the .NET Framework: .NETFramework,Version=v4.0,Profile=Mono

Do you want to install this .NET Framework version now?

Turns out that even on an x64 machine you MUST add both Registry key SKUs (see Steps #7 and #8). It is not enough to just add the Wow6432Node key; you must add the other as well. I can only assume this is because VS2012 is a 32-bit application. But maybe it’s also affected by whether you’re compiling to Any CPU, x64 or x86… who knows. It doesn’t really matter as long as this fixes it, which it does!

Building and executing your first program

Now that your development environment is nicely setup, you can proceed and build your first program.

The Mono equivalent of MSBuild is called XBuild (not sure why they didn’t call it MBuild or something!). You can build your .csproj by doing the following:

Load the Mono Command Prompt (it will be on your Start Menu/Screen, just search for “mono”).

Change directory to your project folder.

Execute the following command to build your project using the Mono compiler:

$ xbuild /p:TargetFrameworkProfile=""

Note: You must specify the blank TargetFrameworkProfile parameter, as otherwise the compiler will issue warnings along the lines of:

Unable to find framework corresponding to the target framework moniker ‘.NETFramework,Version=v4.0,Profile=Mono’. Framework assembly references will be resolved from the GAC, which might not be the intended behavior.

Hopefully you’ll not have any errors from the build…

Now you can run your program using the Mono runtime. To do this:

$ mono --gc=sgen "bin\Debug\helloworld.exe"

Note: You'll definitely want to use the "sgen" garbage collector (hence the parameter), as the default one in Mono is unbearably slow.

You can do a quick “smoke test” to verify everything is in order with both your compilation and execution. Have your program execute something like:

Console.Write(typeof(Console).Assembly.CodeBase);

… and this should print out a path similar to:

file:///C:/PROGRA~2/MONO-2~1.9/lib/mono/4.0/mscorlib.dll

I’ve no idea why it prints the path in 8.3 filename format, but there you go! You’ll notice that if you run your program outside of the Mono runtime, it will pick up the Microsoft CLR version from the GAC.

When developing and testing a distributed system, one of the essential concerns you will deal with is eventual consistency (EC). There are plenty of articles covering EC, so I’m not going to dwell on it much here. What I am going to talk about is testing, particularly of the automated kind.

Testing an eventually consistent system is difficult because everything is transient and unpredictable. Unless you tune your consistency values like N and DW then you’re offered no guarantees about the extent of propagation of your commit around the cluster. And whilst consistency tuning may be acceptable for some tests, it most definitely won’t be acceptable for tests that cover concurrent write concerns such as your sibling resolution protocols.

What is “partition tolerance”?

This is where a single node or a group of nodes in the cluster becomes segregated from the rest of the cluster. I liken it to a “net split” on an IRC network. The system continues operating but it has been split into two or more parts. When a node has become segregated from the rest of the cluster, it does not necessarily mean that its clients cannot reach it either. Therefore all nodes can continue to perform writes on the dataset.

Generally speaking, a partition event is a transient situation. It may last a few seconds, a few minutes, hours, days, or even weeks. But the expectation is that eventually the partition will be repaired and the cluster returned to full health.

Fig. 1: State changes of a cluster during a partition event

In the diagram (Fig. 1) there is a cluster comprised of three nodes, and this is the sequence of events:

N2 becomes network partitioned from N1 and N3.
N1 and N3 can continue replicating between one another.

Client(s) connected to either N1 or N3 perform two writes (indicated by the green and orange).

Client(s) connected to N2 perform three writes (purple, red, blue).

When the network partition is resolved, the cluster begins to heal by merging the commits between the nodes.

The green and orange commits (which were already replicated between N1 and N3) are propagated onto N2.

The purple, red and blue commits on N2 are propagated onto N1 and N3.

The cluster is now fully healed.

Simulating a partition

I have a Riak cluster of three nodes running on Windows Azure and I needed some way to deterministically simulate a network partition scenario. The solution that I came up with was quite simple. I basically wrote some iptables scripts that temporarily firewalled certain IP traffic on the LAN in order to prevent the selected node from communicating with any other nodes.

To erect the partition:

# Simulates a network partition scenario by blocking all TCP LAN traffic for a node.
# This will prevent the node from talking to other nodes in the cluster.
# The rules are not persisted, so a restart of the iptables service (or indeed the whole box) will reset things to normal.
# First add special exemption to allow loopback traffic on the LAN interface.
# Without this, riak-admin gets confused and thinks the local node is down when it isn't.
sudo iptables -I OUTPUT -p tcp -d $(hostname -i) -j ACCEPT
sudo iptables -I INPUT -p tcp -s $(hostname -i) -j ACCEPT
# Now block all other LAN traffic.
sudo iptables -I OUTPUT 2 -p tcp -d 10.0.0.0/8 -j REJECT
sudo iptables -I INPUT 2 -p tcp -s 10.0.0.0/8 -j REJECT

To tear down the partition:

# Restarts the iptables service, thereby resetting any temporary rules that were applied to it.
sudo service iptables restart

Disclaimer: These are just rough shell scripts designed for use on a test bed development environment.

I then came up with a fairly neat little class that wraps up the concern of a network partition:

RingStatus.AssertOkay("riak03");
using (NetworkPartition.Create("riak03")) {
    RingStatus.AssertDegraded("riak03");
    // TODO: Do other stuff here whilst riak03 is network partitioned from the rest of the cluster.
}
RingReady.Wait("riak03");
RingStatus.AssertOkay("riak03");

Obviously, as part of this work I had to write a quick and dirty wrapper around PuTTY’s plink.exe tool. plink is basically a CLI version of PuTTY and is very good for scripting automated tasks.
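As a sketch of how such a wrapper class might be implemented (the names and the injected command-runner are my own; the real code delegates to the plink wrapper):

```csharp
using System;

// Hypothetical sketch: an IDisposable that erects the iptables partition on
// construction and guarantees it is torn down at the end of the using block,
// even if the test body throws.
public sealed class NetworkPartition : IDisposable
{
    private readonly Action _teardown;

    private NetworkPartition(Action teardown)
    {
        _teardown = teardown;
    }

    // runCommand abstracts "run this shell command on the named node"
    // (in practice, via the plink.exe wrapper); injecting it keeps the
    // class testable without a real cluster.
    public static NetworkPartition Create(string nodeName, Action<string, string> runCommand)
    {
        runCommand(nodeName, "sudo ./erect-partition.sh");
        return new NetworkPartition(() => runCommand(nodeName, "sudo service iptables restart"));
    }

    public void Dispose()
    {
        _teardown();
    }
}
```

The IDisposable pattern is what makes the `using`-block style shown earlier possible: the partition cannot be accidentally left in place by a failing test.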

My solution could be improved to support creating partitions that consist of more than just one node, but for me the added value of this would be very small at this stage. Maybe I will add support for this later when I get into stress testing territory!

Source

You can view it over on GitHub; the namespace is tentatively called “ClusterTools”. Bear in mind it’s currently held inside another project, but if there’s enough interest I will make it standalone. There has been talk on the CorrugatedIron project (which is a Riak client library for .NET) about starting up a Contrib library, so maybe this could be one of its first contributions.

Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything identically to previous deployments, except something was causing Riak to fail to start up in service mode.

[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]

Bizarrely, in console mode it would start up fine. This led me to believe it was some sort of user or permissions issue, but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of lock file had been created? Or was it perhaps a performance issue with the “Shared Core” Azure VMs I was using?

First I followed Riak’s advice by increasing the WAIT_FOR_ERLANG environment variable; I tried 30 and then 60 seconds. But this made no difference at all. I’m not even sure Riak was using my new value, as it still kept printing “15 seconds” as its reason for failing to start.

I researched some more, and many places on the interweb were suggesting purging the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.

But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!

Any time I write a C# property these days I can’t help thinking to myself how they would be so much cleaner if they supported a (syntactically restricted) form of lambda expression on both the getter and setter.
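For example, instead of today’s verbose syntax, one could imagine something like the following (the lambda form is purely hypothetical syntax, not valid C# today):

```csharp
// Today:
private string _name;
public string Name
{
    get { return _name; }
    set { _name = value.Trim(); }
}

// Imagined lambda-style getter and setter bodies:
public string Name
{
    get => _name;
    set => _name = value.Trim();
}
```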

Buoyed with enthusiasm for WebSockets, I set about implementing a simple test harness for a WebSockets server in C#.NET using System.Net.HttpListener. Unfortunately, things did not go well. It turns out that HttpListener (and indeed the underlying HTTP Server API, a.k.a. http.sys) cannot be used at all to develop a WebSockets server on current versions of Windows. http.sys is simply too strict in its policing of what it believes to be correct HTTP protocol.

The current technical issue for our stack is that the low-level Windows HTTP driver that handles incoming HTTP request (http.sys) does not recognize the current websockets format as having a valid entity body. As you have noted, the lack of a content length header means that http.sys does not make the nonce bytes in the entity available to any upstack callers. That’s part of the work we will need to do to build websockets support into http.sys. Basically we need to tweak http.sys to recognize what is really a non-HTTP request, as an HTTP request.

Implementation-wise this boils down to how strictly a server-side HTTP listener interprets incoming requests as HTTP. For example a server stack that instead treats port 80 as a TCP/IP socket as opposed to an HTTP endpoint can readily do whatever it wants with the websockets initiation request.

For our server-side HTTP stack we do plan to make the necessary changes to support websockets since we want IIS and ASP.NET to handle websockets workloads in the future. We have folks keeping an eye on the websockets spec as it progresses and we do plan to make whatever changes are necessary in the future.

This is a damn shame. As it stands right now, Server 2008/R2 boxes cannot host WebSockets; at least, not whilst sharing ports 80 and 443 with the IIS web server. Sure, you could always write your WebSocket server to bind to those ports with a raw TCP socket, rewrite a ton of boilerplate HTTP code that http.sys can already handle, and put up with the fact that you can’t share the port with IIS on the same box. But that is something that most people, me included, do not want to do.

Obviously it isn’t really anyone’s fault because back in the development time frame of Windows 7 and Server 2008/R2 (between 2006 to 2009) they could not have foreseen the WebSockets standard and the impact it might have on the design of APIs for HTTP servers.

The good thing is that Windows 8 Developer Preview seems to have this covered. According to the MSDN documentation, the HTTP Server API’s HttpSendHttpResponse function supports a new special flag called HTTP_SEND_RESPONSE_FLAG_OPAQUE that seems to suggest it will put the HTTP session into a sort of “dumb mode” whereby you can pass-thru pretty much whatever you want and http.sys won’t interfere:

HTTP_SEND_RESPONSE_FLAG_OPAQUE

Specifies that the request/response is not HTTP compliant and all subsequent bytes should be treated as entity-body. Applications specify this flag when it is accepting a Web Socket upgrade request and informing HTTP.sys to treat the connection data as opaque data.

This flag is only allowed when the StatusCode member of pHttpResponse is 101, switching protocols. HttpSendHttpResponse returns ERROR_INVALID_PARAMETER for all other HTTP response types if this flag is used.

Windows Developer Preview and later: This flag is supported.

Aside from the new System.Net.WebSockets namespace in .NET 4.5, there are also clear indications of this behaviour being exposed in the HttpListener of .NET 4.5 through a new HttpListenerContext.AcceptWebSocketAsync() method. The preliminary documentation seems to suggest that this method will support Server 2008 R2 and Windows 7. But this is almost certainly a misprint, because I have inspected these areas of the .NET 4.5 libraries using Reflector and it is very clear that this is not the case:

The HttpListenerContext.AcceptWebSocketAsync() method directly calls into System.Net.WebSockets.WebSocketHelper (a static class), which has a corresponding AcceptWebSocketAsync() method of its own. This method then calls a sanity check method tellingly named EnsureHttpSysSupportsWebSockets(), which evaluates an expression containing the words “ComNetOS.IsWin8orLater”. I need say no more.
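For reference, here is roughly what consuming the new API looks like on Windows 8 with .NET 4.5 (a sketch based on the preliminary documentation; the prefix URL and error handling are mine):

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

class WebSocketAcceptSketch
{
    static async Task Run()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/ws/");
        listener.Start();

        HttpListenerContext context = await listener.GetContextAsync();
        if (context.Request.IsWebSocketRequest)
        {
            // On Windows 7 / Server 2008 R2 this is the call that fails the
            // internal "IsWin8orLater" sanity check described above.
            var wsContext = await context.AcceptWebSocketAsync(subProtocol: null);
            // ... use wsContext.WebSocket for send/receive ...
        }
        else
        {
            context.Response.StatusCode = 400;
            context.Response.Close();
        }
    }
}
```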

It seems clear now that Microsoft has chosen not to back-port this minor HTTP Server API improvement to Server 2008 R2 / Windows 7. So now we must all hope that Windows Server 8 runs a beta program in tandem with the Windows 8 client, and that the two launch within a month of each other. Otherwise Metro app development possibilities are going to be severely limited whilst we all wait for the Windows Server product to host our WebSocket server applications! Even so, it’s a shame that Server 2008 R2 won’t ever be able to host WebSockets.

It will be interesting to see whether Willy Tarreau (of HAProxy fame) comes up with some enhancements in his project that might benefit those determined enough to still want to host (albeit raw TCP-based) WebSockets on Server 2008 R2.

On 8th December 2011, the little-known (but growing in awareness) “WebSockets” standard was upgraded to W3C Candidate Recommendation status. That is one small step shy of becoming a fully ratified web standard. And just to remove any remaining doubt: next-gen platforms and frameworks such as Windows 8 and .NET 4.5 (at three levels: System.Net, WCF and ASP.NET) already have deep support for it, and they aren’t even in beta yet!

After reading up about the standard in detail and absorbing the various online discussion around it, it is becoming increasingly clear to me that this standard is going to steal a large chunk of mind share from RESTful web services. What I mean is that there will come a stage in product development where somebody will have to ask the question:

Right guys, shall we use WebSockets or REST for this project?

I expect that WebSockets will, within a year or two, begin stunting the growth of RESTful web services – at least as we know them today.

What are WebSockets?

They are an overdue and rather elegant protocol extension to HTTP 1.1 that allows what is fundamentally a bi-directional TCP data stream to be tunnelled over an HTTP session. They provide built-in message framing, so developers don’t need to worry about boilerplate like that when designing their application protocol.

Why are WebSockets a threat to RESTful web services?

From the last few years of working on projects that expose RESTful web services, I have noticed a few shortcomings. I should make clear that I’m not claiming WebSockets answers all of those shortcomings; I’m merely suggesting that REST is not the silver-bullet solution it is often hyped up to be, and that there is definitely space for another player that can still operate at “web scale”. WebSockets have more scope to be a black-box or quick’n’dirty solution than REST, which demands a more design-up-front approach due to versioning and public visibility concerns. Always use the correct tool for the job, as they say.

Sub-par frameworks

They might claim REST support, but in my opinion they still haven’t truly grokked it. WCF REST is a good example of this. Admittedly, the WCF Web API for .NET is starting to get close to where things should be, but it is not yet production ready.

Perhaps even more serious is the lack of widespread cross-platform RESTful clients that work the way Roy Fielding prescribed: presenting an entry-point resource that allows the client to automatically discover and autonomously navigate between further nested resources in a state-machine fashion; a single client framework that can operate with hundreds of totally different RESTful web services from different organisations. This does not exist today, which is why so many big providers of RESTful web services end up seeding their own open source projects in various programming languages to provide the essential REST client.

Enterprise loves SOAP (and other RPCs)

Third parties that want to use your web services often prefer SOAP over REST; many haven’t even heard of REST! WebSockets are a message-based protocol, allowing for the SOAP-like RPC protocols that enterprises seem to adore so much. Hell, there’s nothing stopping actual SOAP envelopes being transferred over a WebSocket!

This might not be the case if you’re operating in an extremely leading edge market such as perhaps cloud computing where everyone is speaking the same awesomesauce.

Complex domain models

Mapping complex domain models onto REST can be slow and laborious. You’ll find yourself constantly having to work around its architectural constraints; transactions, for example, can be a particular problem. Of course, this is partly related to the first problem (sub-par frameworks), but one cannot reasonably expect transaction support in a REST framework. What is probably needed is a set of common design patterns for mapping domain models to REST, and then an extension library for the framework that provides reusable implementations of those patterns. But alas, none of this exists yet.

Text-based formats

JSON and XML are (for reasons unknown) the formats commonly used with REST, and these are of course text-based. That is great for interoperability and cross-platform characteristics, but not so great for memory and bandwidth usage, which especially has implications on mobile devices.

You’ll find yourself running into walls if you try to use something that isn’t JSON or XML, at least that is my experience with current frameworks.

Request-response architecture of HTTP

Fundamentally, REST is nothing more than a re-specification of the way HTTP works and a proposal of a design pattern for building applications on top of HTTP. This means it retains the same stateless, sessionless characteristics of HTTP, which precludes REST from being bi-directional: the server can never act as the requester of some resource from the client, or the sender of some message to the client. As a result, “hacks” must be used to emulate server-side events, and these hacks have bad characteristics such as high latency (round-trip time) and wasteful battery usage.

Public visibility, versioning concerns

Sometimes having everything publicly visible is not what you want; people start using APIs that you don’t want them to use yet. You have to design everything to the nth degree and have a proper versioning strategy in place. It certainly encourages a more discerning approach to software development, and whilst these are usually good things, they can be a hindrance on early-stage “lean agile” projects.

What can WebSockets do that is so amazing?

The fact that there will soon be a second player in this space suggests that there will be rebalancing of use-cases. WebSockets will prove to be disruptive for several reasons:

True bi-directional capability and server-side events, no hacks

Comet, push technology, long-polling etc. in web apps are slow, inefficient, inelegant and have a higher potential for unreliability. They often work by requesting a resource from the server, causing the server to block until such a time as an event (or events) needs to be transferred back to the client. They can be unreliable because the TCP connection could be torn down by an intermediate router while it is waiting for the response. Or worse, a proxy server might deliberately time out the long-running request. As such, many implementations of this hack use some kind of self-timeout mechanism, reissuing the request to the server every 60 seconds or so anyway. This has implications on both bandwidth and battery usage.

The true bi-directional capability offered by WebSockets is a first for any HTTP-borne protocol. It is something that neither SOAP nor REST have. And which Comet/push/long-polling can only emulate, inefficiently. The bi-directional capability is inherently so good that you could tunnel a real-time TCP protocol such as Remote Desktop or VNC over a WebSocket, if you wanted.

Great firewall penetration characteristics

WebSockets can tunnel out of heavily firewalled or proxied environments far easier than many other RPC designs. I’m sure I’m not alone in observing that enterprise environments rarely operate their SOAP services on port 80 or 443.

If you can access the web on port 80 without a proxy, WebSockets will work.

If you can access the web on port 80 with a proxy, WebSockets should work as long as the proxy software isn’t in the 1% that are broken and incompatible.

If you can access the web on port 443 with or without a proxy, WebSockets will work.

I strongly suspect that there will be a whole raft of new Logmein/Remote Desktop and VPN solutions that are built on top of WebSockets, purely because of the great tunnelling characteristics.

Lightweight application protocols and TCP tunnelling

There is the potential for extremely lightweight application protocols, in respect of performance, bandwidth and battery usage. Like REST, the application schema/protocol isn’t defined by the standard; it is left completely open. WebSockets can transfer either text strings or binary data. The text string support was clearly included to aid in transferring JSON messages to JavaScript engines, which lack the concept of byte arrays, whilst the binary support will be most useful for tunnelling TCP streams or for custom RPC implementations. After a WebSocket session is established, the overhead per message can be as small as just two bytes (!). Compare that to REST, which has a huge HTTP header attached to every single request and response.
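To illustrate the two-byte claim, here is a sketch (based on my reading of the framing spec) of building the header for an unmasked server-to-client text frame:

```csharp
using System;

static class WebSocketFraming
{
    // Builds the header for an unmasked (server-to-client) text frame.
    // For payloads under 126 bytes the entire header is just two bytes:
    // one for FIN + opcode, one for the payload length.
    public static byte[] BuildHeader(int payloadLength)
    {
        if (payloadLength < 126)
            return new byte[] { 0x81, (byte)payloadLength };            // 2 bytes
        if (payloadLength <= ushort.MaxValue)
            return new byte[] { 0x81, 126,
                (byte)(payloadLength >> 8), (byte)payloadLength };       // 4 bytes
        throw new NotSupportedException("64-bit lengths omitted for brevity");
    }
}
```

Even for payloads up to 64 KB the overhead only grows to four bytes, which is still tiny next to a typical set of HTTP request headers.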

How will the use-cases of REST change?

I believe that REST will lose a certain degree of its lustre. Project teams will less eagerly adopt it if they can get away with a bare bones WebSocket implementation. REST will probably remain the default choice for projects that need highly visible and cross-platform interoperable web services.

Projects without those requirements will probably opt for WebSockets instead and either run JSON over them or use a bespoke wire protocol. They will be particularly used by web and mobile applications for their back-end communications, i.e. data retrieval and push events. Windows 8 “Metro” applications will need to use them extensively.

I suppose you could summarise that as:

REST will be (and remain) popular for publicly visible interfaces.

WebSockets will be popular for private, internal or “limited eyes only” interfaces.

Note: By “public” and “private” I am not referring literally to some form of paid/subscription/membership web service. I am referring to the programming API contract and its level of exposure to eyes outside of your development team or company.

Conclusion

Even though they compete, the good thing is that REST and WebSockets can actually co-exist. In fact, because they are both built upon HTTP fundamentals, they will actually complement each other: a RESTful hypermedia resource could refer to a WebSocket as though it were another resource, through a ws:// URL. This will pave the way for new RESTful design patterns and framework features, and will allow REST to remedy some of its shortcomings, such as transaction support, because a WebSocket session could act as the transactional unit of work.

For the past 8 years of my life I have been engrossed in the development of fully automated applications that use two-way SMS text messaging as their communication layer. SMS started life as little more than the “ICMP protocol” of GSM networks: it used to be fairly hidden away in the menus of those early Nokia phones, and even then it was very much akin to sending an “ICMP ping” message to your friend, who then pinged you back. I guess that’s where modern services like “PingChat” got their name!

SMS is a very simple protocol; there are only three essential things you need to understand:

It is limited to 160 characters per message, if you use the GSM 03.38 7-bit character set.

It is limited to 70 characters per message, if you use the UCS-2 (a.k.a. Unicode, UTF-16) character set.

Multiple messages can be joined together to form a multi-part message by including a special concatenation header, which eats up the space of around 7 characters in the GSM character set (leaving 153 per part) or 3 in UCS-2 (leaving 67 per part). Most phones these days refer to this concept in their GUI as “pages”.
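The arithmetic above can be sketched as a simple part counter. This assumes every character costs exactly one unit (extended GSM characters actually cost two septets, which this ignores) and uses the common per-part capacities of 153 (GSM) and 67 (UCS-2):

```csharp
using System;

static class SmsLength
{
    // Returns how many SMS parts a message of the given length needs.
    // A message that fits in a single SMS gets the full 160/70 characters;
    // once the concatenation header is required, each part carries 153/67.
    public static int CountParts(int charCount, bool unicode)
    {
        int single = unicode ? 70 : 160;
        int perPart = unicode ? 67 : 153;
        if (charCount <= single) return 1;
        return (charCount + perPart - 1) / perPart;  // ceiling division
    }
}
```

For example, a 161-character GSM message needs two parts, because once the concatenation header is present each part can only carry 153 characters.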

Unfortunately the protocol is severely handicapped for when it comes to building automated two-way applications, and here’s why:

It does not provide any facility, not even an extension standard or extension point, for performing reply correlation.

What do I mean by “reply correlation”? It is a simple concept. Assume that you send a question to a buddy, and he responds with the answer. One might expect that the message containing the answer carries some sort of ID code, token or cookie (hidden away in its header information, of course) that relates it to the original question message. Unfortunately it does not, and this is the problem: SMS does not include any such ID/token/cookie anywhere. It simply wasn’t included in either the original standard or any subsequent revisions or extensions of it.

It is not necessarily the creators’ fault, because clearly they couldn’t foresee how ubiquitous SMS would become. But there is evidence that they recognised and responded to its popularity very quickly in the late 1990s, publishing new standard extensions built upon SMS such as multi-part messages and WAP. So one can only wonder why they didn’t add an extension that would allow replies to be correlated with their original message. Unfortunately the window of opportunity to get this sorted out closed over a decade ago, so we’re pretty much stuck with it.

This is a big problem for SMS. It makes the process of building two-way fully automated applications much more difficult. Very few companies have actually managed to solve the problem, and those that have tend to be very small or operating in niche markets. I don’t understand this at all, because the possibilities and prospects for building two-way SMS applications are almost endless.

One of my key responsibilities over the last 8 years has been in devising production-ready solutions that work around this problem and this blog post is going to summarise all of them.

An overview of the solution

The key to solving this problem lies with two fields contained within the header information of every SMS message: the source and destination address. Or what I call the “address pairing”.

By using the address pairing in an intelligent way we can find the right compromise for a particular two-way application. Essentially, whenever the application needs to send a question to a mobile phone number it must ensure that no existing question is already outstanding on the same address pairing.

There are several ways that an application can be designed around this basic concept.

Solution #1: My application only has one source address

The application must be designed to “serialise” the transmission of questions. It can use a mutual exclusion mechanism that will prevent itself sending a further question to the same mobile phone number if an outstanding question is still waiting for a response. It can be expected that some characteristics of a “transaction” or “transactional unit of work” would be adopted in the design of the application to model this mutual exclusion concept.

My past implementations of this pattern were based on a database table with a composite primary key across the “source” and “destination” columns. The application would try to insert the address pairing into this table and, if successful, continue sending the message. If the insertion failed, it would know that a question is already outstanding with that mobile phone number, prompting it to give up and retry later. Or, rather than retrying later on some timer mechanism, you might enqueue the work as a job; when a response for the outstanding question is received, the application can check the queue for further jobs on that address pairing and dequeue/execute the top one.
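A minimal sketch of this pattern, using Python and an in-memory SQLite table (the table and function names are illustrative, not taken from my actual implementation):

```python
import sqlite3

# In-memory database for illustration; a real system would use a shared server database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outstanding_question (
        source      TEXT NOT NULL,
        destination TEXT NOT NULL,
        sent_at     TEXT NOT NULL,
        PRIMARY KEY (source, destination)  -- composite key: one question per address pairing
    )
""")

def try_acquire_pairing(source, destination):
    """Return True if we may send a question on this address pairing right now."""
    try:
        conn.execute(
            "INSERT INTO outstanding_question VALUES (?, ?, datetime('now'))",
            (source, destination))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # A question is already outstanding on this pairing; give up and retry later.
        return False

def release_pairing(source, destination):
    """Called when a response arrives (or the question times out)."""
    conn.execute(
        "DELETE FROM outstanding_question WHERE source = ? AND destination = ?",
        (source, destination))
    conn.commit()
```

The database’s own uniqueness enforcement acts as the mutual exclusion mechanism, so two application nodes racing to question the same phone cannot both succeed.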

There is a caveat with this solution, however, and it comes as a side effect of “serialising” the questions one after the other. What if it takes days or weeks for the person to respond? The questions queued up waiting to acquire a lock on the address pairing will get pushed back and back. They could get pushed back so far that the premise of the question is entirely voided (e.g. an appointment reminder/confirmation).

The solution to this problem is to introduce the further concept of a “timeout” value. This ensures that any question sent to the mobile phone can only be outstanding for up to a designated time period. You would typically set this to around 24–48 hours, but questions with more time-sensitive content may use a lower value of between 1–4 hours.

It is helpful (though not essential) when implementing the timeout concept to use the “Validity Period” field that is available on every outbound SMS message. Set the validity period to roughly match the timeout value for that question. This helps ensure messaging integrity when, for example, the mobile phone is turned off for a week: when it is turned back on, you don’t want “expired” questions to be delivered after your back-end application has already timed out the workflow that was running for them.

Solution #2: My application can have multiple source addresses

The idea is that you would have a relatively large pool of source addresses, perhaps as many as 50 or 100. Your application would, as with Solution #1, maintain some kind of database table or data structure that prevents duplicate address pairings. The application would then have some logic that enables it to “select” a free source address i.e. a source address that is not “in use” for the destination mobile phone number.
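A sketch of the source-selection logic, assuming a hypothetical pool of provisioned numbers and an in-memory record of busy pairings (a production system would keep this state in the database, as in Solution #1):

```python
# Hypothetical pool of 50 dedicated source addresses; the real pool would be
# whatever numbers you have provisioned with your operator/aggregator.
SOURCE_POOL = ["44770090%04d" % i for i in range(50)]

# destination -> set of source addresses currently "in use" for it.
in_use = {}

def select_free_source(destination):
    """Pick a source address with no outstanding question to this destination."""
    busy = in_use.setdefault(destination, set())
    for source in SOURCE_POOL:
        if source not in busy:
            busy.add(source)  # mark this address pairing as occupied
            return source
    return None  # the entire pool is exhausted for this destination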

It would still be advisable to implement some kind of “timeout” mechanism, as with Solution #1, but the advantage is that you can have substantially greater timeout periods – possibly in the order of weeks or months. Really, the timeout mechanism here acts more as a type of garbage collector than as the question expediter it was in Solution #1.

I’ve always considered that this solution is better suited to applications that provide a “shared” or cloud service of some kind. Simply because setting up a large pool of dedicated source addresses for each of your application’s customers is surely going to get painful.

This solution does have the disadvantage that end-users will potentially be communicating with lots of different source addresses, even though it is really the same company/application at the other end. It can mess up the user’s normal “texting” experience: it robs them of their iPhone’s “chat bubble” style of presentation, and of the ability to create a Contacts entry for a regular correspondent. Obviously there are things you can do to try to minimise this, such as always selecting the source address with the lowest index, but really I think that will just make things worse. At some point you WILL want to send multiple questions to a mobile phone number, and there’s no getting around that fact. If you’ve got a large pool of source addresses then you’re going to want to use them.

Solution #3: My application only has one source address, but I need to send concurrent questions to the same mobile phone

You can’t. Well you can, but I don’t recommend it at all. I tried it once, on an early version of our system, and our customers didn’t like it.

Essentially you combine the concepts detailed in Solution #1 with some text-processing logic in your response-handling code. So rather than phrasing your question as “Are you attending the meeting tomorrow? Reply with A=Yes or B=No.”, you’d phrase it as “… Reply with A1=Yes or B1=No”. Notice the “1” digit in there? That’s the key bit: it is a transaction code that will be used for correlation. My implementation basically went from zero to nine, so you could have a total of 10 concurrent questions open with the same mobile number.
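The reply-side parsing might look something like this sketch (the pattern and function are illustrative, not my original code):

```python
import re

# Matches replies such as "A1", "b3" or "A 1": an option letter followed by
# the single transaction digit (0-9) used for correlation.
REPLY_PATTERN = re.compile(r"^\s*([A-Za-z])\s*([0-9])\s*$")

def parse_reply(text):
    """Return (option, transaction_id), or None if the digit is missing."""
    match = REPLY_PATTERN.match(text)
    if not match:
        return None  # e.g. a bare "A", or a literal "Yes" - cannot be correlated
    return match.group(1).upper(), int(match.group(2))
```

Note how a reply of plain “A” or “Yes” falls straight through to `None` — which is exactly the failure mode described below.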

I don’t like this solution for the following reasons:

Many end-users forget to include the essential digit in their reply. They might reply “A” instead of “A1”. I’ve seen this happen in the wild.

Typing digits in an SMS message is often unintuitive on mobile phones. Even an iPhone makes you switch to a sub-keyboard; BlackBerrys make you hit the ALT key.

It prohibits your application from accepting literal text responses. Many users would simply reply “Yes” rather than “A” or “A1”. If they do this, your application would be screwed because it wouldn’t have the essential digit to correlate the reply with the original question. I’ve seen this happen in the wild.

It prohibits your application from accepting “freeform” text responses. You might want to send a question like “What is your full name?”. There’s no way you can tag on the end of that a list of options. It simply doesn’t make sense.

It leaks implementation details into the user interface of your application. Not good.

It requires the “reply analysis/text processing” and “reply correlation” concerns of your application to be interdependent, when really they should not be – at least not to achieve something so simple.

Experimental solutions

On the last bullet point of Solution #3 I suggested that your application’s “reply analysis” and “reply correlation” concerns shouldn’t be linked together. This I believe is true for something as simple as what was described in that solution. However, there is plenty of mileage to be explored in adopting this approach for more advanced designs.

When you send a question with a constrained set of response options such as “Yes, No, Maybe”, you might record those options as part of the address pairing in the database or data structure (as described in Solution #1). Then if you need to send a further question (to the same mobile phone, whilst the first is still outstanding) you can check whether its set of response options is different. If the second question is looking for a “Good, Bad, Ugly” response, there is no conflict, is there? So a lock on the address pairing, qualified by those expected response options, can be acquired. Obviously this wouldn’t be possible (or at least would have ramifications on your overall design) if you were expecting a “freeform” response.

Another possible avenue to explore is an area of computer science called “natural language processing”. The idea is that when you ask a question like “What is your name?” you prime your NLP engine to expect a reply that looks like somebody’s name. Anything that arrives from that mobile phone that doesn’t look like a person’s name can be assumed to be unrelated to the outstanding question. Obviously if you want to ask a concurrent question like “What is your wife’s name?” then you’re back to square one: that would be a conflict, and you’d need to serialise the questions as described in Solution #1. This (NLP and SMS applications) is an active area of research for me, so I may blog about it in more detail at a later time.

Conclusion

Solution #1 is the best, for now. It strikes the right level of compromise without sacrificing either messaging integrity or user friendliness. If you desperately need to send multiple concurrent questions to a mobile phone then I would suggest you rethink your approach. Perhaps logically separating your business departments and/or workflow concerns onto different source addresses would be a solution in this case. That way you can send out an urgent question, perhaps relating to a missed bill payment, on a source address dedicated for that purpose.

Solution #2 is usable, and I can think of several use-cases. But I feel it is not as good for frequent one-to-one contact between a company and their customers. It has serious disadvantages in user friendliness. It is best suited to a hosted cloud service of some kind, where everyone shares the same pool of source addresses and where contact is expected to be infrequent.

Yes, this blog post is about that seemingly perennial need to set up some sort of automatically incrementing version number system for your latest project.

As it happens, the “trigger” for me this time around was not a new project but our switch from TFS to Mercurial for source control. It took some time for Mercurial to “bed in”, but it definitely has now. Then you reach the point where you start asking, “Okay, so what else can we do with this VCS to improve the way we work?”.

Our first pass at automated versioning was to simply copy the status quo that had worked with TFS. This was rather crude at best: an MSBuild task that would increment (based on our own strategy) the AssemblyFileVersionAttribute contained inside GlobalAssemblyInfo.cs. The usual hoo-hah really, involving a simple regular expression etc. This was fine, but we didn’t really like it because it was, effectively, storing versioning information inside a file held in the repository. Separation of concerns and all that. It also caused a small amount of workflow overhead involving merging of named branches – with the occasional, albeit easily resolvable, conflict. Not a major issue, but not ideal either. Of course, not all projects use named branches. But we do; they’re totally awesome for maintaining many concurrently supported backward releases.

Versioning strategies

The way I see it, there are only a small number of routes you can go down with project versioning:

Some sort of yyyy.mm.dd timestamp strategy.
This is great for “forward only” projects that don’t need to maintain supported backward releases – so for web applications, cloud apps etc. it is, I suspect, quite popular. For other projects it simply doesn’t make sense: if you want to release a minor bug fix for a release from over a year ago, it would make no sense for the version to suddenly jump to the current date. How would you differentiate between major product releases?

The typical major.minor.build.revision strategy.
The Major and Minor would be “fixed” for each production branch in the repository. And then you’d increment the build and/or revision parts based on some additional strategy.

A DVCS-only strategy where you use the global changeset hash.
Unfortunately this is of limited use today on .NET projects because the AssemblyVersionAttribute and AssemblyFileVersionAttribute will accept neither a string nor a byte array. Of course there is nothing stopping you coming up with your own attribute (we called ours DvcsIdentificationAttribute) and including/updating that in your GlobalAssemblyInfo.cs (or equivalent) whenever you run a build. But it is of zero use to the .NET framework itself.

Some sort of hybrid between #1 and #2 (and possibly even #3!).
This is what we do. We use a major.minor.yymm.revision strategy, to be precise.

We like our #4 hybrid strategy because it brings us the following useful characteristics:

It has “fixed” Major and Minor parts. Great for projects with multiple concurrent versions.

It contains a cursory Year and Month that can be read at a glance. When chasing down a problem in a customer environment, it is simple things like this that speed up diagnosis times.

An incremental Revision part that ensures each build in the same month has a unique index.
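The strategy itself is mechanical enough to sketch in a few lines. Here is an illustrative Python version (this is my description of the scheme, not the actual MSBuild task we use):

```python
from datetime import date

def next_version(major, minor, last_version, today=None):
    """Compute the next major.minor.yymm.revision version string.

    `last_version` is the version from the most recent tag, or None for
    the first build. Major and Minor are fixed per production branch.
    """
    today = today or date.today()
    yymm = int(today.strftime("%y%m"))  # e.g. May 2011 -> 1105
    if last_version is not None:
        l_major, l_minor, l_yymm, l_rev = (int(p) for p in last_version.split("."))
        if (l_major, l_minor, l_yymm) == (major, minor, yymm):
            # Same branch, same month: just bump the revision.
            return "%d.%d.%d.%d" % (major, minor, yymm, l_rev + 1)
    # New month (or first build on this branch): revision resets to zero.
    return "%d.%d.%d.0" % (major, minor, yymm)
```

So a build in the same month as tag 5.4.1105.5 becomes 5.4.1105.6, while the first build in the following month becomes 5.4.1106.0.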

So then, how did we implement this strategy on the Microsoft stack and with Mercurial?

Implementation

The key to implementing this strategy is first and foremost retrieving from the repository the “most recent” tag for the current branch. Originally I had big plans here to write some .NET library to walk the Mercurial revlog file structure. It would have been a cool project to learn some of the nitty-gritty details of how Mercurial works under the hood. Unfortunately, I soon discovered that Mercurial already has a template that does what I need: the “latesttag” template. It’s really simple to use as well, for example:

$ hg parents --template "{latesttag}"
> v5.4.1105.5-build

There is also a related and potentially useful template called “latesttagdistance”. This counts, as it walks the revlog tree, the number of changesets it walks past in search of the latest tag. You could use this value as the increment for the Revision part of the strategy.

At this point most of the Mercurial fan base would go off and write a Bash or Python script to do the job. Unfortunately in .NET land it’s not quite that simple, as we all know. I could have written a, erm, “posh” PowerShell script to do it, for sure. But then I’d have to wire that into the MSBuild script – which I suspect would be a bit difficult and have all sorts of gotchas involved.

So I wrote a couple of MSBuild tasks to do it, the interesting one aptly named MercurialAssemblyFileVersionUpdate.

You’ll notice that the parsing implementation is quite forgiving: a regular expression extracts anything that looks like a parsable System.Version string. This is cool because the tags themselves don’t have to be exactly as pure as System.Version would need them. You can leave a “v” in front, or add a suffix like “-build” – whatever your convention is, it just makes things a bit more convenient.
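The forgiving-parse idea is easy to demonstrate. A Python equivalent of the kind of expression involved (the actual task is an MSBuild/.NET implementation; this pattern is my illustration, not a copy of it):

```python
import re

# Extracts anything resembling a four-part System.Version from a tag name,
# tolerating prefixes like "v" and suffixes like "-build".
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")

def parse_tag(tag):
    """Return (major, minor, build, revision) or None if no version is present."""
    match = VERSION_PATTERN.search(tag)
    if not match:
        return None
    return tuple(int(part) for part in match.groups())
```

Because it uses a search rather than a full-string match, “v5.4.1105.5-build” parses just as happily as a bare “5.4.1105.5”.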

Remaining notes

The implementation above generates global tags that are revision-controlled in the repository. I did consider using “local tags”, which wouldn’t need to be held in the repository at all and could sit on the automated build server only. Unfortunately the “latesttag” template does not work with local tags; it only appears to work with global ones. It would also mean that developers wouldn’t benefit from the tags at all, which would be a shame.

Mercurial stores global tags inside a .hgtags file in the root working directory. The file is revision controlled but only for auditing purposes. It does not matter which version you have in your working directory at any given time.

You may still require a workflow to perform merges between your branches after running a build, in order to propagate the .hgtags file immediately (and not leave it to someone else!). If any merge conflicts arise you should take the changes from both sides as the .hgtags file can be considered “cumulative”.