I've probably written this post a bunch of times over the years, some versions only in my head, others languish in my drafts folder.

This post is about Microsoft, specifically the painful lessons of their legacy and their marching into recreating a whole new set of painful legacy problems for the next generation.

Windows - not so smooth operator

I'm not really sure who is running the OS Division these days: Engineers or the Sales and Marketing departments. They appear to have been at war with their users for some time. Perhaps it's always been that way and I just didn't notice.

From the days of IE anti-trust lawsuits or of Bing's attempts to counter Google search, Cortana as a response to Siri, the envy of seeing other App Stores ecosystems thrive while MS Phone store withered and died (I loved my Win phones too).

The rise of tablets and mobile computing, which recently saw Android become the most used operating system, has seen Windows make ever more user-hostile attempts to maintain its market share and relevance.

The dirty tricks employed during the 'Upgrade to Windows 10' SaaS campaign defied belief at times, so hostile were they to those who simply didn't want it following the version 8 and 8.1 disasters.

The 'hints' that chrome is slow and "wouldn't you like edge instead" are a throwback to the IE-bundling behaviour that I think isn't really working out.

The latest round in the OS-war on users is the introduction of adverts in the disappearing/reappearing start menu. I cannot imagine this is what users actually want.

For the time being, however, windows is still by far the dominant desktop OS, but I do wonder if the Windows 10 stats would be quite as high if they hadn't tried every trick in the book to force users to upgrade.

I get it, though. I'm not their target demographic, a developer, a power user with no interest in an closed-ecosystem app store and no desire to use even a fraction of the 'features' like Cortana, Bing, Edge etc.

The problem with this thinking though, is that there are the millions of people like me who help their families with their computers, who advise businesses on future technology and who develop the ecosystem of non-MS produced applications that help make software ecosystems successful.

The admittedly hilarious Ballmer chants of "Developers, Developers, Developers" ring somewhat hollow these days when it comes to the OS side of things. I'm sure their dominance on desktop still makes developing UWA's for the Windows 10 store financially appealing, but with cloud and mobile apps now dominating overall, there is more incentive than ever to spend your development dollars elsewhere.

Full disclosure here, if it wasn't already obvious: Having used windows since it came on 5 multi-coloured 5 1/2 inch floppy disks, last year, I have mostly ditched it for Linux Mint as my daily OS and I really don't miss it.

The only thing still keeping me on Windows is Visual Studio and my work (which is on Windows 7 with no plans to upgrade at present).

Windows just seems like a hostile environment these days because I choose not to use their 'features'.

Browsers & the law of unintended consequences

I remember that Internet Explorer as far back as version 3 produced IE-only features baked-in to the browser. Further, the browsers have consistently been baked-in to the operating systems.

Given the life spans of XP, Windows 7 and IE versions 6-9 with its platform dominance - doubly so in corporate environments - there were, and still are, a huge number of applications written as 'optimised' for IE (an oxymoron if ever there was one).

As a developer who worked on and off with SharePoint and CRM for many years, each new IE browser often crippled these applications and would start the 'compatibility-mode' bum-fight which is still continues to this day.

Rewriting any software that extended these systems every three years was a game that got really old, really quickly.

Worse still, are the silent mass of desktop applications that use the 'embedded browser' control with embedded content that is 'IE optimised'.

Further compounding these problems, the compatibility-mode renderer wasn't even consistent between later versions of IE.

The decision to bake in browsers to specific operating systems is arguably one of the worst, most backwards-compatibility-hostile acts of enforced obsolescence in the history of computing.

Consider how much time, effort and money must have gone into rewriting broken browser-based applications. It must surely be measured in billions of dollars.

Chrome, far and away the most popular browser on the planet, as well as Firefox, Opera all have cross platform versions. There is no reason for a browser to be baked-in to an operating system.

And now with Edge, only available on Windows 10, it's more of the same. Edge, at this time, feels more like an alpha product to me, some way behind all the major browsers.

Both IE and Edge still don't support Server Sent Events (SSE) - perhaps because they have an abstraction for the transport selection in SignalR - but it's a notable omission which has been under review for years now, along with a number of other features already supported by all the other major browsers.

CSS Grids recently provided an example of the difference between the major browser vendors' approaches.

Edge nor IE was among them.

One of the most interesting and exciting features that has the potential to create a tectonic shift in web development is WebAssembly. Again, no Microsoft representation here either.

If you, like me, still to this day, have to cope with the issues surrounding a bunch of IE-optimised (embedded like a blood-sucking tick) applications in your environments, perhaps you understand where I'm coming from.

These compatibility issues cast an extremely, EXTREMELY long shadow.

Microsoft themselves must have viewed the IE situation as unrecoverable from a reputational point of view to have effectively tomb-stoned IE in favour of Edge on Windows 10. Yet they chose exactly the same approach of baking-in Edge to the OS.

They really haven't learned from their past here at all.

Appetite as a measure of success

When .Net first came out, there was a quick uptake. It was a clear improvement over classic ASP. The framework had plenty of pain-points but it shipped alongside tooling that let people write production-ready apps from day one.

When .Net MVC came out, there was a quick uptake and a viable path to launching production-ready apps. Again, it was a great improvement over WebForms by decoupling the markup from business logic. Improved testing, much better extensibility and streamlined execution pipeline were the motivators to switch.

However, again, there were pain-points for those creating large-scale apps baked into the MVC design.

Other .Net MVC frameworks arose in this time as a reaction to those pain points such as Nancy, FubuMVC and ServiceStack.

Many developers also left .Net altogether as alternatives like NodeJS or Ruby on Rails came out. Many '.Net is dead' 'Why I'm rage quitting MS' posts were written.

Let's be clear: this post may be one big gripe directed at MS, but I'm not quitting .Net. I think it has a bright future, in fact, more so than at any other time since it first came out.

MS tried to reinvent JQuery with their own JS DOM abstractions but after a number of years gave up and threw it under a bus. Same with their JSON deserializer. They introduced the service locator pattern!.

There is nothing wrong with reinvention as an attempt to compete and improve. May the best one win....but using the platform, defaults and tooling to create lock-in generates chronic long-term pain and a ton of legacy apps out there that will struggle to be refactored around the baked-in designs which require a more scorched-earth approach to update.

There is a larger issue here with the introduction of .Net Core, however.

How a cloud fought an emperor and x-plat was reborn

Azure runs on Linux, in fact most, if not all, the major cloud platforms run on Linux. SQL Server is going to run on Linux. Visual Studio for Mac shows a non-windows IDE is possible. So I expect sooner or later that it too will run on Linux (especially if Project Rider proves popular). Windows resurrected its POSIX/Linux subsystem and, again, it wouldn't surprise me if the future of windows is as a Linux distro.

So it's not that surprising that .Net Core was announced to much excitement and enthusiasm. I shared that enthusiasm and still do. It opens up choice and some genuine competition. People can choose the OS and development stack they want knowing that there won't be feature-disparity or lag between platforms. Creating cloud apps for Linux reduces licencing and hosting costs.

At NDC London last year, Scott Hanselman quipped that Microsoft didn't really care which platform you ran on, they just wanted your for loops on Azure.

It goes further than that, as you're not tied into Azure either. You can choose any cloud provider and that's really great for driving healthy competition.

Version 1.0 and 1.1 are out and 2.0 is on the way. It's a grand effort to re-imagine .Net cross-platform without the split effort of mono as well as fixing some mistakes of the past. I love that!

So, now, we are a year or two into .Net core but the landscape is very different from the original ASP.Net and MVC introductions.

They aren't releasing something new, so much as re-creating something that already exists, but cross-platform and without the framework installer dependency. From this huge undertaking, understandably, there's a lot of 'stuff' missing from the initial releases.

However, rather than focus on this missing stuff, the debates on GitHub revolve around reinventing established frameworks (including MVC), libraries, concepts and of 'baking-in' anaemic versions of popular full-fat .Net libraries, rather than focusing on feature parity, tooling and migration.

Their OSS efforts here are definitely an improvement. When you see the sausage factory in action though, it often isn't pretty.

There is a fundamental conflict of interest here that few other development communities have to deal with.

One of my main gripes here is not that they changed their minds numerous times throughout the process - that's entirely fair I think - but that they split the platform and tooling efforts.

In their honest desire to join the club of increased release-cadence, the split in tooling support has crippled many developers' efforts to port early versions of their established projects to .Net Core.

Even with two versions out and the third imminent, there are remarkably few libraries shipping with multi-targeting for .Net Core and I suspect even fewer production apps out there in the wild running on Core.

There simply isn't nearly the same uptake that happened for .Net and .Net MVC either.

Their packaging, tooling and versioning schemes have been confusing and disjointed. Their seeming willingness to throw established well-maintained OSS projects under the bus by baking-in the lowest common denominator abstractions is confusing as well.

The effect, even if they end up dominating the space, is that 'baked-in' is always expensive and painful for applications to 'bake-out'.

It's a preference for frameworks over libraries that creeps in over time. A lack of using simple dependency-free components that when chained together become greater than the sum of their parts, i.e. the Unix philosophy in action.

Breaking compatibility with both tooling and frameworks ends up isolating and atrophying applications that become enslaved for those who can't expend the resources on yet another rewrite/upgrade.

It is the kind of scenario in which many experienced developers, so valuable to the ecosystem, just give up altogether for alternative languages and platforms.

It's hard for me not to view all this as wasted energy, of building monoliths, while they, along with the industry, preaches on the value of reducing complexity through building components with clearly-defined boundaries.

By not focusing on educating developers and decoupling their own products, aren't they just storing up the next round of future backwards-compatibility headaches and perpetuating the problem all over again?

The root cause, is a lack of educating, teaching, guiding.

You don't teach kids Maths by giving them a calculator. You first show them how to do it themselves. THEN and only then do you give them the shortcut tool.

Should it really be Microsoft's job to teach, however?

They are, in the end, a business, there to sell products and make money. They are not a charity or educational institution.

Their desire to give developers ready-made solutions to provide the quickest possible productivity-gains makes sense from an IDE perspective; but when this bleeds into the frameworks and tooling, I think it just perpetuates the industry-wide epidemic of failed software projects and quite frankly industrial-scale mountains of crappy designs and code.

My view is that in the long-term this approach does much more harm than good.

Whether they view it as their responsibility to improve the overall state of the industry or not, the effect of providing OOTB designs that developers don't properly understand and which are poorly implemented nor explained, often results in predictably poor applications.

Over time, the cumulative effect is that it's not the application developers who suffer but the language, platforms and tooling that get a bad reputation as being responsible for producing poor software.

So they ocean-boil and re-brand, now it's new, shiny and all better, you just have to rewrite all your apps again.

It's the law of unintended consequences again, and this thinking displays a lack a reverence for learning from past mistakes and looking at the underlying motivations for these repeated patterns of re-invention.

I know, it's easy to criticise the big guy, but I've read so many threads, thanks to the openness of the process on GitHub, where MS have been warned repeatedly about the choices they are making by well respected and experienced developers. In too many instances IMO, they have paid no heed.

Mac attack!

Scott Hanselman recently called me out on some hyperbole in a reply on GitHub.

He was absolutely right, of course. I have nothing but respect for Scott, always a voice of reason, a nice guy, a gentleman, and I did make a silly comparison in my first sentence of my response.

What is a shame, though, is that my tongue-in-cheek hyperbole undermined the rest of my point. He really didn't engage with what I was trying to get across about 'reinventing' and efforts towards tooling.

I've lost count of how many times I've seen tweets like this one in my timeline just today.

I'm sure both Microsoft and other .Net developers will disagree with me on this, but I think they ARE wasting time re-inventing things; there HAS been a chronic lack of focus on the tooling and there IS more than an aroma of ocean-boiling going on in a few areas.

It's one thing to focus on shipping features in MVC, Razor, SignalR etc when you have the core stuff down.

No, my poorly-worded off-topic point, was that only when you have the basics right with .Net Core's tooling, should the focus then be on delivering the frameworks and framework-tooling to build on top of it.

Creating shims over yeoman, MSBuild, Nuget, test runners etc is not advancing the core areas, and it just looks like they are trying to copy the CLI's of others out of a fear of being left behind somehow. It brings nothing new to the table.

They really haven't got that part right at all.

MSBuild

Looking through the now open-sourced MSBuild code is like stepping into a decade or two of cruft and backward-compatibility bolt-ons.

Now, that isn't unusual, a lot of critical long-lived projects end up like this but I would far prefer that more of their considerable resources were focused on tidying this project up and improving it for the next generation of application development. There is so much more they could be doing here.

The MSBuild sourcecode is laden with feature-based pre-processor directives to cleave off platform-specific code. This IMO is code-smell that should have been be refactored and logically grouped into feature-specific abstractions and implementations.

It reeks of bolting on x-plat support in a hurry.

The logging and error handling is sprawling and hard to follow. The collections, immutability, async, scheduling and queuing are bespoke implementations that look like they haven't been updated using the advances made in those areas in the .Net framework. Maybe that's just my failure to grok the code though.

My biggest bugbear by far, however, is the project system.

Laden with Visual Studio and Windows-specific hooks for SDKs and options for things installed with Visual Studio, this is the cause of so much developer pain.

I personally have spent far too many hours of my life working around this stuff.

The build system is the bedrock of any stack, it is the core of organising, compiling, tooling and continuous integration. So to have heard so many times over the years the same old discussions around "install the IDE on a build server" for solving CI issues should be an obvious red flag for discomfort, like an infant-behind-you-screaming-continuously-on-a-12-hour-flight obvious.

Out of this pain arose an army of VS plugins, MSBuild shims and, in more recent times with the arrival of Nuget, packages that counteract this M.A.D. coupling between IDE and Build system.

Why should it be left to the VS EnvDTE to be able to manipulate a project, adding files or folder or changing configuration settings? It's awful stuff to work with.

At this time, there still isn't a set of MSBuild nuget packages that you can reference to load, parse and manipulate a project prior to compilation sans-Visual Studio.

MSBuild is being worked on and their roadmap shows that they are working towards a dependency-free x-plat nuget package, but there is no timeframe for this.

Given that we are nearly 3 versions in for .Net Core, it clearly hasn't been given the same prominence or resources as the 'shiny stuff'.

Microsoft won't back-port either the tooling for .Net Core projects to VS 2015. This, again, is likely because of coupling.

Yet, in the Roslyn and typescript compilers, they showed clearly that they understand why this separation is important and useful.

You can watch Ander's discuss the how and why in this great video on compiler construction and you might get a sense of why I think MSBuild is sorely lacking a VS-decoupled project as a service API.

Roslyn

Roslyn was a huge undertaking by Microsoft that genuinely did something unique in the compiler space no one had done before.

All compilers (that I know of...) before were black boxes where source code and settings went in and artifacts came out.

Every implementation of intellisense had to effectively recreate the lexical parser of a compiler, and often the semantic model too if you wanted things like type-checking or flow-analysis in your IDE.

Roslyn's remarkable abilities exposed these things as consumable services which allowed any IDE to leverage them. One implementation, provided directly by the compiler.

Anders discusses in that video above how the same approach drove them when they were building the Typescript compiler.

With Roslyn, the compiler black box is no more, it's a rich pipeline that when you tap into, can be leveraged in amazing and interesting ways.

One huge disappointment with Roslyn is that they stopped short when it came to adding compile-time hooks for sourcecode rewriting and generation to enable all sorts of interesting AOP-style projects to be born.

Roslyn has been RTM'd nearly two years now and the uptake for design-time code analyzers and fixes has been healthy, but there still isn't a good story around compile-time hooks.

It is a hugely difficult problem when you dig into it, but, it is the type of feature that can only really be delivered by the language teams, and not, as I have frequently read in response to requests for it, "we are waiting to see what the community comes up with".

IMO it should be prioritised over every other language feature for one reason.

It has the potential to create an entire ecosystem of libraries, tooling and innovation around it. Advances like pattern-matching and record types whilst important, do not.

The best option for compile-time code generation currently available for Roslyn seems to be Andrew Arnotts CodeGeneration.Roslyn. It's a great start at wiring up the build hooks for doing attribute targeted generation but it is still only capable of nibbling at the edges of this feature.

MVC vNext briefly created the ICompileModule with a weird folder/file hook convention for compiling MVC Razor views, a similar but again limited approach to source generation, but this was scrapped in favour of source generators.
A feature, as yet unreleased, that again looks somewhat stilted from the docs and with a demo project that references non-existent packages (publicly at least).

The opportunities in this space are immense, like eliminating lots of expensive run-time reflection or expression-heavy code.

The next object mapper could just generate left right code.
DI containers could achieve invocation performance parity with new Foo() without compiled expression tree trickery.
The next Postsharp or Fody AOP framework could just re-write the code instead of expensive ILWeaving.

It's disappointing that this hugely difficult but interesting problem hasn't been more of a priority so far.

S'all bout the bootstrapping y'all

Microsoft have talked a lot about wanting to streamline the process of on-boarding with .Net Core using CLIs, templates, etc.

Yet when fundamentals like the build and tooling dependencies are not explicit and instead powered by magic or coupled to an IDE, you end up with projects you cannot just clone, build and run. There ends up being steps to install this, configure that.

So when a new IDE version comes out and you're building on a cloud platform that doesn't yet have part of that IDE installed on the virtual machine images, well sorry, you're out of luck.

Now, they have made progress of sorts in this area. On the 3rd March they announced Visual Studio Build Tools as part of the VS installer. Its back to the installable-SDK approach for CI environments but it's still coupled to the VS installer for some reason. No other build tool does this kind of stuff. It's an archive file or a single binary usually.

There also recently released VSWhere but, again, is just a Band-Aid on the issue of not being explicit about the build tooling and resorting to 'find dependencies by magic'

Going further I'd like to see the IDEs consume an MSBuild project service, much like intellisense and analysis with the Typescript and Roslyn compilers.

Part of the reason for this coupling between IDE and building, I think, stems historically from TFS. A build server with all the same shared guts as Visual Studio and shipped together. It was easy for Microsoft to just recommend using both, and all those coupling issues were solved.

I'm old enough to remember when Microsoft charged enormous fees for TFS. In the face of rising competition from CI servers like TeamCity and Jenkins they basically started giving it away.

Now with the rise of cloud and online VCS providers like GitHub and BitBucket, services like AppVeyor are once again transforming the CI server landscape and TFS (or Visual Studio online) are further away than ever from dominating in a more competitive ecosystem.

Maybe this dual VS/TFS coupling was done to block competing IDEs and CI servers; perhaps it largely worked in a windows-centric, MS-only corporate environments before the cloud was even a spec on the horizon.

With the rise of cloud CI services and the imminent arrival of the first credible Visual Studio contender in JetBrain's Project Rider, there has never been a more compelling reason for Microsoft to ditch these old coupling tricks.

One only has to look at the foundations of Rider's IDE to understand that when you don't have a platform or ecosystem that locks in developers for visibility and market share, you cannot provide a substandard tightly-coupled product and still hope to succeed.

Jetbrains built their own lexical parsers for a wide range of languages that the ReSharper core uses, along with a unified plugin system that is powered by the same x-plat IDE.

This isn't an accident. It's a good business strategy that doesn't try to take customers hostage to up-sell them TeamCity or YouTrack.

Each has to live and die on its own merits.

The road to hell is paved with good intentions

Now I've been bashing Microsoft here a lot, but I'm not one who subscribes to the "Microsoft is evil" camp. I don't believe they are malicious or hell-bent on destruction or anything like that.

There are plenty of really great, hard-working, honest and sincere people working there who only want to do good.

They're a big corp, lumbering...yes, sometimes overbearing, misguided plenty and often confused as a result, I think any organisation that size suffers from the same syndromes.

For developers, since the introduction of .Net and with Scott Guthrie at the helm, things definitely improved a lot in a lot of areas over the years.

Creating the closed ecosystem is old Microsoft thinking.

I'm just not sure that new Microsoft thinking is any different when it comes to its tooling...

There are a number of tried-and-tested methods for discovery to be found in DHCP, Bonjour, uPnP, SSDL and DNS-SD. For web-based services, UDDI and WS-Discovery have come and - for the most part - gone.

Gateways like NGINX also provide routing options which can be used for decoupling service-to-service calls.

Enterprise Service Bus systems like NServiceBus and MassTransit also can be used in a pub/sub messaging pattern to decouple service-to-service calls.

I've mentioned just a few but there are many more. You have a lot of options here, so how do you choose?

Let's first briefly cover some different patterns before I cover what we have chosen to use and why.

Centralised Registry vs. Self-Discovery

There are two common patterns that you find in solutions for Service Discovery.

The first is the service registry, a centralised database that stores the location of a service.

The second is self or auto-discovery where there is no central database and is often found in zero-configuration networking. Instead, clients use a variety of approaches to broadcast packets across a network to request a remote service and wait for the required service to respond with its location.

The service registry is another single point of failure (SPF) in your infrastructure but can provide more operational control. When used with server-side discovery, which is often found in gateways, it can completely decouple any discovery logic from the services.

Zero-configuration networking can be generous on security within networks to permit devices to 'just work' but can be more challenging to secure as systems span networks. It is often more suitable for smaller networks (uPnP, Bonjour etc.).

Communication

There are four common types of service-to-service communication.

Point-to-point : services talk directly to each other.

Gateway : acts as the middleman, handling the routing of requests and responses between services.

Gateway Request : the responding service replies directly to the calling service rather than return through the gateway.

Message Queue: services publish messages to a queue, the responding service subscribes to the messages published and in turn publishes its response to the queue for the original service to subscribe to.

Point-to-point involves the shortest route so is often the quickest but requires each end-point to take a dependency on your discovery mechanism.

The gateway can decouple many concerns from your services, handling not just routing, but caching, front-end to back-end bridging with HTTPS termination, transport conversions like HTTP to TCP/IP, formats, aggregation and load-balancing, to name just a few.

The message-queue pub/sub model is slower and is more suited for longer running processes.

Registration

For service registries only, the registration can be handled by each client directly or by the server.

Further reading

I've only really scratched the surface on the above keeping the explanations as brief as possible, as I want to get on to some specifics, but you can find a much better, more detailed overview of Service Discovery in Chris Richardson's excellent post as part of his series on Microservices.

Chris also has many video talks and articles available online and speaks very eloquently on all matters relating to distributed design which I have greatly enjoyed during my own research. I highly recommend checking them out.

So this is the first critical point where we had a variety of choices to make in our design.

Do we want smart versus dumb pipes? How about decentralised control with auto-discovery? How does our communication behave? Who controls registration? Is one single approach for all scenarios even practical?

For us they are opinionated and deliberate choices.
Our approach that follows is not inherently better or worse, but each choice has consequences for many of the subsequent design decisions. In many cases, they can actually remove choice.

We will come back to reference these choices in the rest of this series.

It is also worth pointing out that I couldn't try out everything available, so our choice is not a reflection on other solutions out there, it is just the one I felt best fit ServiceStack and suited our needs.

Consul, like all service registry patterns is a potential SPF, but is designed for High Availability in mind.

In production, you run an odd number of Server nodes which form a DataCenter (DC), typically three or five. You can scale Consul to connect multiple datacenters.

The odd number is because it implements a consensus protocol based on RAFT which holds leadership elections, and they need a deciding vote to elect a leader.

For the best possible resiliency, server nodes can be spread across physical hardware, network locations and operating systems. Running three instances allows a single node to fail while running five can tolerate two node failures.

Consul is actually a hybrid model of server and client-side, something also found in Netflix's Eureka. This approach avoids one typical drawback of client-side discovery and self-registration systems i.e. network availability and latency.

It avoids this by using local agents on a loopback address.

Each service has access to an agent co-located on the same physical hardware. Consul uses a gossip protocol Serf for managing membership, failure detection and message broadcasting and RAFT logs to keep each agent's list of services synchronised.

This means lookups and registrations are local and fast with no network hops.

The approach also helps decouple the HTTP verb specifics of any external calls from your call site and instead makes the DTO responsible for defining how it is sent.

But wait, there's more...

In addition, Consul provides another piece of the infrastructure jigsaw which our plugin handles for you - service health which we will cover in our next topic.

The gateway will also select the correct format for retrieving the DTO. If your remote service only communicates in XML, it will transparently call it using XML but return you a POCO.

It will also automatically cache responses from a GET request according to the remote service's cache settings. In some cases, it will not even issue an RPC, instead returning you the DTO response straight from the cache.

Instead of REST and all the great custom and fallback routing options in ServiceStack, we have chosen to use only ServiceStack's pre-defined-routes.

Together with our second consequence of globally unique DTOs, this allows the RPC routing to just work with Consul.

So let me try and explain why we've not only ignored RESTful routing, but will actively seek to prevent it being used directly in our Services.

There are a few reasons behind this but first it might help to clarify that we plan to use services internally at first, but later on expose them externally using a Gateway to be built on top of Consul.

Internally, with ServiceStack's ServiceClient and the DTOs, you already have fully end-to-end typed API calls so never really need to see a URI, let alone care what they are, this isn't so bad for them.

We expect that most of the internal calls will use this typed approach.

You can use custom routes, and the service-to-service calls will even use them. This is not really the problem area though.

Any non-ServiceStack client that wants to consume the services would have to go via Consul to find the right service, and Consul doesn't know a thing about your custom routes.

This affects the few internal apps or services that do not use the ServiceStack client and probably the MOST important group, the external clients.

Contract stability is of paramount importance, but addendum's to contracts are OK.

So clumsily put, if we ensure our DTOs are backward-compatible, we have far more stability in our contracts. Contracts that can tolerate change. Contracts that instil confidence and the trust of consumers.

Another reason for avoiding custom routing in ServiceStack is the complexity of making it work correctly.

In what order do I add this service's routes to the routing table?

Will a fall-back or over-generous catchall route suddenly grab all other services requests?

Will the new dev/team remember to respect the guidelines?

As I mentioned previously, adding an external gateway is part of our future plans and we expect it to handle things like load-balancing, traffic shaping and SSL termination, all in one place, rather than in each service.

If in that future, we must have RESTful routing, it will be as a decoupled, globally managed concern in that gateway, carefully managing the mapping of routes to services. Even this though, by its nature, is static and prone to 'churn' in such a dynamic environment. (see schema changes in ORMs)

We are currently looking at a few options for Gateways so I'll simply mention one that stands out so far, Fabio

It looks to have great integration with Consul and avoids the need for more complex Consul-template solutions. Another one for the roadmap.

If you have multiple instances of a service available to process a DTO, Our plugin will sort these by the agent RTT, giving you the most responsive.

This isn't really load-balancing, more QoS, but it is useful nonetheless and worth mentioning.

Another thing Consul gives us is in how it maintains separate service catalogs per datacenter. Using this ability, we could locate datacenters and their services in different geographic regions to even out global traffic loads.

For true load-balancing though, we have to look for other solutions and they lie outside of each service.

A gateway is the most obvious candidate for this and Fabio allows you to split traffic between services based on rules, useful for things like canary deployments as well as more traditional load-balancing.

In the world of microservices however, we actually have all the ingredients we need to make something ourselves if we need to.

Having a service registry in Consul with RTT, Health and performance metrics information from logging for every service end-point opens up interesting possibilities for using that data. Combined with a good automated deployment pipeline, there are possibilities for elastic scaling. I'll explore this in more detail in the deployment topic.

In a monolith world with all the great modern tooling available to you, it is easy to load a project up in your IDE, set some breakpoints, and hit 'Debug'.

You can step through your code line-by-line and inspect its state.

Or you can write integration tests to assess the state of a system and to verify that the components of the system are interacting correctly with each other.

The single-threaded process has rock-solid reliability and low-latency for inter-process communications.

If you call a method on a class, you aren't concerned if it will reach that method-call or if will it take an unacceptable amount of time to get there.

Even at the boundaries of a process, things are often binary. If the database that runs the application or a file resource is unavailable, the application will crash or display an error.

You can check application logs, the system eventlog, the coredump or WinDbg type listeners to help you find, reproduce and fix the problem.

It's all in one place.

Next comes the multi-threaded process, to which nearly all of the above also applies.

How many of you have experienced race-conditions in one of your multi-threaded applications?

I know I have.

Even with the great tooling available, they are often subtle and hard to pin down. Introduce state mutation into the equation and you now need to worry about concurrency. There are locks and semaphores to deal with.

Is that class I used even 'thread-safe'?

Time and the order of execution between threads is no longer reliable, and the side-effects of this increase the level of complexity, of both your code and of your ability to reason about run-time state.

But they still have reliable, fast inter-process communication.

Now let's introduce asynchronous processing. Single-threaded and multi-threaded applications can be synchronous or asynchronous.

Now, within even a single thread, the order of execution is no longer reliable.

There is an excellent overview on the differences, but the take-away is that introducing multi-threading or asynchronous programming to your application can significantly increase the difficulty of reasoning about your system state as well as finding, reproducing and fixing bugs.

Yet they still have reliable, fast inter-process communication.

Now enter the distributed system, you probably see where this is going, don't you?

You are now in the world of remote procedure calls (RPC) across processes or networks. Each remote call is an order of magnitude slower and therefore, so is your application performance.

In our case, with ServiceStack, the RPCs use a request/response message-passing style.

Just as before, RPCs can be done either synchronously or asynchronously, but due to the performance of remote calls, using asynchronous processing is not really a choice, it's practically a must-have..

Now we have to ask ourselves - will calls be inexplicably duplicated? Will they even be invoked at all?

We have just lost cabin pressure. Communication is no longer reliable.

Down the rabbit hole we go, this stuff is hard and it gets weird, really fast.

Debugging this kind of system, which is executed piecemeal across processes and networks, just cannot be done with an IDE and access to a machine or applications logs.

Trying to reconstruct it in a test environment is an exercise in futility.

ServiceStack is fortunate to have some great people in its community and the plugin was quickly improved by a fellow member, thank you Richard.

Seq is an installable self-hosted service, with an HTTP API that is designed for log aggregation.

There is a readme available on the project, which I won't duplicate here, that covers setting up the plugin and using it in more detail, but I'll cover the basic code required to use it in your ServiceStack AppHost.

See everything!

The plugin now captures every incoming request to ServiceStack in Seq and is capable of logging every detail, not just the path but the headers, the request and response DTOs, execution timing, service exceptions and errors.

In addition, the logging detail can be modified at runtime, so when you need to debug in production, you can ramp up the level of detail logged. I'll come back to this again a few times in later posts.

So the first thing to note is that you are not storing plain text, you are storing structured data and it makes all the difference in the world.

Having used Seq for a while now in other applications, I know how quickly it can help you identify and fix issues in your production systems. It's very easy to use and it comes with a free single-user licence. I highly recommend you try it out for yourself.

Tradeoffs:

Unless your logging receiver has high-availability (which Seq at this time does not have), we have just created our first piece of critical infrastructure as a single point of failure (SPF).

When any SPF goes down, bad things can happen, so throughout the series, I'll point these out.

Seq does however have a forwarder. This works against a local loopback address to buffer requests for forwarding onto your log server.

This helps with network unreliability by eliminating remote calls and the performance penalties that are associated with them.

An alternative is to use a UDP broadcast style of logging, like statsd. It may serve your circumstances better.

Our logging uses an http async 'fire-and-forget' style, so the network performance cost is reduced, but, if your service, network or Seq fails and you do not persist ServiceStack logging locally to disk or use a forwarder, you lose potentially valuable log data.

We could improve this plugin in future (PR's welcome!) to include a more resilient, guaranteed delivery to survive network outages; but for now we are not too concerned about this.

Our rationale will become more apparent as our design is revealed.

Debugging

The second part, distributed debugging, is a much trickier problem.

In distributed systems where there are many moving parts involved, the ability to reconstruct a timeline of events and state-mutations across your services is essential to being able to effectively debug and reason about system's state and overall health.

Enter the correlation identifier, our next plugin on the road to microservices

Again this captures every incoming ServiceStack request and adds an Id to the request header. The service gateway's internal and external calls will pass this identifier on so that you can identify service to service calls from their point of origin in your logs.

It is very early days for this plugin though. We need to refine it to be able to reconstruct a full map of service-to-service calls at each node, but for now, that is on our future road-map.

Being able to map out the calls is important for a couple of reasons which are worth mentioning at this stage though.

re'Curse' of the infinite loop

Having a good timeout policy on all remote calls can help with this, but being able to add self-referencing checks in the correlation plugin can help with this by cancelling such requests.

Am I already in the call-map of the thing calling me?

Yes, byeee

the > never > ending > chain > of > calls > . > . >

Making remote calls easy and transparent to your services is really powerful, but it is also just as easy to abuse that power.

With each network hop, you increase the likely-hood of timeouts and the responsiveness of your API and their consumers suffers.

Having set limits on the length of call-chains can force distributed teams to be judicious in their use of dependencies (see left-pad!) and help foster collaboration between them instead to scale horizontally and keep the stack thin.

Further reading

There is some great research material if you are interested in reading more on this topic.

Google has a whitepaper on Dapper, a large-scale distributed systems tracing infrastructure, and there are a few implementations out in the wild to be found.

Another task for the future, is to capture requests at a lower level, like a proxy, so that tracing can work beyond ServiceStack calls to include data stores, files and external infrastructure resources.

Finally, there are also very interesting possibilities emerging from Joyent for run-time inspection and tracing using dTrace and containers worth keeping an eye on too.

That concludes part II, if I've missed anything, or you have your own great ideas or projects, let me know in the comments.

Also, we'd love others in the community to get involved with the plugins in Github so don't be shy.