Having based previous solution designs on SOA, what will architects need to do differently when adopting the Microservices approach? I thought I would look at how the definitions of the two approaches match up. The first problem was that there are plenty of definitions of each, so I have chosen the definitions from a couple of the most well-regarded publications: SOA Design Patterns (Erl 2008) and Building Microservices (Newman 2015). The following diagram illustrates how they align:

I looked at each of the Microservice principles to see if I could trace them back to a characteristic of SOA.

Model Around Business Concepts is a straightforward match to the SOA characteristic of being Business Driven.

Independently Deployable services are a practical requirement for having an Enterprise Centric services architecture. If services were not independently deployable then service consumers would soon become frustrated at the need to coordinate changes. This is also true for services which depend on one another: the need to coordinate changes soon brings back the problems of monolithic designs, for example the scope of testing.

Hide Internal Implementation Details is not explicitly stated in Erl’s SOA characteristics but is a very central concept in many other definitions. If implementation details are not hidden consumers may be tempted to go directly to source, which defeats the value of having explicit service interfaces and, at its most extreme, would start coupling consumers to a specific vendor implementation, therefore no longer being Vendor Neutral.

Isolate Failure is a key requirement for a Composition Centric architecture because as more independent services are involved it becomes harder to guarantee they will all be available and operating correctly.

That leaves three Microservices principles that don’t obviously tie back to Erl’s SOA characteristics.

Decentralise All Things, in its architectural meaning, suggests avoiding ESB and Orchestrations that may place too much business logic centrally. The need for such mechanisms is not a requirement of SOA but they are often discussed in SOA books and have, in my opinion, become associated with SOA architectures.

Adopt A Culture Of Automation, although a good objective for an organisation wanting to be agile, is, I would argue, not an architectural matter.

Again, being Highly Observable is more a feature of the implementation than of the architectural design. That being said, message flows are highly observable and, whilst messaging is not required to be Service Oriented, this is how most services are consumed.

In conclusion I do think Microservices are a good extension of SOA but that the extension is more about the ecosystem around building and deploying services, rather than the resulting architecture. The most fundamental takeaway, for me, is the argument, or caution, against over-using ESBs or Orchestrations.

The phrase ‘unstructured data’ has been around for some time and is typically applied to text, image and video. The complementary phrase, ‘structured data’, has become synonymous with relational data. If we think about how much information is contained in some typical sources of data it would be something like this:

Simple tables are where I started my career – most data for an application stored in tables without necessarily normalising. Where there was related data we had to hand-code the joins! Relational data needs no introduction. Graph is interesting as it seems to be a way of making some ‘structure’ from what formerly was thought of as ‘unstructured’, especially when applied to text. The remainder then contain increasing amounts of information – natural language, images and video have far more complexity than relational data but have posed a problem for computers to process. The recent explosion in AI capabilities (thanks to Moore’s law) has started unlocking the value of this harder-to-extract data.

I would argue that these phrases are misleading and that ‘unstructured’ sends the wrong message to non-technical colleagues or clients. The Wikipedia entry for unstructured data cites a study concluding around 90% of a corporation’s data is ‘unstructured’. As IT professionals we should be encouraging more desire to exploit this data. I would like a better term for it; how about ‘Rich Data’? And so what to call ‘structured’ data? I did think ‘Simple’, but that makes it sound as if it should be cheaper to deal with than it actually is. So, to conclude, how about ‘Structured’ and ‘Rich’, or is there a better term out there?

OK, that’s a bit unfair, but the question is whether there is the correct number of them and whether they are in the most effective relationships within the organisation’s social network. Having recently been seeking a new role I’ve been asked “tell me about a time there was conflict” a few times, and many of the examples that come to mind have involved an outsourced arrangement or two. This made me wonder what is fundamentally going wrong and whether Social Network Analysis can help. My first thought is that perhaps the organisations were not well understood in the first place and so, when the outsourcing structure was designed, the wrong number of relationship managers were put in the wrong places. An analysis of the organisation before outsourcing will reveal the structure of, and most importantly between, the proposed ‘retained’ and ‘outsourced’ groups.

Some years ago I was a technical lead for a large migration project (around one billion pieces of data). I’ve previously described the transformation structure and would like to share some further advice: practice, practice, practice! If, like my migration projects, there are a lot of complexities like in-flight direct debits and batch timing issues, experience has shown that practice really pays off. What do I mean by practice? Once you have reached the point where development of transformations and reconciliations is complete, the whole migration should be run against an accurate target (copies of live systems) and to the intended timing of the live migration (weekends, evening, whatever your choice). Make it as close as you can to the live environment (without actually issuing live transactions, of course… another topic) and I can more-or-less guarantee you will find some issues, but that’s the point: you don’t want any on the actual live run. How many practices will it take? I would suggest two to three, but if you are migrating in stages you’ll get better at the practicing (meta practice?) so maybe just one will be enough. Good luck with your migration and simplify that landscape!

Often data is collected as part of a process but is not essential to being able to complete the process. For such data items, unless quality is enforced in some way, it is very likely accuracy will fall below 100%. A while ago I wrote about predicting sales from quotations, but what I did not explain was that the data came from two systems, one of which did not enforce adequate validation of a user input field required to join that system’s data to the other system’s. Presumably this did not matter to the business process or analysis required when the system was first put in. Much useful analysis can be performed on data that is not 100% accurate, for example looking at ratios over time; however, there is always going to be some doubt about the results. My examination of machine learning techniques was possible after I had ‘cleaned’ the data by removing any records that could not be matched between the two systems. My results showed only a small improvement in being able to predict a sale over that of tossing a coin to make the prediction, which did not seem particularly useful in the context being examined. However, in some applications a small improvement over a 50/50 guess could be very important (e.g. share trading), and in such cases even slightly inaccurate data could be giving misleading results. Because of the potential uses of data, unforeseen when it was originally captured, I would advise architects to be less tolerant of poor data quality.
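
To illustrate the kind of cleaning step I mean, here is a hypothetical sketch (the ‘ref’ field name and the record shapes are my invention, not from the actual project): keep only quotation records whose user-entered reference can still be joined to keys from the second system.

```python
def clean_matched(quotes, sales_keys):
    # quotes: records from the quotation system, each with a free-text
    # 'ref' field typed in by users (hypothetical field name)
    # sales_keys: the set of valid join keys held by the second system
    # Normalise the user input before matching, then drop anything
    # that still cannot be joined
    return [q for q in quotes
            if q.get("ref", "").strip().upper() in sales_keys]

# Records with a malformed or missing reference are simply discarded
kept = clean_matched([{"ref": " q1 "}, {"ref": "zzz"}, {}], {"Q1"})
```

Even this normalisation (trimming and upper-casing) is a judgement call: it recovers some records at the cost of assuming users only got the formatting wrong, not the value.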

One reason I’m a fan of cloud computing, and in particular Platform as a Service (PaaS), is that it restricts the design choices available when creating or integrating solutions. Of course choice is great, but it can become a problem when there are too many options and no clear way to choose between them. How many times have you seen a project stuck in analysis paralysis? Enter the Architect…

Firstly, at the Enterprise level (and enterprise could refer to an entire organisation or an organisational division depending on structure and business model), there need to be some clear guidelines and technology selections. Exactly what these are will depend on what’s important to the organisation. Architecture principles translate what’s important to the organisation into guidance that can be applied to technology selection and solution designs. They will have the effect of restricting the number of choices available. Examples of restrictions include:

Data protection (e.g. how does the Patriot Act impact the ability to use cloud providers)

Technology vendors

Consultancies

Individual technology components like OS, Database, Middleware, Web

Hosting: internal, virtualised, external

Existing solutions/applications that will not be replaced

Upgrade frequency (incremental or wait until support is ending)

I’d like to re-emphasise that, at the enterprise level, principles should only reflect the most important guidance and will lead to some restrictions in the list above. There will also be commercial considerations that bring restrictions, most probably due to enterprise-wide deals with vendors like Microsoft, Oracle and IBM to volume-licence a range of products.

If the Enterprise Architecture job has been done well a reasonable number of choices will already have been made but there will usually be some still to make, for example:

Do we make use of new features in the framework, database, server, etc.?

Which library do we use for feature x? This is more of an issue for open source where multiple implementations are available as opposed to frameworks like .NET

We could do this in compliance with the Enterprise Architecture but don’t have time and there is a non-compliant alternative

This is a completely new requirement, where do we start?

I would suggest that if it’s taking a long time to choose between alternatives it is either because:

The Enterprise-level guidance is not clear in this respect, in which case the Enterprise Architect needs to be involved in order to clarify and update the existing guidance, or

There isn’t a great deal to choose between the alternatives and there is deliberately no directive from the Enterprise level in this area (the project team is free to choose). My suggestion is to pick one or two, and try to develop a part of the most complex or disputed functionality (a spike test). The spike test needs to be quick, no more than a day or two: if it works stick with it and resist trying every alternative. In my experience it’s better to move a project forward even if some of the choices are later found to be suboptimal.

A while ago I went to a job interview and was somewhat surprised by the interviewer getting rather over-excited about architects writing code: “I don’t EVER want to see an architect coding” was their view. “Not even to better understand a problem?” I asked. “No, NEVER.” I’m not entirely sure what the issue at that organisation was, but it has made me consider what the difference between the architecture and development roles is.

Starting at the beginning (Frederick P. Brooks Jr.’s Mythical Man Month): the Architect is concerned with the Conceptual Integrity of a system/solution, i.e. that it makes sense overall regardless of how each part is implemented. The architect must also be able to suggest a way of implementing anything they specify but be able to accept any other way that meets the objective (otherwise how do you know if you’re being fleeced?).

When Brooks was writing there were far fewer layers of abstraction in IT. Today there are many more: from Conceptual Designs, Functional Specifications, High-level Languages to Machine Code, Microcode and CPU logic gates. Each of these has their own ‘Architect’ who leaves most of the implementation to the specialists at the next lower abstraction layer until you get to the transistors. For my purpose I’m considering the roles of Enterprise, Solution and Data architects who tend to be found in medium to large organisations.

So what differentiates Enterprise, Solution and Data Architects from Developers? In my opinion it’s ambiguity: the inputs and outputs of the architects are ambiguous. The outputs from Developers (code) are definitely not ambiguous. I have bored many managers by repeatedly explaining that any specification written in prose (English) is going to be ambiguous (so deal with it and stop wasting time); if it were not ambiguous it could be compiled. I’ve not seen any commercially available compilers that take in a Word document and output a fully functional system. The input to a developer could be unambiguous if an awful lot of time has been put into the specification, but generally some degree of ambiguity remains.

In my view what makes an Architect is the ability to deal with a great deal of ambiguity, both in their inputs and their outputs (not knowing exactly how a feature is to be implemented). Developers also have to deal with ambiguity but at least one side of their work (code) is unambiguous. Should Architects code? Yes, but not production code, developers are the coding experts.

I recently completed an excellent book which examines how to deal with information presented as text. It’s called Taming Text, from Manning. The authors do a good job of introducing each topic and explaining how a number of open source tools can be applied to the problems each topic presents. I’ve not studied the latter, but the former are a great introduction.

I have summarised each topic, below:

It’s hard to get an algorithm to understand text in the way humans can. Language is complex and an area of much academic study. Text is everywhere and contains plenty of potentially useful information.

The first step in dealing with text is to break it down into parts, and the simplest aim of this step is to extract individual words; however, there are a number of approaches and more sophisticated ones will need to handle punctuation. The process of splitting text down is called tokenisation. Individual words may often then be put through a stemming algorithm in order to be able to equate pluralised and different tenses of the same stem. A stem might be a recognisable word, but not necessarily.
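
A minimal sketch of the two steps (the suffix list here is a toy illustration of my own; real systems use a proper stemmer such as Porter’s):

```python
import re

def tokenise(text):
    # Lower-case and split on runs of non-alphanumeric characters;
    # real tokenisers handle punctuation far more carefully
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def naive_stem(word):
    # Deliberately simplistic suffix stripping, for illustration only
    for suffix in ("ing", "ies", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

stems = [naive_stem(t) for t in tokenise("The cats were running.")]
```

Note that `naive_stem("running")` gives “runn” – not a recognisable word, but that doesn’t matter so long as all forms map to the same stem.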

In order to search content it must first be indexed, which will require tokenisation and stemming, and maybe also stop-word removal and synonym expansion. It is also useful, for subsequent ranking, to use an index that allows the distance between words found from the search phrase to be calculated for each document searched. There are a number of algorithmic approaches for ranking results, the simplest of which are based on the vector space model. Obviously ranking is an evolving area and the big internet search engines are constantly refining it. Another refinement that can be applied to search is the key constituent of spell-checking: fuzzy matching.
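
The vector space model in its simplest form can be sketched as scoring each document by the cosine similarity between raw term-frequency vectors (a toy illustration; real engines add TF-IDF weighting and much more):

```python
import math
from collections import Counter

def cosine_rank(query_tokens, docs):
    # Treat query and each document as term-frequency vectors and
    # rank documents by cosine similarity to the query
    q = Counter(query_tokens)
    scores = []
    for i, doc in enumerate(docs):
        d = Counter(doc)
        dot = sum(q[t] * d[t] for t in q)
        norm = (math.sqrt(sum(v * v for v in q.values()))
                * math.sqrt(sum(v * v for v in d.values())))
        scores.append((i, dot / norm if norm else 0.0))
    # Highest-scoring document first
    return sorted(scores, key=lambda s: -s[1])
```

In practice the vectors are built from the index, not recomputed per query, but the scoring idea is the same.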

Fuzzy matching is another area of academic research with some established algorithms based on character overlap, edit distance and n-gram edit distance, which may all be combined with prefix matching using a trie (prefix tree). The most important aspect of fuzzy matching to understand is that different algorithms will be more or less effective depending on the sort of information being matched; for example, Movie Titles are best matched on Jaro-Winkler distance but Movie Actors are best matched with a more exact algorithm given that they are used like brand names.
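
Edit distance is the easiest of these to sketch; here is the standard dynamic-programming Levenshtein implementation:

```python
def edit_distance(a, b):
    # Levenshtein distance: the minimum number of single-character
    # insertions, deletions and substitutions to turn a into b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]
```

Two strings match “fuzzily” when this distance is below some threshold, often scaled by string length.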

It can be useful to be able to extract people, places and things (including monetary amounts, dates, etc.). Again there are a number of algorithms for achieving this, including open source implementations from the OpenNLP project. Machine learning can play a part provided there are plenty of tagged (training) examples available, which is especially useful where domain-specific text needs to be ‘understood’.
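
For the “monetary amounts, dates” end of the spectrum a crude rule-based sketch works (these patterns are deliberately simplistic and of my own invention; people and places need the statistical models in tools like OpenNLP):

```python
import re

# Toy patterns: a currency symbol followed by digits, and d/m/y dates
MONEY = re.compile(r"[£$€]\s?\d[\d,]*(?:\.\d+)?")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def extract_entities(text):
    # Return the monetary amounts and simple dates found in the text
    return {"money": MONEY.findall(text), "dates": DATE.findall(text)}
```

Rules like these are brittle (they miss “twelve pounds” or “5 May”), which is exactly why the trained-model approach dominates.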

Given a large set of documents there is often a requirement to group similar documents. This process is called clustering and can be observed in operation on news aggregation sites. Note that clustering does not assign meaning to each cluster. There are a number of established algorithms, many of which are shared with other clustering problems. Given the large volumes and algorithm complexity, a real-world clustering task is quite likely to want to make use of parallel processing, and this is what the Carrot2 and Apache Mahout projects provide, the latter by building on top of Apache Hadoop.
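
A toy sketch of the idea, using a greedy single-pass grouping of my own over Jaccard similarity of token sets (real systems use established algorithms like k-means, at scale via Mahout):

```python
def jaccard(a, b):
    # Similarity of two documents represented as sets of tokens
    return len(a & b) / len(a | b)

def cluster(docs, sim, threshold):
    # Put each document into the first cluster whose seed document
    # is similar enough; otherwise start a new cluster
    clusters = []
    for doc in docs:
        for c in clusters:
            if sim(doc, c[0]) >= threshold:
                c.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters
```

Note that nothing here names the clusters: working out what a cluster is “about” is a separate labelling problem.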

Another activity for sets of documents is classification which is similar to clustering but starts with sets of documents that have been assigned to a pre-determined category by a human or other mechanism, for example asking users to tag articles. Example classification tasks are sentiment analysis or rating reviews as positive or negative. Of course there are a number of algorithms and implementations to choose from with each having trade-offs in accuracy and performance.
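
Naive Bayes is one of the standard algorithms for this kind of task; a minimal sketch with Laplace smoothing (the training-data shape here is my own illustration):

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    # labelled_docs: list of (tokens, label) pairs tagged by a human
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for tokens, label in labelled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(tokens, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus Laplace-smoothed log likelihood per token
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

The accuracy/performance trade-off mentioned above shows up even here: naive Bayes is fast and simple but its independence assumption costs accuracy against heavier models.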

Many architects will be familiar with the failings of outsourcing and some will be familiar with a few successes. I have been able to observe two very similar organisations outsourcing to the same provider. One of these I would regard as a successful outsourcing and the other, I would say, is not. The two organisations, and the areas of business outsourced, are illustrated below:

To explain the diagram a little: ‘Product Administration’ in this instance refers to a financial services product like a pension or ISA; ‘Investment Administration’ deals with the underlying investments like mutual funds or equities.

So which is the successful outsourcing? Well, it’s organisation B: the outsourced function is well defined by a number of business-level messages and there are a number of external organisations that could run the outsourced function. All great, but the key advantage over A is that B is in full control of the customer experience. I would not consider A’s outsourcing to be successful because of the loss of control over the interaction with the customer. To change the customer experience A needed to go back to the outsourcer but, no surprise here, the outsourcer was busy chasing new business and A is somewhat stuck.

In their book Enterprise Architecture as Strategy, Ross, Weill and Robertson refer to A’s outsourcing model as “Strategic Partnership”, with a success rate of 50%, and B’s outsourcing model as “Transaction”, with a success rate of 90%. That is not to say one is better than the other, but the former carries more risk, which, given the prize, might be worth taking. Architects need to recognise the risks in order to assign resources and mitigate as many as possible.

I recently had a conversation about Enterprise Architecture which went something along the lines of “how would you approach EA, if you came into a new organisation and there was nothing: no EA, no IT strategy, no documentation or other guidance?”. Not having personally experienced this, or thought about this scenario, I was slightly stuck and replied along the lines of: (1) understand the business strategy and operating model, (2) produce high-level documentation, an assessment of the current situation and identification of gaps, (3) look to resolve gaps firstly through shaping any existing IT programmes… Back came the reply, “I disagree…”, followed by a logical and sensible explanation of how that individual had begun to bring in EA. In that explanation there was a great deal of context, which made me realise that in all the EA work I have ever approached there always has been context, and it is this that will guide you on where to start. I think the contexts where EA work is requested, or identified as needed, are well summarised in the book Enterprise Architecture As Strategy [Ross, Weill & Robertson]; they call them Symptoms but I like to think of them as bad Smells:

One Customer Question Elicits Different Answers: Most probably data duplication. Start with some high-level data modelling; a quick win, to build credibility, is to eliminate just one piece of data duplication; for the longer term identify all the core data that should be shared in order to drive programs of work to rationalise the estate and, of course, ensure good ongoing governance of data.

New Regulations Require Major Effort: In my experience the root cause is Different Business Processes and Systems Complete the Same Activity (see below)

IT Is Consistently a Bottleneck: I think this is tricky but my first suspicion is there is too much re-inventing (of systems and methodologies) going on and I would agree with Ross et al that the long-term approach is to introduce standardisation. Standardisation can be applied across methodologies, technologies and ultimately in the creation of generic solutions, which can be quickly re-used. It’s difficult to pick out a specific quick win but I would look to find something that can be re-used to get work underway faster than previously experienced. Adoption of SaaS and/or PaaS could be a fast-track mechanism for standardisation but there are many pros and cons to consider.

Different Business Processes and Systems Complete the Same Activity: I would start with a business capability model to understand the extent of the problem; a quick win here is not necessarily easy, it’s simple to say eliminate a duplicate system but hard to do; longer-term develop a plan that moves capabilities delivered by IT systems to more closely match the business capability model, in that each capability is implemented in, ideally only one, but practically as few as possible, systems. Of course governance needs to review proposals against the existing map of systems vs. business capability delivered to stop problems compounding or reoccurring.

Information for Making Decisions Is Not Available: more precisely it is not available at the right time. My starting point here is to examine the flows of data, is it being held up somewhere for example in overnight or weekly batches or in waiting for external data? A quick win should be straightforward using improved technical solutions in a targeted area and, longer term, adopting those improved solutions across the enterprise.

Employees Move Data from One System to Another: aka Swivel-Chair Integration. Again this is about data-flows but this time it’s the lack of automation. The approach is similar: first these manual flows need to be understood and then technology solutions implemented where cost-effective. I know it’s easy to say but often hard to achieve when data sits in silos: the organisation will need to change its operating model to mature its IT systems architecture.

Senior Managers Dread Discussing IT Agenda Items and Management Doesn’t Know Whether It Gets Good Value from IT: I’ll admit I’m not sure where to start on these two. I suspect their cause is one or more of the previous bad smells; anyone care to enlighten me?

The above responses are only where you could start; EA should extend to all the disciplines mentioned on a prioritised basis: there will probably be more than one smell, but which is worst?

Maybe very forward-thinking organisations recognise they need EA before they smell something bad, and perhaps I’ll be lucky enough to work with one sometime!