Monday, November 14, 2011

Back in 2006 I wrote a bunch of SOA Anti-patterns, with some additional help, and these are pretty much as valid now as they were then. I'd like to add a new one, though, that I've seen over and over again. It's related to the canonical model problem and it's pretty easy to solve.

Name: Sharing Data like candy

Description
This anti-pattern comes from a desire to 'share' all the information between areas but instead ends up creating strong dependencies between areas because core information sets, for instance customers, products, locations, etc are thrown about all the time and need to be 'matched' across systems during the transactions. The information description for these core entities grows continually as different services want 'just one more field' in order to enable other services to get 'everything'.

Effect
The impact of this approach is simple. The schemas for the core entities aim to be 'complete' so that all information from every service can be placed within them. This is claimed to aid 'sharing', but the reality is that as new attributes or sub-entities are added, different areas of the business begin to add the same attributes in different places, and the testing impact of change becomes a significant issue because these core information sets are used in every single interaction. Services take to storing information from these interactions 'just in case', which leads to significant challenges of data duplication. The solution degrades into a 'fragile base class' issue and modifying the information model becomes a high-ceremony, high-impact change. Development becomes slow, people look for 'back doors' to avoid using corporate approaches, and local point solutions begin to proliferate.

Cause
The cause is simple: a failure to recognise what the purpose of the interaction is. If service A and service B both have information on customer X, with them being known as "fred bloggs, ID:123" by service A and as "mr fredrick bloggs, ID:ABC" by service B, then passing 40 attributes about the customer is just designed to create confusion. It doesn't matter if Sales and Finance have different information about a customer as long as they have the information that is right for their interaction with them. Sharing all this information makes communication worse if it's done in an unmanaged way. The problem here is that management of these core entities is a task in itself, while the SOA effort has viewed them as being just another set of data entities to be flung around. The cause is also due to the SOA effort viewing everything as a transactional interaction and not thinking about the information lifecycle.

Resolution
What we need is the ability to exchange the mapping from service A to service B's ID and only the information which is specific to the transaction. This means that we need a central place which maps the IDs from one service to another and ensures that these represent a single valid individual, product, location, etc. This is the job of Master Data Management and it comes with a big set of business governance requirements. This approach has an added benefit of being able to synchronise the information between systems so people don't see different versions of the same data attributes and being able to extract information out of source systems to provide a single, transactional, source.

The resolution therefore is to agree as a business that these core entities are important and to start managing them. This reduces the complexity of the SOA environment and increases its flexibility and agility, as well as removing issues of data duplication.
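To make the ID-mapping idea concrete, here is a minimal sketch in Python of a central cross-reference between services. Everything in it is an assumption for illustration: the `CrossReference` class, the `CUST-0001` master ID, the `sales` and `finance` system names and the `build_invoice_request` helper are all hypothetical, and a real Master Data Management product would add matching, governance and persistence on top.

```python
class CrossReference:
    """Hypothetical cross-reference: one master ID per real-world entity."""

    def __init__(self):
        self._to_master = {}  # (system, local_id) -> master_id
        self._to_local = {}   # master_id -> {system: local_id}

    def register(self, master_id, system, local_id):
        """Record that `local_id` in `system` refers to the entity `master_id`."""
        self._to_master[(system, local_id)] = master_id
        self._to_local.setdefault(master_id, {})[system] = local_id

    def translate(self, from_system, local_id, to_system):
        """Map an ID known to one system onto the ID used by another."""
        master_id = self._to_master[(from_system, local_id)]
        return self._to_local[master_id][to_system]


def build_invoice_request(xref, sales_customer_id, order_total):
    """Send only the translated ID plus the data specific to this transaction."""
    return {
        "customer_id": xref.translate("sales", sales_customer_id, "finance"),
        "order_total": order_total,
    }


# The customer from the example above: 123 in one service, ABC in the other.
xref = CrossReference()
xref.register("CUST-0001", "sales", "123")
xref.register("CUST-0001", "finance", "ABC")

request = build_invoice_request(xref, "123", 250.0)
print(request)
```

The message carries two fields rather than 40 customer attributes; each service keeps its own view of the customer and the mapping is the only thing they share.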

Wednesday, November 09, 2011

Single processing is an old school idea - this was pointing out the obvious in 2007 and indeed obvious in 1990 and before if you know anything about decent scale systems. This was a prediction of the future being the same as the present and recent past - not so much a prediction as a statement.

The end of low-level programming languages - as above, this wasn't a prediction but a statement of current reality dressed up as a prediction. The line "Once considered an extravagant use of memory, compilers are now essential tools" is brilliant. WTF is Node.js in this world? So wrong it's gone around the other side....

Virtual Memory is dead in next generation OSes.... umm so that would exclude Windows 7 and Windows 8 then.... clearly Microsoft don't consider them next generation

You'll be carrying around all the personal storage you need for video and audio... my 4 TB of video would disagree

Next generation OSes will use DB technologies not file systems .... again a world of MS #fail on this one

Now the first 2 were predictions of 'yesterday' being put forward as the future, 3 was a lack of vision as to what virtual memory is actually for... but the real point here is the latter 2.

This was 2007 remember. A year in which I was talking about Google SaaS and Amazon AWS to lots of folks and here we have someone who was a research leader at Microsoft really missing the point of the next wave, not a 10 year+ wave but the wave that was about to crash across the entire company. 4 has been replaced by iCloud and other approaches that mean you don't need to carry it all with you (and with video and TV you probably couldn't anyway) but instead can access it on demand via mobile broadband. The final point is really just that Windows Vista should have had that DB technology but didn't and you know what? None of us are missing it. Spotlight on Mac OS X and the new Windows 7 search stuff means we don't need that DB approach and out in the real-world we are seeing people using NoSQL approaches over traditional DBs.

My point here is simple. Here was someone at the top of the research tree in one of the biggest tech companies in the world and his 5 'top' predictions were all total bobbins. So what does that mean for the rest of us? Well first of all it means listen to the shiny fringe and read about the leading practice of the past. Secondly it means don't listen to the 'vision' of companies with a bought-in objective of extending the present. Thirdly it shows that missing the wave costs a lot of money to catch up and profitability becomes an issue (see: Bing, Windows Mobile, etc).

Above all it means challenging visions, and then measuring companies against them.

I'm seeing a lot of 'Big Data' washing going on in the market. Some companies are looking at this volume explosion as part of a continuation of history: new technologies, new approaches, but evolution not revolution. Yes Map Reduce is cool, but it's technically much harder than SQL and database design, which means that it is far from a business panacea. Yes the link between structured and unstructured data is rising and the ability of processing power to cut up things like video and audio has never been better. But seriously, let's step back.

Back in 2000 I worked at a place that spent literally MILLIONS on an EMC 5 TB disk set-up. Yes it had geographical redundancy etc etc and back then 5 TB was seen as a stratospheric amount of data for most businesses. These days it's the sort of thing we'd look to put into SSDs. It's a bit beyond what people would do in straight RAM, but give it a few years and we'll be doing that anyway.

Here is the point about Big Data: 95%+ of it is just about the on-going exponential increase in data, which is matched, or at least tracked, by the increase in processing power and storage volumes. Things like Teradata and Exadata (nice gag there Larry) are set up to handle this sort of volume out of the box and Yahoo apparently modified Postgres to handle two petabytes, which by anyone's definition is 'big'. Yes index tuning might be harder and yes you might shift stuff around onto SSDs, but seriously this is just 'bigger', it's not a fundamental shift.

Map Reduce is different because it's a different way of thinking about data, querying data and manipulating data. This makes it 'hard' for most IT estates as they aren't good at thinking in new ways and don't have the people who can do that. In the same way that there aren't that many people who can properly think multi-threaded, there aren't that many people who can think Map Reduce. Before you leap up and go 'I get it', do two things: 1) compare two disparate data sets, 2) think how many people in your office could do it.
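To give a feel for that first challenge, here is a minimal single-machine sketch in Python of a Map Reduce style join across two differently shaped data sets. It is not Hadoop code: the records, field names and function names (`map_customers`, `map_tickets`, `shuffle`, `reduce_join`) are all made up for illustration, and the `shuffle` step stands in for what a real framework does across a cluster.

```python
from collections import defaultdict


def map_customers(record):
    # Emit (join key, tagged value); the key is normalised so both sets agree.
    yield record["email"].lower(), ("customer", record["name"])


def map_tickets(record):
    # The second data set has a different shape and a different field name.
    yield record["contact"].lower(), ("ticket", record["subject"])


def shuffle(pairs):
    # Group every emitted value under its key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Pair up values from both sources; keys seen in only one source yield nothing.
    names = [v for tag, v in values if tag == "customer"]
    subjects = [v for tag, v in values if tag == "ticket"]
    for name in names:
        for subject in subjects:
            yield key, name, subject


customers = [
    {"name": "Fred Bloggs", "email": "Fred@example.com"},
    {"name": "Jane Doe", "email": "jane@example.com"},
]
tickets = [
    {"contact": "fred@example.com", "subject": "Late delivery"},
    {"contact": "nobody@example.com", "subject": "Spam"},
]

mapped = [pair for r in customers for pair in map_customers(r)]
mapped += [pair for r in tickets for pair in map_tickets(r)]
joined = [
    row
    for key, values in shuffle(mapped).items()
    for row in reduce_join(key, values)
]
print(joined)
```

What would be a one-line JOIN in SQL becomes three cooperating functions and a tagging convention, which is the shift in thinking described above.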

So what do we see in the market? We see people using Big Data in the same way they used SOA, slapping on a logo and saying things like 'Hadoop integration' or 'Social media integration' or.... to put it another way.... 'we've built a connector'. See how much less impressive the latter looks? It's just an old school EAI connector to a new source or a new ETL connector... WOW hold the front-page.

Big Data has issues of Data Gravity, process movement and lots of other very complex things. So to find out whether it's Big Data or Big Con ask the following:

Can you replace the phrase 'Big Data' with 'Big Database'? If you can then it's just an upgrade.

Do they have things that mean old school DBAs et al can handle Hadoop?

Can the 'advance' be reduced to 'we've got an EAI connector'?

Is it basically the same product as 2009 with a sticker on it?

Is there anything that solves the data gravity problem?

Is there anything that moves process to data rather than shifting the data?

"Space," it says, "is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space, listen...

Then you really know it's Big Con. Big Data is evolution not revolution and pretending otherwise doesn't help anyone.

Tuesday, November 08, 2011

As everyone knows 'Soylent Green is People' and indeed it's taken nearly 30 years for people to really make a product whose only ability is to sell the value of people: people cut into cross sections, relationships and information. This is what all social media companies are really selling. They aren't selling TO people, they are selling people TO companies. This is the only way they make money. People talk about privacy concerns in Social Media but in reality it's a balance of how much they can get people to give away.

In other words the goal of Google+, Facebook etc is to make people willingly become products they can sell. When you opt in to marketing or 'like' something on Facebook you are making that decision and that commitment.

It's 2011 and Soylent Green is the biggest product buzz on the market; it's just that, as with every relaunch after a faux pas, it's been rebranded and renamed (to 'Social Media'). But the goal is the same: to provide people, and companies, with people as a product. This desire to monetise people is only going to get more direct and more visible. It builds on the customer marketing databases of the past but adds a more direct engagement: instead of selling a contact point you really are selling the actual individual.

The question isn't 'is this right or wrong'; it's just reality. The product of social media is the people who use it. Social media companies aren't a charity, they want to make money, and in order to make money they either have to charge users or sell those users.