“I’ve been accused of vulgarity. I say that’s bullshit.” – Mel Brooks

Being brutally honest, Big Data is bullshit. Not only is Big Data bullshit but it comes with a surfeit of pranksters, pundits and prissy big data bullshit-babblers, all willing (cue narration by Morgan Freeman) to “big-up Big Data in a vulgar, ill-mannered and predictably nauseating dance of professional-hustling… as old as time.”

However, is all Big Data bullshit? Is it all a fad, a load of old tripe and a confusion of weasels together with their surfeit of weasel words? Or, is there something of value, substance and tangibility to be found amongst the volumes, velocities and varieties of brazen and opportunistic self-aggrandizement, toxic speculation and opinions and unverifiable miracles?

For Google, Facebook and Twitter, Big Data certainly isn’t BS. For example, Google rely on Big Data as the biggest irreplaceable element in their colossal advertising business – or so I am led to believe. A business that accounts for more than 90% of Google’s revenue. So clearly, for the masters of web-based unstructured and complex search, Big Data is an essential element in their business model. The biggest essential element by far.

However, let’s be honest, we should consider the obvious. How many of us are really going to do business like Google?

Big Data technology and service vendors benefit tangibly from the Big Data movement, at least this is the impression that I get. Indeed, there is much talk about the relationship between Big Data, the Hadoop ecosphere and the big wild world of open source, but what is more interesting is that companies are bringing in revenue on the back of Big Data by offering battle-hardened business and enterprise versions of open source software. Then there is the business in consulting, with such a demand for Big Data gurus, Master Data Scientists and Number Conjurors, there must presumably be real people working in these roles, and paid handsomely for doing so.

But apart from the ‘success’ associated with the foundation of so many Big Data start-up businesses and the market-based commitment of some of IT’s ‘great and good’ to the new digital zeitgeist of data volumes, velocities and varieties, just where are the other success stories of Big Data coming from?

To help us in our quest, here is a not-so-exhaustive list of Big Data success stories that I compiled earlier, celebrity-chef style. These are some of the Big Data gems that I managed to track down:

Thanks to Big Data, the taxi service alternative channel Uber is making massive waves and shaking things up in the transport sector.

By leveraging Big Data, AirBnB is turning the hospitality business on its head and, what’s more, making friends and influencing people while doing so.

Amazon would not be what they are today if it were not for Big Data, in fact, without Big Data, they would be nothing.

One of the industries that will suffer revolutionary transformation because of Big Data will be the banking industry.

Big Data will increase the GDP of the USA by at least 1% or more, and the Spanish GDP could likewise add an additional 1%, for similar reasons.

These would all be great headlines for Big Data success stories, apart from one small flaw. None of them is exactly a Big Data success story, either in terms of the defining Big Data characteristics of volumes, varieties and velocities of mainly unstructured data or in terms of the Hadoop technological kitchen-drawer ecosphere.

Something is happening, and it is not exactly legitimate. Can you guess what it is yet?

When it rains it pours, and when it rains Big Data hype it quickly turns into a monsoon of cloying hysteria. Spotting and pointing at Big Data bullshit babblers on forums like LinkedIn Pulse, Forbes and Information Management is no fun, unless your fun is nuking a school of intellectually challenged fish floundering in a barrel of vintage Malmsey.

However, not only is it no fun but, more often than not, it is a complete and utter waste of time trying to get people to adopt a more critical approach to thinking. Because for every Big Data bullshit babbler, there is a battalion of intransigent Big Data believers stuck in untenable and absurd positions, marooned from reason and ways back to rationality. You can’t use logic against belief, and you can’t turn back a rising tide of IT refugees who are desperately seeking succour in the apparently safer havens of Big Data, Data Science and Data-driven voodoo.

Only the other day I read that “The emergence of Big Data is now allowing CEOs to increasingly base decisions on current “reality” rather than past experience, but the risks in the integrity and fullness of the data that they are “seeing” and “hearing” is often a barrier to getting a clear picture of what is actually going on.” This is really taking shameless baloney and wilful ignorance to all new heights, but it doesn’t stop there.

Elsewhere another eminent Big Data bullshit babbler wrote, “Clearly big data and AI will change almost every industry this decade… but none more than these”, referring vaguely and vacuously to “Healthcare, Finance and Insurance”. What species of shameless and fatuous willy-waving goes so far out on a limb that it becomes massively removed from even being a grandiose and beguiling ‘bigging-up’ of a fad?

Finally yet importantly, I almost choked on my supersized Big Data popcorn the other day when I read, “Today, with the rise of the Internet, we capture “data” on everything. Therefore, the new term “Big Data” is honestly like 1985 again. But this time, Big Data will actually be really big and by some forecasts, be a $40 billion industry by 2018.”

This is not hype, it is not even simple deceit, it is astroturfing of 22 carat bullshit, and in most cases it’s clearly deliberate, it’s intentional and it’s grossly misleading. So why do people do it?

Given that Big Data is very much a niche technology, with very much a niche appeal, why do so many buffoons go around pretending that Big Data is for all of us? As if it were some sort of digital universal panacea when, at the moment and at best, it is a walk-on bit-player with just a couple of lines who aspires to B-actor status. In this sense, at present Big Data isn’t even the hero’s best friend.

Before I close the piece, I will leave you with the thoughts of Dan Ariely. Why? Because it just irritates the hell out of a section of the community of Big Data bullshit babblers, and it’s actually very accurate. Here it is:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Many thanks for reading.

In subsequent blog pieces I will be sharing my views on the evolution of information management in general, and the incorporation of novel and innovative techniques, technologies and methods into well architected mainstream information supply frameworks, for primarily strategic and tactical objectives.

As always, please reach out and share your questions, views and criticisms on this piece using the comment box below. I frequently write about strategy, organisational, leadership and information technology topics, trends and tendencies. You are more than welcome to keep up with my posts by clicking the ‘Follow’ link and perhaps you will even consider sending me a LinkedIn invite if you feel our data interests coincide. Also feel free to connect via Twitter, Facebook and the Cambriano Energy website.

“Contrary to slanderous Eastern opinion, much of Iowa is not flat, but rolling hills country with a lot of timber, a handsome and imaginative landscape, crowded with constant small changes of scene and full of little creeks winding with pools where shiners, crappies and catfish hover.”

Paul Engle

Catfish are said to be named because of their passing resemblance to land-roving felines. Admittedly, it’s not like any cat I’ve seen around the house, but if you simultaneously squint your eyes – impressionist style, guzzle a quart of bourbon and smoke a stash of ganja then maybe the resemblance becomes more obvious.

Catfish come in all sizes and varieties, at times they are native and other times they are classed as an alien species, rather like this Welshman who finds himself living in the Spain of Evo Morales, Kirchner and King Mohammed. Nonetheless, you won’t find many thrilling and delightful catfish videos on YouTube nor will you see many entered for the best of breed category at the International Cat Show.

So, what have catfish got to do with Big Data?

Well, there’s loads of them, they come in many varieties, and when they aren’t eating, they can be quite swift. But that’s not what I really wanted to discuss.

Now imagine this. Given the immense geographic dispersion, varieties and volumes of catfish around the world, wouldn’t it be interesting to carry out the Ma and Pa of all Big Data experiments?

We capture – over time of course, this is not the work of one day – all the catfish in the world, and we not only electronically tag them but we also fit them with IoT (Internet of Things) devices that will tell us:

Where the catfish is

Who the catfish is with

What are they doing

What are they eating

How do they feel in general

How do they feel about certain things, like the food they just ate, the company they keep, and what they do for entertainment and distraction, etc.

We could then collect this data, in centers all around the world, and then bring it all together in a massive Catfish Big Data Processing Centre in, for example, Coney Island.

Then the data we have so carefully collected, multiplied twice, and then searched and word-counted, in parallel, can be put to revolutionary, evolutionary and amazing uses such as:

Analysing and forecasting the Amazon buying trends of the lost Fukawi tribe – yes, the very same tribe who used to wander around boasting about their culture and presence usually accompanied with cries such as “We’re the Fukawi” or “Where the Fukawi?”

Predicting the outcome of the US Presidential election, the regional elections in Catalonia and the vote for Chairperson at the Hello Working Person’s Club, Hello Village, in Jolly Olde England.

Preventing the outbreak of a world-wide pandemic of universal proportions thanks to Big Data being used to intercept virus-bearing inter-terrestrial vehicles sent by radical-fundamentalist-Martians inhabiting the once munificent planet of Zog.

Providing a wealth of material success stories that can be liberally sprinkled like fairy-dust on amazing Big Data stories from the keyboards of some of the finest Big Data bullshit babbling princesses on the entire world wide webs.

Over time, the competence, repertoire and agility of Catfish of all varieties, species, volumes and velocities (did anyone mention Catfish voracity and veracity?) could be augmented, potentiated and expanded by invasive, elliptical and sublime manipulation and neuro-retraining. We could then start with in-aqua interactive stimulus, menu variation and programming and extra-sensory passivation. Later the experiments could be more complex and more all-inclusive, reaching greater and greater degrees of perfection and inclusivity and exclusivity as the Catfish Big Data bandwagon rolled on… Waterlogged, waylaid and none the wiser. Indeed, in the future, all individual decisions will also rely on Catfish input, insight and turbo-charged predictive analytics of great and lasting charm.

Diet manipulation, an habituation test, and chemical analysis of urinary free amino acids were used to demonstrate that bullhead catfish (Ictalurus nebulosus) naturally detect the body odors of conspecifics and respond to them in a predictable fashion. These signals are used in dominance and territorial relationships and lead to increased aggression toward chemical “strangers.” The results support the general notion that nonspecific metabolites, as well as specific pheromones, are important in chemical mediation of social behavior.

There is also one very important thing about catfish that not many people know – apart from Michael Caine, who of course is a leading authority on catfish – and not many people know that either. But, anyway… Catfish are also bottom feeders, this is because of some complex physiological configuration that I won’t go into here – for fear of hurting the sensibilities of the puerilely prudish and wasting valuable drinking time – so in terms of data, the Catfish are able to plumb the depths of the most obtuse, dark and murky data, gobble it up, transform it and… err… load it into Hadoop, to be analyzed with Spark and presented in Excel… or something like that.

So, you’re not convinced by this story? Okay, I didn’t want to tell you this, but here it goes…

Many of us worry about leveraging all data, and mainly we worry because we don’t really have a clue about what we are bullshitting about. We see Big Data, and we believe that is good, whether we know this to be true or not. We are grasping at straws like so many bottom feeders, so many feces-eating walking-catfish, motivated by ideas of maximizing the sale of useless and outdated crap to ignorant people who don’t need it and can’t derive any tangible benefit from it in the first place. This is the biggest takeaway from this current schizophrenic Big Data BS Kulturkampf. Beyond a limited set of interest stories and an even more limited set of peripheral benefits accruable in very specific circumstances, there is nothing tangible that really grabs the attention, apart from the razzle-dazzle, smoke and mirrors of vacuous cant dressed up as showmanship.

The biggest problem with Big Data isn’t so much the plethora of technology (which is more and more reminding me of a box of half-eaten chocolates), nor even the niche applications, as miraculous and mysterious as most of them are. It’s more about Big Data being turned into a seriously creepy religion, where belief is paramount, and where there is little or no questioning of the tenets, the fables, the dogma and the liturgy, and where one person’s willful ignorance is just as valid as another person’s aspiration to gain knowledge and experience.

Make no mistake, Big Data can be useful for certain businesses and for certain situations. But for many of us in practice it’s either a peripheral player or doesn’t even make it to the bench.

A final thought. Treating Big Data as a religion is foolish, unhelpful and ultimately doomed to failure and ignominy. You have been warned!

For what it’s worth, I am currently writing the Ma and Pa of all Big Data parallel-analytics languages (details to follow), and I might even call it catfish (it’s sorta catchy) and I will have it represented by a muddy-looking open-source cartoon catfish, one worthy of a spot on YouTube.

To the layperson anxious for answers to complicated questions, the very idea of bringing together sets of disparate data and turning it into precious insights may seem like magic, a modern day alchemy, a goal placed well beyond the grasp of mere mortals. Fortunately, this is no longer the case, thanks in part to bagatelle-proportioned advances in Big Data and Big Data analytics and massive advances in imagination; we are able to look into the past, the present and the future, with absolute certainty.

As a child, I had a great love of stories of Spain, of the idea of travelling through the Iberian Peninsula and of mastering, and not just learning, the classical Spanish guitar. One of the phrases that stuck with me from those days was the unattributable quote of “amateurs practice until they get it right; professionals practice until they can’t get it wrong.”

In my professional working life, I have striven to identify those things that I want to be sufficiently competent at doing and those things that I consider a fundamental part of my professional competence, and then to make a clear distinction between the two.

As many of those who know me will know, a significant part of my professional life has been dedicated to the architecture and management of data, information and structured intellectual capital. Therefore, in the light of this fact and with reference to the previous bit of whimsical fancy, I will address the following question posed to me some time ago: What makes a great Data Architect, truly great?

What follows is by no means an exhaustive list of essential elements, but it should give you a flavour of what a great Data Architect is.

ONE – Establish a clear, cohesive and communicable idea of the theoretical, technical, philosophical and practical nature of data and information. Learn it inside, out, upside down and back to front. Then learn it well.

Put it another way. A great Data Architect should be able to answer the question “what is data?” from almost any viewpoint and then be able to give a simple, precise and understandable reply.

The internet abounds with content on ‘data’ and ‘information’. You may be familiar with the way Wikipedia describes ‘data’, and you may even agree with it, even though it is (in its current form – 3/8/2015[1]) a naïve, sloppy and circular definition, which only serves as an example of how not to define data.

TWO – Know your audiences, understand their motivations, have empathy with them, and develop a keen ability to spot what the audience wants and then sell that back to them as if it were their own idea.

One of the greatest architects of the 20th century, Ludwig Mies van der Rohe, had this to say about the relationship between an architect and a client: “Never talk to a client about architecture. Talk to him about his children. That is simply good politics. He will not understand what you have to say about architecture most of the time. An architect of ability should be able to tell a client what he wants. Most of the time a client never knows what he wants.”

THREE – Learn to communicate clearly, simply and effectively, and remember who the most important members of your audience are at any given moment and speak mainly to them.

The mantra of ‘keep it simple’ is what separates a great Data Architect from the swathe of sycophantic worriers, software train-spotters and smart-ass wannabes that make up much of the world of IT. So do not even try to appeal to that segment, they don’t matter. Speak to those that do matter.

The job of the Data Architect is not to impress colleagues, get likes on Facebook or be the manager’s pet. A great Data Architect uses language that is appropriate for the occasion, not to flaunt their extensive knowledge and experience but to communicate ideas, concepts and architectures in the language and manner that the listener can immediately grasp. A Data Architect who aspires to greatness does not need to prove themselves to their peers, but just needs to strive to be a true professional and the greatness will come along in its own good time.

The eighteenth century English theologian, dissenter, philosopher and scientist Joseph Priestley wrote, “The more elaborate our means of communication, the less we communicate”. With such influences in mind I try to encourage my team members and other collaborators to use appropriate channels of communication, and one of the ways I get this message across is with a list of options. I find that doing this early on can help to really simplify things and bring a greater degree of clarity to the table. However, as with many other aspects of life, one has to be flexible and realistic with this approach too, and allow for the choice of the most appropriate option according to the circumstances. My preference list is:

Face-to-face

Video conference

Telephone/mobile

Post-it note – or similar

Texting/SMS/Wassup

Email

Smoke signals

Social Media

FOUR – Be a great listener. Data Architects must nurture and hone effective listening skills; otherwise, they place themselves at a serious disadvantage.

Here are the four listening aspects that a Data Architect should aspire to dominate:

Cultivate a self-awareness of the importance of listening.

Understand what barriers there are and learn how to overcome the barriers to listening.

Identify poor listening habits and practices that you have adopted – ask people about how they see your listening skills.

Improve your own responsive listening skills.

Take this as an open-ended continuous improvement programme.

Put it this way, as a leader you might be the most amazing talker this side of the Rockies, but if you can’t listen effectively then it would be like Nadal, Federer or Djokovic, having a great world-class tennis serve, but with a cultivated inability to accurately read the play or to return any difficult shot.

FIVE – Understand how data is generated; why it is generated; who or what triggers the generation; how it flows; how it is used; who uses it and why. Understand the life-cycles of data and information.

A great Data Architect must understand the public and private life of data before actually trying to do anything with it.

I’ll cut to the chase on this topic and leave you with a comment on The Social Life of Information by John Seely Brown and Paul Duguid.

“To see the future we can build with information technology, we must look beyond mere information to the social context that creates and gives meaning to it. For years, pundits have predicted that information technology will obliterate the need for almost everything—from travel to supermarkets to business organizations to social life itself. Individual users, however, tend to be more sceptical. Beaten down by info-glut and exasperated by computer systems fraught with software crashes, viruses, and unintelligible error messages, they find it hard to get a fix on the true potential of the digital revolution.”

That’s just another indication of what we have to learn to avoid.

On the up side, “The Social Life of Information gives us an optimistic look beyond the simplicities of information and individuals. It shows how a better understanding of the contribution that communities, organizations, and institutions make to learning, working and innovating can lead to the richest possible use of technology in our work and everyday lives.”

SIX – Get a great understanding of all the data oriented vices and bad data architecture practice that goes on in the IT application world, and most especially in the web application-development world.

Some of the most atrocious examples of bad data architecture, engineering and management are in web applications. Learn from them, and learn how not to repeat such gross and wilful incompetence in your own Data Architecture work. Look at it as extreme examples of lessons learned. I.e. How not to do it.

SEVEN – Cultivate a well-developed sixth sense for the appreciation of the intrinsic values of data and information.

No, I am not arguing the case for the idea that all data has value, that extreme notion is clearly absurd, but fortunately one that has limited adherence. However, I am saying that we should develop a ‘nose’ for understanding what data could be of value, and measuring in qualitative and pseudo-quantitative terms, what that value actually represents.

I would also encourage people to check out the Wikipedia piece on Infonomics (URL: https://en.wikipedia.org/wiki/Infonomics), a term coined by Gartner’s Doug Laney, and based on work carried out at Bill Inmon’s Prism Solutions, which incidentally is one of my former employers.

Here’s a snippet:

“Infonomics is the theory, study and discipline of asserting economic significance to information. It provides the framework for businesses to value, manage and wield information as a real asset. Infonomics endeavors to apply both economic and asset management principles and practices to the valuation, handling and deployment of information assets.”

When you are a Data Architect, you should really be aware of such stuff, and at least be able to carry out a reasonable conversation about it.

EIGHT – Strive to be the best of all data modellers you are ever going to meet in your entire life.

I say that I’m a lean data modeller. What does that mean?

The first thing I model are the data flows.

Then I will create the conceptual, logical and physical models.

Then I will repeat until I get consensus, or until I become the Data Dictator – this usually occurs when the Portfolio Director demands closure and delivery.

Simples!

Nevertheless, not so fast. You will also need to know how to design physical data models for OLTP as well as for Enterprise Data Warehousing, and no, they are not the same, even if they are similar in many aspects.

Not only will a great Data Architect have polished skills in the art of data modelling according to the divine tenets of Codd and Date and later extended by blasphemers and acolytes alike, but they must also be comfortable designing dimensional models.
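To make the contrast with a normalised OLTP design a little more concrete, here is a minimal sketch of a dimensional (star schema) model, using Python’s built-in sqlite3 module. The table names, columns and sample rows (fact_sales, dim_product, dim_date and so on) are purely hypothetical and invented for illustration; a real warehouse design would have far more dimensions and attributes, and many more opinions attached.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            category     TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE fact_sales (
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            quantity    INTEGER NOT NULL,
            amount      REAL NOT NULL
        );
    """)
    # Illustrative sample data only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Classical guitar", "Instruments"),
        (2, "Guitar strings", "Accessories"),
    ])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20150801, 2015, 8),
        (20150901, 2015, 9),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20150801, 1, 1, 300.0),
        (20150801, 2, 3, 30.0),
        (20150901, 2, 2, 20.0),
    ])


def sales_by_category_and_month(conn):
    """Aggregate the fact table by attributes that live in the dimensions."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_month(conn)
for row in rows:
    print(row)
```

The point of the shape is that the facts stay narrow and numeric, while the descriptive attributes people actually slice by sit in the dimensions. An OLTP model of the same business would normally be normalised around transactions rather than around analysis.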

Some other models that will separate the competent from the great Data Architect would be working knowledge of the Hierarchical database model; the Network model; the Object model; the Document model; and the Entity–attribute–value model. It would also be of interest to have a passing acquaintance with the Inverted index; flat file usage; the Associative model; the Multidimensional model; the Multivalue model; the Semantic model; the XML database; the Named graph; and the Triplestore. Knowing stuff about stuff like this is where the killer skills differentiator comes into play.
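As a taste of one of the less familiar models in that list, here is a minimal sketch of the Entity–attribute–value model, again using Python’s built-in sqlite3 module. The eav table, the load_entity helper and the sample customers are all hypothetical names made up for this illustration, and storing every value as text is a simplifying assumption rather than a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One generic table: each row holds a single attribute of a single entity.
conn.execute("""
    CREATE TABLE eav (
        entity    TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity, attribute)
    )
""")

# Illustrative data: entities need not share the same set of attributes.
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("customer-1", "name", "Ada"),
    ("customer-1", "country", "ES"),
    ("customer-2", "name", "Bo"),
])


def load_entity(conn, entity):
    """Pivot the rows for one entity back into an attribute -> value mapping."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY attribute",
        (entity,),
    ).fetchall()
    return dict(rows)


print(load_entity(conn, "customer-1"))
print(load_entity(conn, "customer-2"))
```

The flexibility is obvious: a new attribute needs no schema change. The price is that types, constraints and straightforward queries all become the application’s problem, which is one reason a Data Architect should know when not to reach for this model.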

I have been fortunate in that I can name some of the greatest data people of all times, as my own personal mentors, and I appreciate that for many, well everyone now, that this is not an option. However, there are ways and ways.

There is some great material out there about data modelling; unfortunately, there is an awful lot of crap as well. If you are unsure how to differentiate, then ask an expert. There are a number of data experts commenting on the data related groups on LinkedIn.

In the old days it was quite easy to spot a data pro – slightly dishevelled look, tweed jacket, patches on the sleeves and a pipe, matches and tobacco in one of the pockets, Doctor Watson style, etc. – but now in the virtual and aseptic worlds, it’s not so obvious who is who. What a pity those days have passed, but such is life.

Lastly, consider this quote from Ove Arup. “Engineering is not a science. Science studies particular events to find general laws. Engineering design makes use of the laws to solve particular practical problems. In this it is more closely related to art or craft.”

NINE – Understand the database and data related technologies and products out there, and the pros and cons of using them. Also, strive to be technology agnostic.

This is probably the one aspect of the life of the Data Architect that most people will be familiar with… the tools and technologies. Probably for this reason alone there are recruiting agencies that cannot tell the difference between a technology product and the entire vast field of data architecture and management, or between the importance of knowing the version of a piece of software and that of knowing how to competently manage the Data Architecture of a global business.

Nonetheless, it’s good to have a grasp of the vast array of data related technologies and products out there, and to keep that knowledge as up to date as possible.

Therefore, this list is more for the aspiring Data Architect than for the experienced professional. Nevertheless, make sure you have a handle on these:

Please also note that there is a surfeit of data products in addition to those mentioned or referenced above.

TEN – Absolutely dominate the subject of Data Governance. Make Data Governance one of your master subjects, and be ready to bring it into play at a moment’s notice.

Take heed of the wise words of Sun Tzu: “If you know your enemies and know yourself, you will not be imperiled in a hundred battles… if you do not know your enemies nor yourself, you will be imperiled in every single battle.”

The DAMA Dictionary of Data Management defines Data Governance as “The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets.” DAMA has identified 10 major functions of Data Management in the DAMA-DMBOK (Data Management Body of Knowledge). Data Governance is identified as the core component of Data Management, tying together the other 9 disciplines, such as Data Architecture Management, Data Quality Management, Reference & Master Data Management, etc., as shown in the following diagram:

Whilst we are at it, I would encourage everyone with a professional interest in Data Architecture to check out ‘Data Architecture: A Primer for the Data Scientist’. This is a bit of blurb from the Amazon site:
“Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data. And everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until data gathered can be put into an existing framework or architecture it can’t be used to its full potential. Data Architecture a Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.”

That’s all folks

Now, the clock on the wall is really telling me that I should wrap up this baby, warts and all, accidents, omissions and typos included, and put it to bed.

This is far from being an exhaustive list of the things which a Data Architect should cultivate, hone and excel in. And yes, I know I ‘missed a bit, there’ as well. And yes, I know I started a new sentence with an ‘And’, and, and, and. And yes I… But, anyways… hey ho! upwards and onwards!

Nonetheless, I hope this little piece was informative or entertaining, or even both. At some level of abstraction or another.

If you spot any glaring errors in this piece then please let me know in the comments section below and I will revise as necessary. Thanks in advance for that.

I will leave you with the words of one of my favourite contemporary architects, Zaha Hadid:

“I started out trying to create buildings that would sparkle like isolated jewels; now I want them to connect, to form a new kind of landscape, to flow together with contemporary cities and the lives of their peoples”


As a child, I adored the USA rock band the Eagles, especially the musical talents of Joe Walsh. This explains the inspiration behind the title of this piece.

So, what’s going down at Ashley Madison?

Never heard of them? Off your radar? Surely not?

That stretches the bounds of credulity, as even the people in Singapore’s Media Development Authority have heard of them. They even described the business’s site this way: “it promotes adultery and disregards family values”, and consequently will not allow them to operate in Singapore. Well, what a turn-up for the books.

On a more serious note, and as you might know, (from Wikipedia or some other ‘sites’,) Ashley Madison is a Canadian-based online dating service and social networking service marketed to people who are married or in a committed relationship. Its slogan is “Life is short. Have an affair.” It seems, if we are to believe various reports doing the rounds, that their Big Data has been compromised, big time.

Yes, I know, how could that possibly have happened, right?

According to some reports, Adison Mashley have around 37 million clients in the Big Data pool, and large caches of it have allegedly been stolen after an apparently successful hacking attempt was carried out. According to Krebs On Security, data stolen from the web site in question “have been posted online by an individual or group that claims to have completely compromised the company’s user databases, financial records and other proprietary information.”

But, again I ask, how can this happen?

I am not an avid fan of Big Data technology for core business use, and given the level of Big Data technology maturity, it sounds like a dopey idea. But each to their own.

What I will state is that my database management experience has tended to be associated with database technologies that can only be hacked as part of an inside job i.e. where people either know user IDs, passwords, IP addresses and layers of protection etc. or know of someone who does. Either someone who is a friend, part of the family (no, not that type of ‘family’) or someone who can be blackmailed into divulging the required access paths and security check workarounds.

However, taking a broader and more permissive view of this alleged hackerisation of Big Data, do we write it up as a Big Data success, i.e. The Amazing Big Data Affair? Put it down to a technical glitch and community faux pas? Or do we take a jaundiced view of the whole thing and keep it real? I await with bated breath the enlightened opinions of the Big Data gurus.

Mitch ‘n’ Andy are not unfamiliar with ‘issues’ related to the use of people’s data. The Daily Dot carried a piece from contributing writer S. E. Smith with the headline ‘Why Ashley Madison is cheating on its users with Big Data’. In that piece, Smith states that “Like pretty much every other website on Earth, Ashley Madison spies on its users and crunches the data in a variety of ways to increase the bottom line.”

Belinda Luscombe writing in Time confirmed these suspicions with a piece titled ‘Cheaters’ Dating Site Ashley Madison Spied on Its Users’. She wrote:

In a study to be presented at the 109th Annual Meeting of the American Sociological Association in San Francisco on Saturday Aug. 16, Eric Anderson, a professor at the University of Winchester in England claims that women who seek extra-marital affairs usually still love their husbands and are cheating instead of divorcing, because they need more passion. “It is very clear that our model of having sex and love with just one other person for life has failed— and it has failed massively,” says Anderson.

“How does he know this? Because he spied on the conversations women were having on Ashley Madison, a website created for the purpose of having an affair. Professor Anderson, who as it turns out is the “chief science officer” at Ashley Madison, looked at more than 4,000 conversations that 100 women were having with potential paramours. “I monitored their conversation with men on the website, without their knowing that I was monitoring and analyzing their conversations,” he says. “The men did not know either.”

Elsewhere, and as reported on Wikipedia, Trish McDermott, a consultant who helped found Match.com, accused Ashley Madison of being a “business built on the back of broken hearts, ruined marriages, and damaged families.”

Wow, wow, and triple wow! What a way to run a dance hall!

Maybe they should reconsider their slogan, making it more snappy and apposite. How about “Life is short, we pimp your Big Data” as a starter? So go ahead, make your own and post it below. Have fun.

When it comes to Big Data, some people accuse me of being akin to a Luddite. Nothing could be further from the truth. Not that the facts matter. In the age of superficiality and surfaces there is as much wilfully cultivated obliviousness as there is unashamed and unabashed term abuse. Add the prevailing underlying current of anti-intellectualism into the mix, and we have an explosive combination that manifests itself in the alliterative combination of bluff, bluster and banality.

I was reticent about writing this article, because it’s a bit like arguing against the irrational, self-interested and wilfully obtuse. Or as Ben Goldacre would have it, “You cannot reason people out of a position that they did not reason themselves into.” Therefore, a lot of care needed to be exercised. Indeed, Mark Twain once stated, “Never argue with stupid people, they will drag you down to their level and then beat you with experience.” Now, I wouldn’t go that far, and I do try to be nicely diplomatic, most of the time, but I can see where he was coming from.

Anyway, without more ado let’s get a handle on what a Luddite is, in terms I hope that most will understand.

According to Wikipedia (yes, I know) The Luddites were:

“19th-century English textile workers who protested against newly developed labour-economizing technologies from 1811 to 1816. The stocking frames, spinning frames and power looms introduced during the Industrial Revolution threatened to replace the artisans with less-skilled, low-wage labourers, leaving them without work.”

So why do I get a feeling that some people think that I am a Big Data Luddite?

“With all due respect – your post does sound a little like what I could envisage an exchange between a man riding a horse and a man driving one of the first automobiles….sorry.”

Although a respectable knowledge of the technology and its evolution would inform otherwise, I assume that this means that I would be the “man riding a horse”… An interesting piece of conjecture indeed, even if what it lacks in accuracy is made up for by the inexplicable certainty of belief. Still, it’s fascinating to discover just how many ‘experts’ think that this stuff – the sort of stuff I was doing in the mid to late eighties at Sperry and later Unisys – is bleeding edge innovation.

“Yes, cynical indeed… here is another amazing Big Data success story. You go on your computer, type in any search phrase and get instantaneous and highly relevant results. It is so amazing that a word has been coined. Guess what that is…”

What to say? There goes a person who seems to believe that the history of search starts and ends with the Google web search engine. Something slightly less than a munificently inapposite comment, only outdone by its tragically disconnected banality.

More recently, Bernice Blaar had this to say about my take on Big Data in general and The Big Data Contrarians in particular.

“Master Jones may well be the great and ethical strategy data architecture and management guru that the chattering-class Guardian-reading wine-sipping luvvies drool over, but he is also a brazen Big Data Luddite. No, actually far worse than a Luddite, he’s a Neddite, because with his ‘facts’ and ‘logic’ (what a laugh, you can prove anything with facts, can’t you???) he is undermining the very foundation of the Big Data work, shirk and skive ethic that has been so hardily fought for by the likes of self-sacrificing champions and evangelists of the Big Data revolution, to wit, such as those bold, proud and fine upstanding members Bernard Marr, Martin Fowler and Tom Davenport, for example, and the brave sycophants that worship at their feet. Martyn is worse than Bob Hoffman, Dave Trott, Jeremy Hardy, Mark Steel, Tab C Nesbitt and Bill Inmon, all rolled into one. He may be a great strategist, but I wouldn’t hire him. Contrarian Luddite!”

And then followed it up with this broadside:

“The Big Data Contrarians group are nothing more than a bunch of over-educated clown-shoes who are trying to scupper the hard-work of decent people out to earn a crust from leveraging the promise of a bright future. In a decent society of capital and consumers, they would be banned off the face of the internets.”

How does one reciprocate such flattering flatulence? How can one possibly respond to such a long concatenation of meaningless clichés? Though to be fair, I quite liked being referred to as a Neddite, whatever that is.

Anyway, to set the record straight, this is where I stand.

A contrarian is a person who takes up a contrary position, especially a position that is opposed to that of the majority, regardless of how unpopular it may be.

Like others, I am a Big Data Contrarian, not because I am contrary to the effective use of large volumes, varieties and velocities of data, but because I am contrary to the vast quantities of hype, disinformation and biased mendaciousness surrounding aspects of Big Data and some of the attendant technologies and service providers that go with the terrain. I don’t mind people gilding the lily (to use an English idiom for exaggeration), but I do draw the line at straight-out deception, which could lead to unintended consequences, such as creating false expectations, diverting scarce resources to wasteful projects or doing people out of a livelihood. That’s just not right.

Does that make me a Luddite (or a Neddite)? I don’t think so, but do make sure that your opinion is your own and is arrived at through reason, not some other person’s bullying hype. As I wrote elsewhere some moments ago, “If you have to lie like an ethically challenged weasel to sell Big Data then clearly there is something amiss.”

As always I would love to hear your opinions and comments on this subject and others, and also please feel free to reach out and connect, so we can keep the conversation going, here on LinkedIn or elsewhere (such as Twitter).

Okay, before we get started I have to declare the real intent for posting this piece. It is to get you to join The Big Data Contrarians professional group here on LinkedIn.

To apply to join the best Big Data community on the web simply navigate to this address http://www.linkedin.com/grp/home?gid=8338976 (or paste it into your browser) and request membership, the process is quick and painless and well worth the effort.

Now for the rest of the news…

There are many common misconceptions amongst the Big Data collective about Data Warehousing. There are common fallacies that need clearing up in order to avoid unnecessary confusion, avoidable risks and the damaging perpetuation of disinformation.

Big Picture

In the dim and distant past of business IT, the best information that senior executives could expect from their computer systems came in the form of operational reports, typically indicating what went right or wrong or somewhere in between. Applied statistical brilliance made up for what data processing lacked in processing power, up to a point, because even heavy lifting statistics requires computing horsepower, which in those days was really a question of serious capital expenditure, which not all companies were willing to commit to.

Then, and curiously coincidentally, people around the world started to posit the need for using data and information to address significant business challenges, to act as input into the processes of strategy formulation, choice and execution. Reports would no longer just be for the Financial Directors or the paper collectors, but would support serious business decision making.

Many initiatives sprang up to meet the top-level decision-making data requirements; they were invariably expensive attempts, with variable outcomes. Some approaches were quite successful, but far too many failed, until the advent of Data Warehousing.

Back then, most of the data that could potentially aid decision-making was in operational systems. Both an advantage and a problem. Data in operational systems was like having data in gaol. Getting data into operational systems was relatively easy, getting it out and moving it around was a nightmare. However, one of the advantages of operational data is that it was generally stored in a structured format, even if data quality was frequently of a dubious nature, and ideas such as subject orientation and integration were far from being widespread.

Of course, data also came in from external sources, but usually via operational databases as well. An example of such data is instrument pricing in financial services.

Therefore, briefly, a lot of Data Warehousing started as a means to provide data to support strategic decision-making. Data Warehousing was not about counting cakes, widgets or people, which was the purview of operational reporting, or about measuring sentiment, likes or mouse behaviour, but about assisting senior executives in addressing the significant business challenges of the day.

Who’s your Daddy?

Bill Inmon, the father of Data Warehousing, defines it as being “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects of interest to the business, such as customer and product.

The thing to remember about subject areas is that they are not created ad-hoc by IT according to the sentiments of the time, e.g. during requirements gathering, but through a deeper understanding of the business, its processes and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints to ensure that the data stored is consistently and contextually unambiguous.

Time Variant: Time variance gives us the ability to view and contrast data from multiple viewpoints over time. It is an essential element in the organisation of data within the data warehouse and dependent data marts.

Non-Volatile: The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely if ever modified.

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.
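To make the time-variant and non-volatile characteristics a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names CustomerSnapshot, SnapshotStore, segment and as_of are hypothetical and are not part of Inmon's definition or of any product. The idea is simply that snapshots are appended, never updated, and that we can ask what the data looked like as of a given date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored snapshot row cannot be mutated
class CustomerSnapshot:
    customer_id: str      # subject orientation: the subject here is customer
    segment: str          # hypothetical, already integrated attribute
    snapshot_date: date   # time variance: when this version was recorded


class SnapshotStore:
    """Append-only collection of snapshots (illustrative, in-memory only)."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        # Non-volatile: new snapshots are added, existing ones are not updated.
        self._rows.append(row)

    def as_of(self, customer_id, when):
        """Return the latest snapshot for a customer on or before `when`."""
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.snapshot_date <= when
        ]
        return max(candidates, key=lambda r: r.snapshot_date, default=None)


store = SnapshotStore()
store.append(CustomerSnapshot("C001", "retail", date(2015, 1, 31)))
store.append(CustomerSnapshot("C001", "corporate", date(2015, 6, 30)))

before = store.as_of("C001", date(2015, 3, 1))    # the January snapshot
after = store.as_of("C001", date(2015, 12, 31))   # the June snapshot
```

A real Data Warehouse would of course do this in database tables rather than a Python list, but the principle of contrasting views of the same subject over time is the same.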

In plain language, if what your business has or is planning to have does not fully satisfy the Inmon criteria then it probably is not a Data Warehouse, but another form of data-store.

The thing to remember about informed management decision making is that it needs to be as good as required but it does not need to achieve technical perfection. This observation underlies the fact that Data Warehousing is a business process, and not an obsessive search for zero defects or the application of so called ‘leading edge’ technologies – faddish, appropriate or not.

Some Basic Terms

Before we delve into the meaning of Data Warehousing, there are a couple of terms that need to be understood first, so, by way of illustration:

Let’s follow the numbers in the simplification of the process.

(1) We gather specific and well-bounded data requirements from a specific business area. These requirements are gathered by talking to business people and by understanding their needs from a business as well as a data sourcing and data logistics perspective. Here we must remember at all times not to over-promise or to set expectations too high. Be modest.

(2) These business requirements are typically captured in a dimensional data model and supporting documentation. Remember that all requirements are subject to revision at a later date, usually in a subsequent iteration of a requirements gathering to implementation cycle.

(3) We identify the best source(s) for the required data and we record basic technical, management and quality details. We ensure that we can provide data to the quality required. Note that data quality does not mean perfection but data to the required quality tolerance levels.

(4) Data Warehouse data models are modified as required to accommodate any new data at the atomic level.

(5) We define, document and produce the means (ETL) for getting data from the source and into the target Data Warehouse. Here we also pay especial attention to the four characteristics of Data Warehousing. ETL is an acronym for Extract (the data from source / staging), Transform (the data, making it subject oriented, integrated, and time-variant) and Load (the data into the Data Warehouse and Data Mart).

(6) We define, document and produce the means for getting data from the Data Warehouse into the Data Mart. In short, a bit more ETL.

(7) User acceptance testing. NB Users must ideally be involved in all parts of the end-to-end process that involves business requirements, participation and validation.

This is a very simplified view, but it serves to convey the fundamental chain of events. The most important aspect being that we start (1) and end (7) with the user, and we fully involve them in the non-technical aspects of the process.
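For those who prefer to see the ETL part of the chain in code, here is a deliberately small sketch in Python. Everything in it is an assumption made for illustration: the source rows, the country-code integration rule and the function names are hypothetical, and a real implementation would use a proper ETL tool or database rather than in-memory lists. It shows extraction from a source, a transformation that integrates and date-stamps the data, an append-only load into the warehouse, and then "a bit more ETL" to build a small data mart.

```python
from datetime import date

# Hypothetical source rows, as they might arrive from two operational systems.
SOURCE_ROWS = [
    {"cust": " c001 ", "country": "UK", "amount": "120.50"},
    {"cust": "C002", "country": "United Kingdom", "amount": "80"},
]

# Hypothetical integration rule: one agreed code per country.
COUNTRY_CODES = {"UK": "GB", "United Kingdom": "GB"}


def extract(rows):
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(rows)


def transform(rows, load_date):
    """Transform: apply integration rules and stamp each row with a date."""
    out = []
    for row in rows:
        out.append({
            "customer_id": row["cust"].strip().upper(),
            "country_code": COUNTRY_CODES[row["country"]],
            "amount": float(row["amount"]),
            "load_date": load_date,
        })
    return out


def load(warehouse, rows):
    """Load: append to the warehouse; existing rows are left untouched."""
    warehouse.extend(rows)
    return warehouse


def build_mart(warehouse):
    """A bit more ETL: aggregate warehouse rows into a small data mart."""
    totals = {}
    for row in warehouse:
        key = row["country_code"]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


warehouse = []
load(warehouse, transform(extract(SOURCE_ROWS), date(2015, 7, 31)))
mart = build_mart(warehouse)
```

Note that the two source systems spell the same country differently, and that it is the transformation step, not the business user, that has to reconcile them.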

Business, Enterprise and Technology

Essentially, a Data Warehouse is a business driven, enterprise centric and technology based solution for continual quality improvement in the sourcing, integration, packaging and delivery of data for strategic, tactical and operational modelling, reporting, visualisation and decision-making.

Business Driven

A data warehouse is business centric and nothing happens unless there is a business imperative for doing so. This means that there is no second-guessing the data requirements of the business users, and every piece of data in the data warehouse should be traceable to a tangible business requirement. This tangible business requirement is usually a departmental or process specific dimensional data model produced together in requirements workshops with the business. We build the Data Warehouse over time in iterative steps, based on the criteria that the requirements should be small enough to be delivered in a short timeframe and large enough to be significant.

Typically, a Data Warehouse iteration results in a new Data Mart or the revision of an existing Data Mart.
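The traceability rule mentioned above can be checked mechanically. The following is a minimal sketch in Python, assuming a hypothetical catalogue that maps each warehouse column to a business requirement reference; the column names and the REQ-nnn identifiers are invented for illustration only.

```python
# Hypothetical catalogue: warehouse column -> business requirement reference.
REQUIREMENT_FOR = {
    "customer_id": "REQ-001",
    "country_code": "REQ-001",
    "amount": "REQ-002",
}


def untraceable_columns(columns, requirement_for):
    """Return the columns that have no recorded business requirement."""
    return sorted(c for c in columns if not requirement_for.get(c))


columns = ["customer_id", "country_code", "amount", "mouse_clicks"]
orphans = untraceable_columns(columns, REQUIREMENT_FOR)
```

Any column that turns up in the orphans list is a candidate for either a conversation with the business or removal from the model.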

Enterprise Centric

As we build up the collection of Data Marts, we are also building up the central logical store of data known as the Enterprise Data Warehouse that serves as a structured, coherent and cohesive central clearing area for data that supports enterprise decision making. Therefore, whilst we are addressing specific departmental and process requirements through Data Marts we are also building up an overall view of the enterprise data.

Technology Based

By technology, I mean technology in the broadest sense of techniques, methods, processes and tools, and not just a question of products, brands or badges.

Unfortunately, there is a popular misconception that Data Warehousing is primarily about competing popular and commercially available technology products. It isn’t, but they do play an important role.

Architecture

The following is an example of a very high-level Data Warehouse architecture diagram.

Methodologies

Various methodologies support the building, expansion and maintenance of a Data Warehouse. Here is one example of a professional data integration methodology, produced, maintained and used by Cambriano Energy.

And here is an information value-chain map as used by Cambriano Energy as part of its Iter8 process management. There are alternatives, many of which do a satisfactory job.

Last but not least, this was (from memory) the way that Bill Inmon’s Prism Solutions ETL company used to view the iterative EDW building process.

Keeping it Shortish

At this point, I decided to cut short further explanations on aspects of Data Warehousing. However, if you have any questions then please address them to me and I will do my best (or something close) to answer them.

That’s all folks

Hold this thought for another time: If you think you can replace a Data Warehouse, that is not a Data Warehouse, with another approach to ‘Data Warehousing’ that doesn’t produce a Data Warehouse, however quickly and cheaply one can do it, then you still don’t have a Data Warehouse to show for all of your efforts. That is not a great place to be.

Therefore, you see, Data Warehousing was never about a haphazard approach to providing random structured, semi-structured and unstructured data of various qualities, provenance, volumes, varieties and velocities, to whomever was of a mind to want it.

Many thanks for reading.

If you want to connect then please send a request. If you have any questions or comments then fire them off below. Cheers 🙂

Friends, peers and colleagues, lend me your bandwidth and 10 minutes of your time. Gather around and let me tell you about the greatest, most interesting and fantastically diverse Big Data and Data community right here in our very midst on this amazing LinkedIn community.

We have a new Big Data/Data group, and the group is aptly named The Big Data Contrarians, and yet it is neither a ‘me too’ group, of which there are too many to mention, nor a ‘belief circle’, of which the less said, the better. No, The Big Data Contrarians group is a place for cool opinion pieces, creative abrasion, practical insight and (within the realms of the possible) BS free comment.

However, before going into more detail about the group, I would like to digress for a moment.

Like many people, I take a lot of inspiration from outside my own professional spheres of practice, principles and technologies, and this is no less true when it comes to advertising.

Two of my real influencers – the real kind not the LinkedIn kind – are advertising legends Dave Trott (also author of Predatory Thinking) and Bob Hoffman (the Ad Contrarian), who are exceptionally experienced, talented and creative people, of the NoBS (no flim-flam) kind. Indeed, it was after reading some of Bob’s and Dave’s recent articles that I decided to get this group registered on LinkedIn, which, love it or loathe it, is where many of us connect.

So, I hear you ask “What’s The Big Data Contrarians, Mart?”

Okay, to be fair, The Big Data Contrarians group is about far more than just being contrarian and a legitimate means of inciting discussion, as reasonable as that is. It’s also about arguing against or openly rejecting mistakenly cherished and contrived Big Data beliefs and ‘institutions’ and established Big Data hype, speculation and opinion. It’s about separating Big Data fads, fantasies and folk-tales from Big Data reality.

What we seek to understand and convey is where, when, how and for what ends data (including Big Data) can be used to derive legitimate benefits. Moreover, stated from a position of reason and facts, and not simply projected as an issue of Big Data faith, speculation and clairvoyance.

On the other side, we can call out the Big Data hype for what it is, and just as Bob Hoffman calls out the social media and advertising BS babblers in his trade, this too lends a platform for people to do the same with the disreputable and dubious practices of Data gurus, courtesans and ‘influencers’.

“So, Mart, is being a Big Data Contrarian a bit like being a Big Data Luddite?”

Well, not really, but the problem with having so many people who are new to IT is that the past is a mystery to them, so anything that is new to them is actually taken as new, whether it is new or not.

Those who know will know that technologies for distributed file stores and search over unstructured data have been around for quite some time, and some of the “new” technologies that we big-up today are actually simple developments of data technologies that go back to the seventies and eighties, or maybe even before.

However, this is not essentially about being anti-technology or even against advances in the application of technology, but about understanding that it isn’t helpful for the media, the big industry players and their indentured acolytes to railroad, cajole and bully businesses into buying Big Data technology they don’t need, to solve Big Data problems and opportunities they don’t have.

That said, it’s up to the members of The Big Data Contrarians to decide on what shape the community should take, and as it is an open forum in democratic terms, the members have equal rights in presenting their own opinions, lessons learned and other insights.

So, if you haven’t yet drunk the Big Data kool-aid, come on down to The Big Data Contrarians, the place for everyone interested in Big Data/Data and its many potential uses.

Many thanks for reading.

Of course, this piece will also not feature on LinkedIn’s Big Data channel, because apparently that channel editor (naming no names) doesn’t like anyone raining on their particular Big Data flim-flam parade.