To the layperson anxious for answers to complicated questions, the very idea of bringing together sets of disparate data and turning them into precious insights may seem like magic, a modern-day alchemy, a goal placed well beyond the grasp of mere mortals. Fortunately, this is no longer the case, thanks in part to bagatelle-proportioned advances in Big Data and Big Data analytics and massive advances in imagination; we are able to look into the past, the present and the future with absolute certainty.

As a child, I had a great love of stories of Spain, of the idea of travelling through the Iberian Peninsula and of mastering, not just learning, the classical Spanish guitar. One of the phrases that stuck with me from those days was the unattributable quote: “amateurs practice until they get it right; professionals practice until they can’t get it wrong.”

In my professional working life, I have striven to identify those things that I want to be sufficiently competent at doing and those things that I consider a fundamental part of my professional competence, and then to make a clear distinction between the two.

As many who know me will know, a significant part of my professional life has been dedicated to the architecture and management of data, information and structured intellectual capital. Therefore, in the light of this fact, and with reference to the previous bit of whimsical fancy, I will address the following question posed to me some time ago: what makes a great Data Architect truly great?

What follows is by no means an exhaustive list of essential elements, but it should give you a flavour of what a great Data Architect is.

ONE – Establish a clear, cohesive and communicable idea of the theoretical, technical, philosophical and practical nature of data and information. Learn it inside out, upside down and back to front. Then learn it well.

To put it another way: a great Data Architect should be able to approach the question “what is data?” from almost any viewpoint and give a simple, precise and understandable reply.

The internet abounds with content on ‘data’ and ‘information’. You may be familiar with the way Wikipedia describes ‘data’; you may even agree with it, even though it is (in its current form – 3/8/2015[1]) a naïve, sloppy and circular definition, which serves only as an example of how not to define data.

TWO – Know your audiences, understand their motivations, have empathy with them, and develop a keen ability to spot what the audience wants and then sell that back to them as if it were their own idea.

One of the greatest architects of the 20th century, Ludwig Mies van der Rohe, had this to say about the relationship between an architect and a client: “Never talk to a client about architecture. Talk to him about his children. That is simply good politics. He will not understand what you have to say about architecture most of the time. An architect of ability should be able to tell a client what he wants. Most of the time a client never knows what he wants.”

THREE – Learn to communicate clearly, simply and effectively, and remember who the most important members of your audience are at any given moment and speak mainly to them.

The mantra of ‘keep it simple’ is what separates a great Data Architect from the swathe of sycophantic worriers, software train-spotters and smart-ass wannabes that make up much of the world of IT. So do not even try to appeal to that segment; they don’t matter. Speak to those that do.

The job of the Data Architect is not to impress his or her colleagues, get likes on Facebook or to be the manager’s pet. A great Data Architect uses language that is appropriate for the occasion, not to flaunt their extensive knowledge and experience but to communicate ideas, concepts and architectures in a manner that the listener can immediately grasp. A Data Architect who aspires to greatness does not need to prove themselves to their peers; they just need to strive to be a true professional, and the greatness will come along in its own good time.

The eighteenth-century English theologian, dissenter, philosopher and scientist Joseph Priestley wrote, “The more elaborate our means of communication, the less we communicate”. With such influences in mind, I try to encourage my team members and other collaborators to use appropriate channels of communication, and one of the ways I get this message across is with a list of options. I find that doing this early on can really simplify things and bring a greater degree of clarity to the table. However, as with many other aspects of life, one has to be flexible and realistic with this approach, and allow for the selection of the most appropriate option according to the circumstances. My preference list is:

1. Face-to-face
2. Video conference
3. Telephone/mobile
4. Post-it note – or similar
5. Texting/SMS/WhatsApp
6. Email
7. Smoke signals
8. Social Media

FOUR – Be a great listener. Data Architects must nurture and hone effective listening skills; otherwise, they place themselves at a serious disadvantage.

Here are five listening aspects that a Data Architect should aspire to master:

Cultivate a self-awareness of the importance of listening.

Understand the barriers to listening and learn how to overcome them.

Identify poor listening habits and practices that you have adopted – ask people how they see your listening skills.

Improve your own responsive listening skills.

Take this as an open-ended continuous improvement programme.

Put it this way: as a leader you might be the most amazing talker this side of the Rockies, but if you can’t listen effectively, it would be like Nadal, Federer or Djokovic having a great, world-class tennis serve but a cultivated inability to accurately read the play or to return any difficult shot.

FIVE – Understand how data is generated; why it is generated; who or what triggers the generation; how it flows; how it is used; who uses it and why. Understand the life-cycles of data and information.

A great Data Architect must understand the public and private life of data before actually trying to do anything with it.

I’ll cut to the chase on this topic and leave you with a comment on The Social Life of Information by John Seely Brown and Paul Duguid.

“To see the future we can build with information technology, we must look beyond mere information to the social context that creates and gives meaning to it. For years, pundits have predicted that information technology will obliterate the need for almost everything—from travel to supermarkets to business organizations to social life itself. Individual users, however, tend to be more sceptical. Beaten down by info-glut and exasperated by computer systems fraught with software crashes, viruses, and unintelligible error messages, they find it hard to get a fix on the true potential of the digital revolution.”

That’s just another indication of what we have to learn to avoid.

On the up side, “The Social Life of Information gives us an optimistic look beyond the simplicities of information and individuals. It shows how a better understanding of the contribution that communities, organizations, and institutions make to learning, working and innovating can lead to the richest possible use of technology in our work and everyday lives.”

SIX – Get a great understanding of all the data-oriented vices and bad data architecture practices that go on in the IT application world, and most especially in the web application-development world.

Some of the most atrocious examples of bad data architecture, engineering and management are found in web applications. Learn from them, and learn how not to repeat such gross and wilful incompetence in your own Data Architecture work. Look at them as extreme examples of lessons learned, i.e. how not to do it.

SEVEN – Cultivate a well-developed sixth sense for the appreciation of the intrinsic values of data and information.

No, I am not arguing the case for the idea that all data has value; that extreme notion is clearly absurd, but fortunately one that has limited adherence. However, I am saying that we should develop a ‘nose’ for understanding what data could be of value, and for measuring, in qualitative and pseudo-quantitative terms, what that value actually represents.

I would also encourage people to check out the Wikipedia piece on Infonomics (URL: https://en.wikipedia.org/wiki/Infonomics), a term coined by Gartner’s Doug Laney, and based on work carried out at Bill Inmon’s Prism Solutions, which incidentally is one of my former employers.

Here’s a snippet:

“Infonomics is the theory, study and discipline of asserting economic significance to information. It provides the framework for businesses to value, manage and wield information as a real asset. Infonomics endeavors to apply both economic and asset management principles and practices to the valuation, handling and deployment of information assets.”

When you are a Data Architect, you should really be aware of such stuff, and at least be able to carry out a reasonable conversation about it.
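To make the ‘pseudo-quantitative’ side of that conversation a little more concrete, here is a toy sketch of scoring a data asset across a handful of quality dimensions. To be clear, the dimensions and weights are purely illustrative assumptions of mine, not a published Infonomics formula:

```python
# Toy, pseudo-quantitative valuation of a data asset.
# The dimensions and weights are illustrative assumptions,
# not a formula from Laney's Infonomics work.

DIMENSIONS = {           # weight of each quality dimension (sums to 1.0)
    "completeness": 0.25,
    "accuracy":     0.25,
    "timeliness":   0.20,
    "uniqueness":   0.15,
    "relevance":    0.15,
}

def asset_score(ratings):
    """Weighted score in [0, 1] from per-dimension ratings in [0, 1]."""
    for dim, r in ratings.items():
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"rating out of range: {dim}={r}")
    # Missing dimensions score zero – if you haven't assessed it, don't claim it.
    return sum(DIMENSIONS[d] * ratings.get(d, 0.0) for d in DIMENSIONS)

# A hypothetical customer master file, rated by its stewards:
customer_master = {"completeness": 0.9, "accuracy": 0.8,
                   "timeliness": 0.6, "uniqueness": 1.0, "relevance": 0.9}
score = asset_score(customer_master)
```

The point is not the arithmetic; it is that once you write the weights down, you have to defend them, and that argument with the business is precisely where the ‘nose’ for value gets trained.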

EIGHT – Strive to be the best data modeller you are ever going to meet in your entire life.

I say that I’m a lean data modeller. What does that mean?

The first things I model are the data flows.

Then I will create the conceptual, logical and physical models.

Then I will repeat until I get consensus, or until I become the Data Dictator – this usually occurs when the Portfolio Director demands closure and delivery.

Simples!

Nevertheless, not so fast. You will also need to know how to design physical data models for OLTP as well as for Enterprise Data Warehousing, and no, they are not the same, even if they are similar in many aspects.
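As a minimal sketch of that difference, using SQLite, here is the same sales data shaped both ways: normalised for OLTP, and as a simple star schema for the warehouse. All table and column names here are illustrative, not taken from any particular method:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# OLTP shape: normalised tables, optimised for transactional writes.
cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    product_id  INTEGER REFERENCES product(product_id),
    sold_at     TEXT,
    quantity    INTEGER
);
""")

# Warehouse shape: a fact table ringed by denormalised dimensions,
# optimised for analytical reads (a simple star schema).
cur.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    revenue      REAL
);
""")

cur.execute("INSERT INTO dim_date VALUES (201508, 2015, 8)")
cur.execute("INSERT INTO dim_customer VALUES (1, 'ACME', 'Enterprise')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO fact_sales VALUES (201508, 1, 1, 10, 99.50)")

# A typical warehouse question: revenue by month.
row = cur.execute("""
    SELECT d.year, d.month, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year, d.month
""").fetchone()
```

Same data, two physical designs: the first protects the integrity of each transaction; the second makes the aggregating query trivial. That is the distinction in a nutshell.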

Not only will a great Data Architect have polished skills in the art of data modelling according to the divine tenets of Codd and Date and later extended by blasphemers and acolytes alike, but they must also be comfortable designing dimensional models.

Some other models that will separate the competent from the great Data Architect would be working knowledge of the Hierarchical database model; the Network model; the Object model; the Document model; and the Entity–attribute–value model. It would also be of interest to have a passing acquaintance with the Inverted index; flat-file usage; the Associative model; the Multidimensional model; the Multivalue model; the Semantic model; the XML database; the Named graph; and the Triplestore. Knowing stuff about stuff like this is where the killer skills differentiator comes into play.
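If the differences between those models feel abstract, here is one small fact – an order with two lines – represented under three of them, sketched in plain Python structures (the names and shapes are my own illustrations, not any product’s schema):

```python
# One order with two lines, under three different data models.

# Relational: flat rows, with the relationship carried by keys.
orders      = [{"order_id": 1, "customer": "ACME"}]
order_lines = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]

# Document (or hierarchical): the relationship is nested in place.
order_doc = {
    "order_id": 1,
    "customer": "ACME",
    "lines": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Triplestore: everything as (subject, predicate, object) statements.
triples = [
    ("order:1", "customer", "ACME"),
    ("order:1", "hasLine", "line:1"),
    ("line:1",  "sku", "A-100"),
    ("line:1",  "qty", 2),
    ("order:1", "hasLine", "line:2"),
    ("line:2",  "sku", "B-200"),
    ("line:2",  "qty", 1),
]

# Each model answers "how many lines does order 1 have?" differently.
relational_count = sum(l["order_id"] == 1 for l in order_lines)
document_count   = len(order_doc["lines"])
triple_count     = sum(1 for s, p, o in triples
                       if s == "order:1" and p == "hasLine")
```

The answer is the same in each case; what changes is where the relationship lives, and therefore which questions are cheap and which are expensive. That trade-off is what the great Data Architect carries around in their head.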

I have been fortunate in that I can name some of the greatest data people of all time as my own personal mentors, and I appreciate that for many – well, everyone now – this is not an option. However, there are ways and ways.

There is some great material out there about data modelling; unfortunately, there is an awful lot of crap as well. If you are unsure how to differentiate, then ask an expert. There are a number of data experts commenting in the data-related groups on LinkedIn.

In the old days it was quite easy to spot a data pro – slightly dishevelled look, tweed jacket, patches on the sleeves and a pipe, matches and tobacco in one of the pockets, Doctor Watson style, etc. – but now, in the virtual and aseptic worlds, it’s not so obvious who is who. What a pity those days have passed, but such is life.

Lastly, consider this quote from Ove Arup. “Engineering is not a science. Science studies particular events to find general laws. Engineering design makes use of the laws to solve particular practical problems. In this it is more closely related to art or craft.”

NINE – Understand the database and data related technologies and products out there, and the pros and cons of using them. Also, strive to be technology agnostic.

This is probably the one aspect of the life of the Data Architect that most people will be familiar with… the tools and technologies. Probably for this reason alone there are recruiting agencies that cannot tell the difference between a technology product and the entire vast field of data architecture and management, or the differing importance of knowing the version of a piece of software and knowing how to competently manage the Data Architecture of a global business.

Nonetheless, it’s good to have a grasp of the vast array of data related technologies and products out there, and to keep that knowledge as up to date as possible.

Therefore, this list is more for the aspiring Data Architect than for the experienced professional. Nevertheless, make sure you have a handle on these:

Please also note that there is a surfeit of data products in addition to those mentioned or referenced above.

TEN – Master the subject of Data Governance. Make Data Governance one of your master subjects, and be ready to bring it into play at a moment’s notice.

Take heed of the wise words of Sun Tzu: “If you know your enemies and know yourself, you will not be imperiled in a hundred battles… if you do not know your enemies nor yourself, you will be imperiled in every single battle.”

The DAMA Dictionary of Data Management defines Data Governance as “The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets.” DAMA has identified 10 major functions of Data Management in the DAMA-DMBOK (Data Management Body of Knowledge). Data Governance is identified as the core component of Data Management, tying together the other 9 disciplines, such as Data Architecture Management, Data Quality Management, Reference & Master Data Management, etc., as shown in the following diagram:

Whilst we are at it, I would encourage everyone with a professional interest in Data Architecture to check out ‘Data Architecture: A Primer for the Data Scientist’. Here is a bit of the blurb from the Amazon site:
“Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data. And everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until data gathered can be put into an existing framework or architecture it can’t be used to its full potential. Data Architecture a Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.”

That’s all folks

Now, the clock on the wall is really telling me that I should wrap up this baby, warts and all, accidents, omission and typos included, and put it to bed.

This is far from being an exhaustive list of the things which a Data Architect should cultivate, hone and excel in. And yes, I know I ‘missed a bit, there’ as well. And yes, I know I started a new sentence with an ‘And’, and, and, and. And yes I… But, anyways… hey ho! upwards and onwards!

Nonetheless, I hope this little piece was informative or entertaining, or even both. At some level of abstraction or another.

If you spot any glaring errors in this piece then please let me know in the comments section below and I will revise as necessary. Thanks in advance for that.

I will leave you with the words of one of my favourite contemporary architects, Zaha Hadid**:

“I started out trying to create buildings that would sparkle like isolated jewels; now I want them to connect, to form a new kind of landscape, to flow together with contemporary cities and the lives of their peoples”

Many thanks for reading.

In subsequent blog pieces I will be sharing my views on the evolution of information management in general, and the incorporation of novel and innovative techniques, technologies and methods into well-architected mainstream information supply frameworks, primarily for strategic and tactical objectives.

As always, please reach out and share your questions, views and criticisms on this piece using the comment box below. I frequently write about strategy, organisational, leadership and information technology topics, trends and tendencies. You are more than welcome to keep up with my posts by clicking the ‘Follow’ link and perhaps you will even consider sending me a LinkedIn invite if you feel our data interests coincide. Also feel free to connect via Twitter, Facebook and the Cambriano Energy website.

A note from the editor:

Readers should be well aware that the comedian who wrote this piece is the self-styled founder of The Big Data Contrarians, which is quite possibly the most belligerently intelligent Big Data group you will ever come across in your entire life. You have been warned. If you need to verify these facts for yourself, then take a look here at your own risk: https://www.linkedin.com/grp/home?gid=8338976

And now for something completely different…

You may have noticed the massive relative growth in the number of people describing themselves as Big Data gurus, data science Kaisers or analytics evangelists. Okay, I exaggerate to evidence the trend, but you’ll hopefully get the gist. Your comrade on the picket line, that sweetie you met at the Pitt Club – even your darling masseuse has had their carte de visite transformed according to the prevailing désir du jour.

Many people out there in the big data world suddenly call themselves ‘Big Data gurus’ simply because it is the latest vogue. The Caerfyrddin Good Pub Guide even went so far as to say that, “adding Big Data to your job title was the equivalent of sexing up a dodgy dossier”. They also later suggested that boosting your resume with the judicious incorporation of a titillating title, such as Big Data analytic pole-dancer, may get you a few chuckles, even if most people don’t understand and appreciate its broad multi-faceted and humoristic ramifications.

That stated, the cruel and harsh reality is that many who call themselves Big Data gurus appear to be a few sandwiches short of the full Big Data picnic when it comes down to looking at the nitty-gritty of their visible resumes. Moreover, and to be Frank and Earnest – Frank in Zurich and Earnest in Pontypridd – if I were hiring a top-notch Big Data guru I wouldn’t even know where to start.

What I see are quite a number of courageous fellows who don’t really have a Scooby[1] (that’s a ‘clue’ for readers overseas) about Big Data, whose ranks are enriched and swelled by others of like mind, who know just enough to be dangerous.

What I also see are Big Data hacks who cannot bring themselves to articulate one coherent, cohesive and verifiable Big Data success story. They are calling themselves Big Data gurus, presumably because of the incredible value accruable from their unrevealed bullion-class information.

The best offenders amongst the Big Data gurus are incredibly precocious with the hype and amazingly prudish with the facts.

Why, only the other day, one of these Big Data hype chappies – whose name escapes me for the moment – was lamenting the dearth of true data scientists whilst simultaneously lambasting and misrepresenting the profession of the contemporary statistician.

Now, I am not fundamentally averse to a bit of hype, so long as it’s in moderation and it doesn’t frighten the horses. Take my Dad, for example: he never curses, and he never has. Me, I use it for dramatic effect and emphasis – occasionally. However, a lot of the Big Data ‘hackery’ that we are entertained with is like having Big Data Derek and Clive playing in your ear, 24x7x52. It’s just too much, and most of the time it should be toned down or turned off. You know: “So, this bloke comes up to me and says ‘Hello!’ ”

Then there are the people from the consultancies, the IT vendors and the service providers (alright, not all, just a few) who have a good grasp of the superficialities, the business, analytics and Big Data terminology, even if they have at best a tenuous grasp of the underlying structures, concepts and relationships that these terms relate to. Of course, it’s all hooked on context.

It’s not important if it’s just about a bit of a chit-chat down ‘The Black’ or a bit of banter over Kaffee und Kuchen, or a passing comment from the umpire at the crease. As Afilonius Basto put it to me “in a seriously professional business setting, many of the predominant Big Data gurus who rock the Big Data bull**** babble, just lack a certain creativity, knowledge and experience in order to act as truly reliable informers and trusted Big Data advisers.” So I ask you, who am I to argue such matters with a highly intelligent Gos D’atura?

Part of the problem here is, to borrow the unwritten style guide of Big Data ‘hackdom’, simple supply and demand voodoo and superfluous use of important and amorphous terms tagged on to such examples. To wit, the incongruous use of terms such as ‘economics’, ‘performance’ and ‘science’. To say nothing of the irritatingly banal use of ‘amazing’, ‘amazing’ and ‘amazing’. In addition, one thing that has been puzzling me is this. Why do some Big Data pundits have to overegg their verbosely literal output with US slang that even professional bloggers in the USA would tend to use very sparingly? I know, I know what you’re thinking. “Gerroutta heres”, right? But for me it is like the IT equivalent of the mock cockney accents that are used by some celebrity chefs. I kid you not! It’s awful and it’s going viral, just like the bubonic plague and Greensleeves.

Now back to the main story thread. There simply are not enough true Big Data gurus out there to fill the demand, and so barely qualified (or ‘make it up as you go along’) aspirants make it into the higher stratified stratosphere of the Big Data saloon.

Second, just like many Big Data success stories themselves, the role of a Big Data guru is often poorly demarcated within the ambit of association and relation, and even indeed within a solitary business. People bandy around terms such as Big Data guru, Big Data whizz and Big Data who’s-your-daddy, willy-nilly, postman style. These are terms that can mean everything and anything; from “he’s hot on Big Data, that chap is” to “there goes thick Jack the Spratt, the densest Big Data guru in Christendom”, and upwards and onwards to “did you see Marty, the Big Data party? Got Carmen Miranda from Data Analytics a leave of absence, although rumour has it that she is down with predictive impregnation after reading a particularly sordid Big Data hype-piece on LinkedIn”.

A true Big Data guru/expert is so much more, so much more than what we have now. In my opinion, a Big Data guru/expert is about:

Facts. The Big Data guru should stick to the facts. For example, if a Big Data guru does not understand the roles and responsibilities of a statistician, then they should keep mum, and be admired for their discretion, rather than opening up the floodgates of mental garbage, thereby inviting questions regarding credibility and introducing the risk of being viewed as a buffoon. Exempli gratia, if you do not know what a statistician does, then ask, rather than simply making things up. Also, ensure that you never get into a position of belief that you did not rationalise yourself into, because you will be stuck there and no one will be able to reason you out of it, since it is a position not itself based on reason, truth and logic.

As Malcolm X put it: “Despite my firm convictions, I have always been a man who tries to face facts, and to accept the reality of life as new experience and new knowledge unfolds. I have always kept an open mind, a flexibility that must go hand in hand with every form of the intelligent search for truth.”

So, aspiring Big Data boys and girls: please stick to the facts! Your Big Data god, friends or family will thank you for it.

Integrity. What Big Data definitely does not need is more hype-schlepping hypocrites, bamboozling babblers and conniving charlatans. What is needed are people who exude virtuous truthfulness, candour and pedagogical ethics.

According to Integrity Action, integrity is “the set of characteristics that justify trustworthiness and generate trust among stakeholders. Integrity creates the conditions for organisations to intelligently resist corruption and to be more trusted and efficient.” More broadly, Wikipedia puts it this way: “Integrity is the quality of being honest and having strong moral principles; moral uprightness. It is generally a personal choice to uphold oneself to consistently moral and ethical standards.”

Trust. A Big Data guru/expert should be trustworthy, and be seen to be trustworthy. Just like Caesar’s wife or the quality of the brunch bar at Tiffany’s, the Big Data guru must be beyond reproach. Not that they aren’t allowed a little journalistic licence; simply that the gaping abyss that separates barefaced porkies[2] from simple embellishments is, frankly, enormous.

Knowledge. A Big Data guru/expert should be knowledgeable in all things data, and not just Big Data. Knowledge means you know what Data Warehousing is and don’t fib about it or grossly misrepresent it in order to score ill-gotten brownie points for Big Data babble. Knowledge means you know, not that you have a bit of an idea, know a friend of friend of a friend, or can ask the audience, in cases where one is caught-out, ill-advisedly pretending to know something one doesn’t know. My advice is this. Temper knowledge with humility, honesty and decency, and you won’t go far wrong.

Experience. A Big Data guru/expert must have walked the talk. Knowledge must go hand in hand with experience. Clearly a few self-labelled Big Data gurus doing the rounds these days do not fit the bill in this respect (or for that matter in many of the other ‘respects’). Alas, not all is lost. You too can acquire both the data knowledge and data experience to become a Big Data guru. How? Try working at it for a while.

There is another way of looking at experience, in a half-comic and half-serious way. To paraphrase a popular joke: “Do not argue with an idiotic Big Data guru. He will drag you down to his level and beat you with experience.” And it happens…

Vocation. No, this is no time for a holiday. The Big Data guru must assume the mantle of Big Data stardom as a vocation and not just as an early-adopting fashion follower. As Voltaire put it, speaking of Newton but also commenting more broadly on education and the Enlightenment: “I have seen a professor of mathematics only because he was great in his vocation, buried like a king who had done well by his subjects.”

Simplicity. A Big Data guru/expert must be able to explain complex Big Data ideas in simple ways, but without losing the essence or the credibility of what requires conveying. Tesco had a slogan, “you cannot bullshit simplicity”, which tried to convey this essence, and a German retailer took this even further with “Every Lidl helps”. So remember, keep it simple. “Simplicity is the ultimate sophistication” – Leonardo da Vinci.

Journalism. Although not necessarily a master of joined up handwriting and maths 101, a Big Data guru must also write like a journalist. He or she does not have to be a great scribe, simply being competent with words, concepts and numbers is a high enough bar. Okay, there is a little more to it than that – or at least, there should be. So I will explain.

These are a few things (also mentioned elsewhere in this piece, and influenced by the World Journalism Institute), that a good Big-Data-guru journalist should possess or be aware of:

It’s mainly about people. What we write about will influence and affect people, so we should remember that, and act accordingly, with empathy and with compassion.

Don’t ever put-up and shut-up when presented with bullshit.

Be sceptical and be prepared to verify.

Have great and reliable sources.

Continually check your biases.

Be adaptable and welcome change.

Don’t be intimidated.

Be tenacious.

Be open minded.

Always maintain your own integrity.

A good story. According to the Writers Store, there are seven elements that make a good story: “the change of fortune, the problem of the story, the complications, crisis, climax and resolution of the classical structure, and the threat, which is by far the most important.” A truly great Big Data guru will be able to take this advice and apply it to the field of Big Data with little difficulty. It would make a great change from wading through Data Lakes of ‘bleh’ in search of the Holy Big Data grail. So, having set the scene, let’s take a closer look.

A real Big Data guru must be able to tell a compelling and credible story. However, they must also show as well as tell. Telling alone is fiction, which is fine, but a Big Data guru needs to back great fiction up with fact. A Big Data guru worth their salt must be able to tell and show. Nothing less than a verifiable story is acceptable. Don’t be promiscuous with the facts in a story, especially when one heralds a Big Data success, without providing any hard evidence to back it up. Remember this: just because some people are incredibly gullible when it comes to Big Data, it doesn’t mean you should lead them in ignorance down the garden path, like so many innocent lambs to the slaughter. That sort of thing is quite despicable even by our regular standards. So don’t go out of your way to prove Dan Ariely right yet again, especially with regard to his very accurate comment that “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Technology. A Big Data guru should know the technology. They should also know the origins of the technology and its influences. For example, a Big Data guru should have a sound grasp of the principles of the following:

Distributed file stores and the many varieties that exist and have existed. Examples of these are Lustre, GPFS and HDFS.

Database models, architectures and technologies. From 1960 to today. Flat files and hierarchical, network, relational, object, and, dimensional models. There are more, but this is the place to start.

This is far from being an exhaustive list, but it should give you a flavour for what is required.

Also, take heed of the words of Pablo Picasso, who stated, “Computers are useless. They can only give you answers.”

Agnostic. An ethical and professional Big Data pundit that really rocks the Kasbah must also be as agnostic as it is possible to be, yet without letting an insistence on ‘fair’ and ‘balanced’ upset the civic order of things. This simply means that you do not have to give equal merit to all and sundry, especially when equal merit is palpably not the case across the broad range of technical, project and business offerings that wash over the decks of the SS Big Data. At the end of the day, the postmodern interpretation of ‘fair and balanced’ is in all cases a massive contradiction in terms.

The Swiss philosopher, poet and critic Henri Frédéric Amiel once wrote that “A belief is not true because it is useful”, and this is why we should take special care to adopt an agnostic stance with regard to Big Data and its technologies. There is a tendency for Big Data gurus to big up Hadoop and ignore the rest of the field. Not only is this narrow-minded view an injustice, but it is ultimately detrimental to those who seek to understand and obtain benefit from deploying Big Data solutions.

Versatile. I have seen many people try to accommodate the meagre competence of some self-anointed Big Data gurus within far too wide an area of acceptance. This is not an issue if people are aware of the rampant bias, babble and baloney factors, but nonetheless extreme caution needs to be exercised if awkward unintended consequences are to be avoided. Remember, a good Big Data guru can come from a variety of backgrounds – and not all of them will necessarily require a degree from a prestigious centre of educational excellence such as Oxford, the LSE or Prifysgol Cymru, Y Drindod Dewi Sant. In fact, one could say that attendance at St Trinian’s School for Young Ladies, Hogwarts School of Witchcraft and Wizardry, or Cambridge Finishing School could quite possibly be a negative factor, although I would not associate a certainty factor with any of these bets. Although, to be fair, I am reliably informed by the Headmistress that St Trinian’s has an excellent Data Science faculty.

Business understanding. The ideal Big Data guru must be business aware, savvy to the point of shrewdness, and cunning, preferably “as cunning as a fox who’s just been appointed Professor of Cunning at Oxford University”[3]. They must understand business process, the commonalities and differences of business sectors and players, and the motivations, competitive forces and key influencers in business. They must also understand the meaning of business strategy, how it is developed, chosen and executed. Finally, the Big Data guru must have a good handle on irrationality and its degrees of predictability. So, know your business, and understand the business of others. Then, when someone gives you a bucket and tells you to go down to Tesco’s to buy a petabyte of data and a Euro Millions Lottery ticket, you’ll know what it’s all about.

Analytical. A good Big Data guru must be naturally analytical, but not to the point of being anally analytical, and should possess an ability to spot patterns in behaviour as well as in data. CIA veteran Dick Heuer put it this way: “Thinking analytically is a skill like carpentry or driving a car. It can be taught, it can be learned, and it can improve with practice. But like many other skills, such as riding a bike, it is not learned by sitting in a classroom and being told how to do it. Analysts learn by doing.”

That’s all folks

If you encounter a candidate Big Data guru with all of these traits — or have a candidate who ticks most of the boxes but is willing to acquire more ticks — then you’ve found someone who might deliver unbelievable value to the cause of Big Data, your struggle, your reason, your content, and your living-room. So, delight in your find.

However, settle for a candidate who falls short on any of these inherent individualities, and you run the risk of acquiring a graded and coarse-grained pretender, someone just hoping to ride the Big Data bullshit-babbling bubble until it bursts in their brazen time-pieces[4].

I know, I know, I hear you say “but does this all really matter?” Probably not. Probably in the grand scheme of things this is yet another boom and bust fad destined to become matter of fact in some reduced circles and a plain waste of time, money and patience, in others. No doubt we shall see, as the story unfolds and the ‘sublimely absurd’ metamorphoses into the ‘I don’t bloody believe it’, or not.

So, now over to you. What would you add to this compendium of convenient characteristics? I would really love to receive your views, opinions and perspectives in the comments section that follows on from this piece.

Many thanks for reading.

A note from the Prime Minister:

Data is only as good as its time and place utility. If it has none, it has no present value, unless of course, someone wants to pay something for nothing, but that is constructing a con not an economy, an aberration destined to be hated and then forgotten. Don’t only think about how to use the data you have, but also about what data should be captured and how it should be used. By the way, join the Big Data contrarians here on LinkedIn: https://www.linkedin.com/grp/home?gid=8338976

As a child, I adored the USA rock band the Eagles, especially the musical talents of Joe Walsh. This explains the inspiration behind the title of this piece.

So, what’s going down at Ashley Madison?

Never heard of them? Off your radar? Surely not?

That stretches the bounds of credulity, as even the people in Singapore’s Media Development Authority have heard of them. They described the business’s site this way: “it promotes adultery and disregards family values”, and consequently will not allow it to operate in Singapore. Well, what a turn-up for the books.

On a more serious note, and as you might know (from Wikipedia or some other ‘sites’), Ashley Madison is a Canadian-based online dating service and social networking service marketed to people who are married or in a committed relationship. Its slogan is “Life is short. Have an affair.” It seems, if we are to believe various reports doing the rounds, that their Big Data has been compromised, big time.

Yes, I know, how could that possibly have happened, right?

According to some reports, Adison Mashley have around 37 million clients in the Big Data pool, and large caches of their data have allegedly been stolen after an apparently successful hacking attempt. According to Krebs On Security, data stolen from the web site in question “have been posted online by an individual or group that claims to have completely compromised the company’s user databases, financial records and other proprietary information.”

But, again I ask, how can this happen?

I am not an avid fan of Big Data technology for core business use, and given the level of Big Data technology maturity, it sounds like a dopey idea. But each to their own.

What I will state is that my database management experience has tended to be associated with database technologies that can only be hacked as part of an inside job i.e. where people either know user IDs, passwords, IP addresses and layers of protection etc. or know of someone who does. Either someone who is a friend, part of the family (no, not that type of ‘family’) or someone who can be blackmailed into divulging the required access paths and security check workarounds.

However, taking a broader and more permissive view of this alleged hackerisation of Big Data, do we write it up as a Big Data success, i.e. The Amazing Big Data Affair? Put it down to a technical glitch and community faux pas? Or do we take a jaundiced view of the whole thing and keep it real? I await, with bated breath, the enlightened opinions of the Big Data gurus.

Mitch ‘n’ Andy are not unfamiliar with ‘issues’ related to the use of people’s data. The Daily Dot carried a piece from contributing writer S. E. Smith with the headline ‘Why Ashley Madison is cheating on its users with Big Data’. In that piece, Smith states that “Like pretty much every other website on Earth, Ashley Madison spies on its users and crunches the data in a variety of ways to increase the bottom line.”

Belinda Luscombe writing in Time confirmed these suspicions with a piece titled ‘Cheaters’ Dating Site Ashley Madison Spied on Its Users’. She wrote:

In a study to be presented at the 109th Annual Meeting of the American Sociological Association in San Francisco on Saturday Aug. 16, Eric Anderson, a professor at the University of Winchester in England claims that women who seek extra-marital affairs usually still love their husbands and are cheating instead of divorcing, because they need more passion. “It is very clear that our model of having sex and love with just one other person for life has failed— and it has failed massively,” says Anderson.

How does he know this? Because he spied on the conversations women were having on Ashley Madison, a website created for the purpose of having an affair. Professor Anderson, who, as it turns out, is the “chief science officer” at Ashley Madison, looked at more than 4,000 conversations that 100 women were having with potential paramours. “I monitored their conversation with men on the website, without their knowing that I was monitoring and analyzing their conversations,” he says. “The men did not know either.”

Elsewhere, and as reported on Wikipedia, Trish McDermott, a consultant who helped found Match.com, accused Ashley Madison of being a “business built on the back of broken hearts, ruined marriages, and damaged families”.

Wow, wow, and triple wow! What a way to run a dance hall!

Maybe they should reconsider their slogan, making it more snappy and apposite. How about “Life is short, we pimp your Big Data” as a starter? So go ahead, make your own and post it below. Have fun.

When it comes to Big Data, some people accuse me of being akin to a Luddite. Nothing could be further from the truth. Not that the facts matter. In the age of superficiality and surfaces there is as much wilfully cultivated obliviousness as there is unashamed and unabashed term abuse. Add the prevailing underlying current of anti-intellectualism into the mix, and we have an explosive combination that manifests itself in the alliterative combination of bluff, bluster and banality.

I was reticent about writing this article, because it’s a bit like arguing against the irrational, self-interested and wilfully obtuse. Or as Ben Goldacre would have it, “You cannot reason people out of a position that they did not reason themselves into.” Therefore, a lot of care needed to be exercised. Indeed, Mark Twain once stated, “Never argue with stupid people, they will drag you down to their level and then beat you with experience.” Now, I wouldn’t go that far, and I do try to be nicely diplomatic, most of the time, but I can see where he was coming from.

Anyway, without more ado let’s get a handle on what a Luddite is, in terms I hope that most will understand.

According to Wikipedia (yes, I know), the Luddites were:

“19th-century English textile workers who protested against newly developed labour-economizing technologies from 1811 to 1816. The stocking frames, spinning frames and power looms introduced during the Industrial Revolution threatened to replace the artisans with less-skilled, low-wage labourers, leaving them without work.”

So why do I get a feeling that some people think that I am a Big Data Luddite?

“With all due respect – your post does sound a little like what I could envisage as an exchange between a man riding a horse and a man driving one of the first automobiles… sorry.”

Although a respectable knowledge of the technology and its evolution would inform otherwise, I assume that this means that I would be the “man riding a horse”… An interesting piece of conjecture indeed, even if what it lacks in accuracy it makes up for in the inexplicable certainty of belief. Still, it’s fascinating to discover just how many ‘experts’ think that this stuff – the sort of stuff I was doing in the mid to late eighties at Sperry and later Unisys – is bleeding-edge innovation.

“Yes, cynical indeed… here is another amazing Big Data success story. You go on your computer, type in any search phrase and get instantaneous and highly relevant results. It is so amazing that a word has been coined. Guess what that is…”

What to say? There goes a person who seems to believe that the history of search starts and ends with the Google web search engine. Something slightly less than a munificently inapposite comment, only outdone by its tragically disconnected banality.

More recently, Bernice Blaar had this to say about my take on Big Data in general and The Big Data Contrarians in particular.

“Master Jones may well be the great and ethical strategy data architecture and management guru that the chattering-class Guardian-reading wine-sipping luvvies drool over, but he is also a brazen Big Data Luddite. No, actually far worse than a Luddite, he`s a Neddite, because with his ‘facts’ and ‘logic’ (what a laugh, you can prove anything with facts, can’t you [tou}???) he is undermining the very foundation of the Big Data work, shirk and skive ethic that has been so hardily fought for by the likes of self-sacrificing champions and evangelists of the Big Data revolution, to wit, such as those bold, proud and fine upstanding members Bernard Marr, Martin Fowler and Tom Davenport, for example, and the brave sycophants that worship at their feet. Martyn is worse than Bob Hoffman, Dave Trott, Jeremy Hardy, Mark Steel, Tab C Nesbitt and Bill Inmon, all rolled into one. He may be a great strategist, but I wouldn’t hire him. Contrarian Luddite!”

And then followed it up with this broadside:

“The Big Data Contrarians group are nothing more than a bunch of over-educated clown-shoes who are trying to scupper the hard-work of decent people out to earn a crust from leveraging the promise of a bright future. In a decent society of capital and consumers, they would be banned off the face of the internets.”

How does one reciprocate such flattering flatulence? How can one possibly respond to such a long concatenation of meaningless clichés? Though to be fair, I quite liked being referred to as a Neddite, whatever that is.

Anyway, to set the record straight, this is where I stand.

A contrarian is a person who takes up a contrary position, especially a position that is opposed to that of the majority, regardless of how unpopular it may be.

Like others, I am a Big Data Contrarian, not because I am contrary to the effective use of large volumes, varieties and velocities of data, but because I am contrary to the vast quantities of hype, disinformation and biased mendaciousness surrounding aspects of Big Data and some of the attendant technologies and service providers that go with the terrain. I don’t mind people gilding the lily (to use an English aphorism for exaggeration), but I do draw the line at straight-out deception, which can lead to unintended consequences, such as creating false expectations, diverting scarce resources to wasteful projects or doing people out of a livelihood. That’s just not right.

Does that make me a Luddite (or a Neddite)? I don’t think so, but do make sure that your opinion is your own and is arrived at through reason, not some other person’s bullying hype. As I wrote elsewhere some moments ago, “If you have to lie like an ethically challenged weasel to sell Big Data then clearly there is something amiss.”

As always I would love to hear your opinions and comments on this subject and others, and also please feel free to reach out and connect, so we can keep the conversation going, here on LinkedIn or elsewhere (such as Twitter).

Okay, before we get started I have to declare the real intent for posting this piece. It is to get you to join The Big Data Contrarians professional group here on LinkedIn.

To apply to join the best Big Data community on the web simply navigate to this address http://www.linkedin.com/grp/home?gid=8338976 (or paste it into your browser) and request membership, the process is quick and painless and well worth the effort.

Now for the rest of the news…

There are many misconceptions amongst the Big Data collective about Data Warehousing, common fallacies that need clearing up in order to avoid unnecessary confusion, avoidable risks and the damaging perpetuation of disinformation.

Big Picture

In the dim and distant past of business IT, the best information that senior executives could expect from their computer systems was operational reports, typically indicating what went right or wrong or somewhere in between. Applied statistical brilliance made up for what data processing lacked in processing power, up to a point, because even heavy-lifting statistics requires computing horsepower, which in those days was a question of serious capital expenditure that not all companies were willing to commit to.

Then, and curiously coincidentally, people around the world started to posit the need for using data and information to address significant business challenges, to act as input into the processes of strategy formulation, choice and execution. Reports would no longer just be for the Financial Directors or the paper collectors, but would support serious business decision making.

Many initiatives sprang up to meet the top-level decision-making data requirements; they were invariably expensive attempts, with variable outcomes. Some approaches were quite successful, but far too many failed, until the advent of Data Warehousing.

Back then, most of the data that could potentially aid decision-making was in operational systems. Both an advantage and a problem. Data in operational systems was like having data in gaol. Getting data into operational systems was relatively easy, getting it out and moving it around was a nightmare. However, one of the advantages of operational data is that it was generally stored in a structured format, even if data quality was frequently of a dubious nature, and ideas such as subject orientation and integration were far from being widespread.

Of course, data also came in from external sources, but usually via operational databases as well. An example of such data is instrument pricing in financial services.

Therefore, briefly, a lot of Data Warehousing started as a means to provide data to support strategic decision-making. Data Warehousing was not about counting cakes, widgets or people, which was the purview of operational reporting, or about measuring sentiment, likes or mouse behaviour, but about assisting senior executives to address the significant business challenges of the day.

Who’s your Daddy?

Bill Inmon, the father of Data Warehousing, defines it as being “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects of interest to the business, such as customer and product.

The thing to remember about subject areas is that they are not created ad-hoc by IT according to the sentiments of the time, e.g. during requirements gathering, but through a deeper understanding of the business, its processes and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints, to ensure that the data stored is consistent and contextually unambiguous.
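By way of a hedged illustration (the source codes and mappings below are invented for this sketch, not taken from any real system), an integration rule often boils down to mapping inconsistent source encodings onto one canonical form before the data is loaded:

```python
# A minimal sketch of an integration rule; real warehouses would drive
# this from metadata, not a hard-coded dictionary.
GENDER_MAP = {
    "m": "MALE", "male": "MALE", "1": "MALE",
    "f": "FEMALE", "female": "FEMALE", "2": "FEMALE",
}

def integrate_gender(raw):
    """Map a source-system gender code to one canonical warehouse value."""
    canonical = GENDER_MAP.get(str(raw).strip().lower())
    # Unmapped values would be routed to a data-quality queue in practice.
    return canonical if canonical is not None else "UNKNOWN"

# Three source systems, three encodings, one unambiguous warehouse value:
assert integrate_gender("M") == integrate_gender("male") == integrate_gender(1)
```

The point is not the dictionary itself but that the rule is applied consistently to everything entering the warehouse.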

Time Variant: Time variance gives us the ability to view and contrast data from multiple viewpoints over time. It is an essential element in the organisation of data within the data warehouse and dependent data marts.

Non-Volatile: The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely if ever modified.
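To make time variance and non-volatility concrete, here is a minimal sketch (the table layout and the figures are invented for illustration): each load appends a dated snapshot, existing rows are never updated, and the same question can be asked as of different points in time:

```python
import datetime

warehouse = []  # append-only rows of (as_of_date, customer_id, balance)

def load_snapshot(as_of, rows):
    """Non-volatile: loads only ever append; nothing is modified in place."""
    for customer_id, balance in rows:
        warehouse.append((as_of, customer_id, balance))

def balance_as_of(customer_id, as_of):
    """Time variant: view the data as it stood on any given date."""
    candidates = [(d, b) for d, c, b in warehouse
                  if c == customer_id and d <= as_of]
    return max(candidates)[1] if candidates else None

load_snapshot(datetime.date(2015, 6, 30), [("C1", 100)])
load_snapshot(datetime.date(2015, 7, 31), [("C1", 250)])
# balance_as_of("C1", datetime.date(2015, 7, 1)) gives the June view (100),
# while the same question asked of a later date gives the July view (250).
```

Nothing here overwrites history, which is exactly what lets us contrast the business from multiple viewpoints over time.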

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.

In plain language, if what your business has or is planning to have does not fully satisfy the Inmon criteria then it probably is not a Data Warehouse, but another form of data-store.

The thing to remember about informed management decision making is that it needs to be as good as required, but it does not need to achieve technical perfection. This observation underlines the fact that Data Warehousing is a business process, and not an obsessive search for zero defects or for the application of so-called ‘leading edge’ technologies – faddish, appropriate or not.

Some Basic Terms

Before we delve into the meaning of Data Warehousing, there are a couple of terms that need to be understood first, so, by way of illustration:

Let’s follow the numbers in the simplification of the process.

1. We gather specific and well-bounded data requirements from a specific business area. These are gathered by talking to business people and understanding their requirements from a business perspective as well as from a data sourcing and data logistics perspective. Here we must remember at all times not to over-promise or to set expectations too high. Be modest.

2. These business requirements are typically captured in a dimensional data model and supporting documentation. Remember that all requirements are subject to revision at a later date, usually in a subsequent iteration of the requirements-gathering-to-implementation cycle.

3. We identify the best source(s) for the required data and record basic technical, management and quality details. We ensure that we can provide data to the quality required. Note that data quality does not mean perfection, but data within the required quality tolerance levels.

4. Data Warehouse data models are modified as required to accommodate any new data at the atomic level.

5. We define, document and produce the means (ETL) for getting data from the source into the target Data Warehouse. Here we also pay especial attention to the four characteristics of Data Warehousing. ETL is an acronym for Extract (the data from source / staging), Transform (the data, making it subject-oriented, integrated and time-variant) and Load (the data into the Data Warehouse and Data Mart).

6. We define, document and produce the means for getting data from the Data Warehouse into the Data Mart. In short, a bit more ETL.

7. User acceptance testing. NB: users should ideally be involved in all parts of the end-to-end process that involve business requirements, participation and validation.

This is a very simplified view, but it serves to convey the fundamental chain of events. The most important aspect is that we start (1) and end (7) with the user, and that we fully involve them in the non-technical aspects of the process.
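The ETL step in that chain can be sketched as follows; the source layout, the mapping rules and the target structure are all invented for illustration, not taken from any particular product or project:

```python
import datetime

def extract(source_rows):
    """Extract: pull raw rows from the source or staging area."""
    return list(source_rows)

def transform(rows, load_date):
    """Transform: make rows subject-oriented, integrated and time-variant."""
    transformed = []
    for row in rows:
        transformed.append({
            "customer_id": row["cust"].strip().upper(),   # integrate key formats
            "country": {"UK": "GBR", "GB": "GBR"}.get(row["ctry"], row["ctry"]),
            "snapshot_date": load_date,                   # time variance
        })
    return transformed

def load(target, rows):
    """Load: append only, preserving the non-volatility of the warehouse."""
    target.extend(rows)

warehouse = []
staged = [{"cust": " c42 ", "ctry": "UK"}]
load(warehouse, transform(extract(staged), datetime.date(2015, 8, 31)))
```

In practice each of these three stages is a substantial engineering exercise in its own right; the sketch only shows where the four characteristics of Data Warehousing get enforced.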

Business, Enterprise and Technology

Essentially, a Data Warehouse is a business driven, enterprise centric and technology based solution for continual quality improvement in the sourcing, integration, packaging and delivery of data for strategic, tactical and operational modelling, reporting, visualisation and decision-making.

Business Driven

A data warehouse is business centric and nothing happens unless there is a business imperative for doing so. This means that there is no second-guessing the data requirements of the business users, and every piece of data in the data warehouse should be traceable to a tangible business requirement. This tangible business requirement is usually a departmental or process specific dimensional data model produced together in requirements workshops with the business. We build the Data Warehouse over time in iterative steps, based on the criteria that the requirements should be small enough to be delivered in a short timeframe and large enough to be significant.

Typically, a Data Warehouse iteration results in a new Data Mart or the revision of an existing Data Mart.

Enterprise Centric

As we build up the collection of Data Marts, we are also building up the central logical store of data known as the Enterprise Data Warehouse that serves as a structured, coherent and cohesive central clearing area for data that supports enterprise decision making. Therefore, whilst we are addressing specific departmental and process requirements through Data Marts we are also building up an overall view of the enterprise data.

Technology Based

By technology, I mean technology in the broadest sense of techniques, methods, processes and tools, and not just a question of products, brands or badges.

Unfortunately, there is a popular misconception that Data Warehousing is primarily about competing popular and commercially available technology products. It isn’t, although such products do play an important role.

Architecture

The following is an example of a very high-level Data Warehouse architecture diagram.

Methodologies

Various methodologies support the building, expansion and maintenance of a Data Warehouse. Here is one example of a professional data integration methodology, produced, maintained and used by Cambriano Energy.

And here is an information value-chain map as used by Cambriano Energy as part of its Iter8 process management. There are alternatives, many of which do a satisfactory job.

Last but not least, this was (from memory) the way that Bill Inmon’s Prism Solutions ETL company used to view the iterative EDW building process.

Keeping it Shortish

At this point, I decided to cut short further explanations on aspects of Data Warehousing. However, if you have any questions then please address them to me and I will do my best (or something close) to answer them.

That’s all folks

Hold this thought for another time: if you think you can replace a Data Warehouse that is not a Data Warehouse with another approach to ‘Data Warehousing’ that doesn’t produce a Data Warehouse, as fast and as cheaply as you can do it, then you still don’t have a Data Warehouse to show for all of your efforts. That is not a great place to be.

Therefore, you see, Data Warehousing was never about a haphazard approach to providing random structured, semi-structured and unstructured data of various qualities, provenance, volumes, varieties and velocities, to whoever was of a mind to want it.

Many thanks for reading.

If you want to connect then please send a request. If you have any questions or comments then fire them off below. Cheers 🙂

If you know all about Sentiment Analysis, you’ve come to the right place. Because I don’t have a clue if what I know about it is accurate or not.

I started to do a bit of research into this Sentiment Analysis lark, in particular with the theoretical idea of using it to analyse and draw conclusions from comments on Pulse – assuming that this is what it can be used for.

To begin at the beginning, which is a good place to start, I read the piece on Wikipedia, and this was how it began:

“Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).” Source: Wikipedia. Link: http://en.wikipedia.org/wiki/Sentiment_analysis

Well, that’s a fairly intuitive description. I could almost have guessed as much.

But, back to the aim of analysing sentiment in Pulse comments, where to start and what to do.

What would sentiment analysis make of these:

On the death of an IT-business celebrity. What would sentiment analysis make of the very emotive comments of desolation, sadness and poignancy from people who didn’t personally know the departed, even remotely, or maybe didn’t even know of them until after they had ‘shuffled off life’s mortal coil’? How would that work? What would sentiment analysis make of the maudlin aphorisms, surrogate grief and bizarre sorrow of people separated by more degrees than Kofi Annan and Mork from Ork? What additional insight does sentiment analysis give us when these comments are analysed along with the body of the text and the other comments that triggered them?

In a similar vein, how does sentiment analysis catch instances of sycophancy? Especially considering the fact that some of it is so ‘in your face’ and blatant that it often times seems to be a bad parody of a bad parody. “Oh, Ricky, why are you such a sexy brainbox?” How does it work in those situations?

Worse than that are the preening, gushing and obtuse texts of massive, errm… fabulators[i]. If it wasn’t about Big Data or Strategy or IT, it would be about something else, usually the writers themselves. “I give Rafa and Rodge tips on tennis! I went to the University of the Universe and got a first! I challenged Superman to a race, and won! I have read the entire works of Dan Brown, 25 times… Neeeh!” What would sentiment analysis do with that sort of gold?

Also, what does sentiment analysis do with texts so ambiguously daft that they could mean anything? Okay, it might be able to pick up a few trigger words here or there, “rubbish”, “of”, “load”, “a”, “what”, etc. However, how does it know when “excellent” is being used in a way that means anything but excellent? For example, “Excellent Big Data job there”, with the silent “if you want a job doing properly then do it yourself”.
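To show why trigger words only get you so far, here is a minimal sketch of the naive lexicon-counting approach (the tiny word lists are invented for illustration); note that it cheerfully scores the sarcastic example above as positive:

```python
# Minimal lexicon-based sentiment scoring; the word lists are invented.
POSITIVE = {"excellent", "great", "good"}
NEGATIVE = {"rubbish", "bad", "awful"}

def naive_sentiment(text):
    """Count lexicon hits: positive minus negative, no context whatsoever."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# The sarcastic "Excellent Big Data job there" counts one positive hit and
# no negative ones, so the silent irony is lost entirely on this approach.
```

Anything beyond this requires modelling context, which is precisely where the hard part of sentiment analysis lives.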

Finally, for the purpose of this little piece, what would sentiment analysis do with term abuse, if it could actually identify it? Going back to the use of the terms such as Big Data or Strategy, how can sentiment analysis discern between the dopey and wrong-headed use of the term, and when it is actually being used in a coherent, cohesive and consistent way, in line more or less with its formal definition? I suppose we can always write a mountain of rules to help us out:

If topic in focus of piece is strategy

And context of topic is business

And author of piece is Richard Rumelt

Then the credibility of text is good (with a certainty of 100%)

But you try and maintain a rule base with instances like that. It soon becomes a management nightmare.
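Expressed as data rather than prose (every field name and rule here is invented for illustration), that hand-written rule might look like the following, and each new author, topic or context combination means yet another entry:

```python
# The hand-written credibility rule above, expressed as data.
RULES = [
    {"if": {"topic": "strategy", "context": "business",
            "author": "Richard Rumelt"},
     "then": ("credibility", "good", 1.00)},
]

def apply_rules(piece):
    """Return the conclusion of the first rule whose conditions all match."""
    for rule in RULES:
        if all(piece.get(k) == v for k, v in rule["if"].items()):
            return rule["then"]
    return ("credibility", "unknown", 0.0)

# One entry per author/topic/context combination is exactly where the
# rule-base maintenance nightmare begins.
```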

Alternatively, maybe it could be used to analyse this text. It’ll have its work cut out, that’s for sure. Does sentiment analysis do sarcasm and cynicism?

Anyway! I bet you might know how this sentiment analysis works, don’t you? On the other hand, if not, then it will be someone else who ‘knows’. But of course, all will not be revealed, because it’s a secret so powerful, that in the wrong hands it could be used to dominate the entire galaxy.

Only joking; and many thanks for reading.

[i] To engage in the composition of fables or stories, especially those featuring a strong element of fantasy: “a land which … had given itself up to dreaming, to fabulating, to tale-telling” (Lawrence Durrell).

Bernice and the Martians, BATM for short, were an incredibly popular progressive-rock band.

Their first big commercial success came with the release of their first album and their planned promotional tour, which took in all continents.

The manager of the band was none other than effable polymath, Renaissance man and good all-round rogue, Ricky Jonesy – an obsessive control freak, lover of fine wines and darling of predictive analytics. He really loved his numbers, his social media and his sentiment analysis.

In fact, much of BATM’s early success came about due to Ricky’s unparalleled passion for the ‘Big Data’.

Ricky was the band’s architect. He had major input into their material: what they composed; how they composed; their stage sets and lighting; where they performed; the way they played; how they dressed; were photographed; spoke; walked; and, ate and drank. In short, he controlled the whole BATM enchilada. It was like being in data-driven heaven.

As I said, their first album, a progressive-rock masterpiece called ‘Your Hole’, achieved major critical acclaim even before it was bolting out of the stalls and across the interwebs. Overnight the band became big property, and their notional market value ran higher than Twitter on steroids.

The band members were really pleased. The press interviewed Bernice right, left and centre, and he made no bones about the fact that a major part of their success was due to Ricky and his Big Data mojo.

Articles about the phenomenon appeared on all the major social media sites: Facebook, LinkedIn and BubbaToons. Ricky was named Supreme Data Scientist of the Year by the Gardener Group, hailed as a messiah by the Big Data Front and lauded by all and sundry.

Then the band went on tour. Blazing a trail of ones and zeros across the face of the planet.

They were five gigs into their tour when Ricky decided to call a band meeting.

“Hi, guys” said Ricky “I’ve been analysing the stats, and I see that those yokes Big Blokes in Tights are trending strongly on the social media, coinciding with the release of their new single Never Stick A Banger In Your Ear”.

“Oh, whoa” chimes in Bernice, “tell us what we gotta do then, Ricky”.

Back comes Ricky. “Well, this is what I thought we might do”

“We take the old Fester and Ailin song Tropical Diseases, we practice it as much as we can, and then we play it at the next gig in Birmingham, this weekend.”

“But, Ricky!” pipes up Marty Smarty, “it’s an Irish country and western song. It doesn’t fit in with what we do, does it? And, anyway, we only have three days to get it prepared.”

Ricky responds. “Ah, you don’t want to be worrying your little head over that. Trust me. Learn the song. It’ll be great. The public will love it.”

So, BATM learn the song. It’s perfect. At the Saturday gig, they play it as the encore. The fans love it to bits and there’s not a cold cigarette lighter in the place.

Then they fly off to Palma de Mallorca for a bit of a rest before their next gig in Madrid.

The guys and gals are lounging at the poolside at the legendary Don Pimpón Espinete Plaza complex. The weather is glorious, the food is glorious, the scenery is glorious, and even the orchestra is glorious.

Then along comes Ricky, calling yet another band meeting.

“Hi, guys” said Ricky “I’ve been analysing the stats again, and I see that those yokes Spanky’s Magic Piano are trending strongly on the social media with their cover version of Engel Humpadink’s The Monkey Song”

“We take the old Fester and Ailin song There’s A Dead Man Up The Chimney, and we rewrite it in the style of Tom Jones when he made that album of his, Little Fockers, was it? Then we practice it as much as we can, until it’s perfect, and then we play it at the next gig in Madrid, this weekend”

“But, Ricky!” pipes up Brian McGarsical, “It’s a bit of an odd one isn’t it? I mean to say, it doesn’t fit in with what we do, does it? And, anyway, we only have four days to get it prepared.”

Ricky responds as fast as a chalked-up cat going down a drainpipe. “Ah, you don’t want to be worrying your little head over that. Trust me. Learn the song. It’ll be great. The public will love it. And anyways, it will fit nicely on the playlist, up there with Tropical Diseases.”

The band rewrite the song, and practice the Bedejaysus out of it. Ricky likes it so much that he gets the stats to confirm that this has to be number one on the next gig playlist.

Come the day of the gig, and BATM kick off, not with a progressive-rock anthem, but with There’s A Dead Man Up The Chimney. A group of young people at the front are clearly loving this new sound, but quite a few people are staring at the stage in fright, and it’s not from skunk-induced paranoia either.

Two guys are having a conversation at the back of the hall.

“Yo, lunchbox, hurry this gig up, I thought this band was all progressive-rock and stuff, not this wiener schnitzel stuff.”

“No comment.”

Having divided the crowd with their first song, they play songs from their album. Again, they encore with Tropical Diseases. The crowd at the front go wild. The progressive rockers look on, bemused.

“Well, that was a mixed bag” says Bernice.

“Take it from your man Ricky. It all went fine lads. Just needs some fine-tuning of the songs and the analytics need to be a bit more real time. Take me word for it.”

Back comes a unison of “Okay, Ricky. We believe yas!”

So, off they go to Bonn, to prepare for the following week’s gig at the Live Music Hall in Cologne.

The band goes out visiting the museums, they have lunch at Brauhaus Bonnsch, and after a leisurely walk along the banks of the Rhine they are taking a beer or three in a lovely little beer garden close to the United Nations campus.

Then out of the blue, a familiar voice can be heard.

“Hi, guys! We’re all goin’ on a summer ‘oliday”. It’s the voice of Ricky. “Anyway, good news, guys. I’ve been analysing the amazin’ Big Data stats again, and I see that those mensch Die Zahnarzt are trending strongly on the social media, especially on Swotter and Titter, with their amazin’ cover version of Podge and Rodge’s chillout mix of Currywurst and Microchips.”

Silence. No one says a word for the best part of infinity.

Ricky continues… “As you’re not going to ask, lads, I’ll tell you. We take the old song A Great Day for the Washing, and we rewrite it in the style of techno-Buddah-bar-chill. Then we practice it as much as we can, until it’s perfect, and then we bang it out at the next gig in Cologne, this Friday. Innit. Come on lads, it’s 20 minutes of stage magic, and it’s a breeze.”

Come the day of the gig, and the band arrive early at the hall. Ricky is already there. He’s changed the stage set completely and has a new wardrobe for the lads – Bavarian romantic. They’ll soon be all Princed and Smiley Virused up to the eyeballs, wrecking ball included.

And BATM kick off, not with a progressive-rock anthem or chill, but again with There’s A Dead Man Up The Chimney. Again, a group of young people at the front are clearly loving this new sound, but quite a few people are staring at the stage in drug-induced awe. Then they follow that up with A Great Day for the Washing. By the time they get to the encore of There’s A Dead Man Up The Chimney, boisterous arguments are breaking out everywhere and empty crisp packets and used sticks of chalk are being thrown at the stage. It’s a disaster.

Four guys are having a conversation at the back of the hall.

“I liked the first song”

“No! The first was terrible. Minging! I want my prog rock back.”

“It’s like the choice of leprosy or the plague.”

“Down with this sort of thing.”

Next day Bernice calls an urgent meeting of the band.

Ricky kicks off.

“Well, lads, a bit of a mid-week game yesterday, wasn’t it?”

Bernice comes back with a “You can say that again, Rick”

“Don’t worry, I have analysed the social-media Big Data from all of the concerts, and we’re doing good, guys. It’s in the analytics.”

“We have to go back to our roots and drop all the changes we made.”

A stranger in the lounge where they are having the meeting walks up to them and in simple language explains to them what has happened.

“You created a great product, a great brand, with some interesting progressive music.”

“Your music was acclaimed and your world tour was eagerly anticipated by all your fans.”

To begin at the beginning

Miss Piggy said, “Never eat more than you can lift”. That statement is no less true today, especially when it comes to Big Data.

The biggest disadvantage of Big Data is that there is so much of it, and one of the biggest problems with Big Data is that few people can agree on what it is. Overcoming the disadvantage of size is possible; overcoming the problem of understanding may take some time.

As I mentioned in my piece Taming Big Data, “the best application of Big Data is in systems and methods that will significantly reduce the data footprint.” In that piece I also outlined three conclusions:

Taming Big Data is a business, management and technical imperative.

The best approach to taming the data avalanche is to ensure there is no data avalanche – this approach is moving the problem upstream.

The use of smart ‘data governors’ will provide a practical way to control the flow of high volumes of data.

The Big Data Governor’s role is to help in the purposeful and meaningful reduction of the ever-expanding data footprint, especially as it relates to data volumes and velocity (see Gartner’s 3Vs).

The reduction techniques are exclusion, inclusion and exception.

Its implementation is through a development environment that can target hardware, firmware, middleware and software forms of hosting, with continuously monitored execution.

In short, it is a comprehensive approach to reducing the Big Data footprint whilst simultaneously maintaining data fidelity.
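By way of illustration, the three reduction techniques might be combined as rule types in a single filtering pass. This is a minimal, hypothetical sketch; the rule shapes and record fields are my own assumptions, not part of the CE Data Governor itself:

```python
# Toy sketch of the three reduction techniques as governor rule types:
# exclusion drops matching records outright, inclusion keeps only the
# class of records we care about, and exception passes on significant data.

def govern(records, exclude, include, exception):
    kept = []
    for rec in records:
        if exclude(rec):
            continue                  # exclusion: drop outright
        if not include(rec):
            continue                  # inclusion: keep only this class
        if exception(rec):
            kept.append(rec)          # exception: pass on significant data
    return kept

records = [
    {"source": "test-rig", "temp": 95},   # excluded: test traffic
    {"source": "line-a", "temp": 40},     # included, but unexceptional
    {"source": "line-a", "temp": 92},     # included and exceptional
]

kept = govern(records,
              exclude=lambda r: r["source"] == "test-rig",
              include=lambda r: r["source"].startswith("line-"),
              exception=lambda r: r["temp"] > 85)
# only the third record is passed on downstream
```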

Here are some examples:

Integrated Circuit Wafer Testing

What’s this all about? Here’s an answer the good folk at Wikipedia cooked up earlier:

“Wafer testing is a step performed during semiconductor device fabrication. During this step, performed before a wafer is sent to die preparation, all individual integrated circuits that are present on the wafer are tested for functional defects by applying special test patterns to them. The wafer testing is performed by a piece of test equipment called a wafer prober. The process of wafer testing can be referred to in several ways: Wafer Final Test (WFT), Electronic Die Sort (EDS) and Circuit Probe (CP) are probably the most common.” (Link / Wikipedia)

Fig.1 – IC Fab Testing and the CE Data Governor

This exhibit shows where the Data Governor is placed in the Integrated Circuit fabrication and testing/probing chain.

In large plants, the IC probing process generates very large volumes of data at high velocity.

Based on exception rules the Data Governor reduces the flow of data to the centralised data store.

It also increases data velocity and shortens time to analysis.

Greater speed and less volume mean that production showstoppers are spotted earlier, thereby potentially leading to significant savings in production and recuperation costs.

Let’s look at some of the technical details:

Taking our example of the IC Fab test/probe chain, a Data Governor should be able to handle a hierarchy or matrix of designation and exception.

For example, a top-level Data Governor actor could be the Production Run actor.

The Production Run actor could designate and assign exception rules to a Batch Analysis actor.

In turn, the Batch Analysis actor could designate and assign exception rules to a Wafer Instance Analysis actor.
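That hierarchy of actors might be sketched as follows. The actor names come from the example above, but the rule predicates and reading fields are invented purely for illustration:

```python
# Hypothetical sketch of a hierarchy of Data Governor "actors" for the
# IC test/probe chain. Each actor applies its own exception rules and
# then delegates to the actors it has designated beneath it.

class GovernorActor:
    def __init__(self, name, rules, children=None):
        self.name = name
        self.rules = rules            # list of predicates: reading -> bool
        self.children = children or []

    def exceptions(self, reading):
        """Yield (actor name, reading) for every rule that fires."""
        if any(rule(reading) for rule in self.rules):
            yield (self.name, reading)
        for child in self.children:
            yield from child.exceptions(reading)

# Production Run -> Batch Analysis -> Wafer Instance Analysis
wafer = GovernorActor("Wafer Instance Analysis",
                      [lambda r: r["die_failures"] > 3])
batch = GovernorActor("Batch Analysis",
                      [lambda r: r["batch_yield"] < 0.90], [wafer])
run = GovernorActor("Production Run",
                    [lambda r: r["line_halted"]], [batch])

reading = {"die_failures": 5, "batch_yield": 0.95, "line_halted": False}
hits = list(run.exceptions(reading))
# only the wafer-level rule fires for this reading
```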

The Internet of Things – IoT

Intrinsically linked to Big Data and Big Data Analytics, the Internet of Things (IoT) is described as follows:

“The Internet of Things (IoT) is the network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure.” (Link / Wikipedia)

Fig.2 – The Internet of Things and the CE Data Governor

This exhibit shows where the Data Governor is placed in the Internet of Things data flow.

The Data Governor is embedded into an IoT device, and functions as a data exception engine.

Based on exception rules and triggers the Data Governor reduces the flow of data to the centralised / regionalised data store.

It also increases data velocity and shortens time to analysis.

Greater speed and less volume mean that important signals are spotted earlier, thereby quite possibly leading to more effective analysis and quicker time to action.
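In code, the embedded exception engine boils down to a small on-device filter. A minimal sketch, assuming a temperature sensor; the field names and thresholds are invented for illustration:

```python
# Sketch of an embedded Data Governor on an IoT device: only readings
# that trip an exception rule are forwarded to the central / regional
# store; everything else never leaves the device.

def governor(readings, rules):
    """Forward only readings for which at least one exception rule fires."""
    return [r for r in readings if any(rule(r) for rule in rules)]

# Illustrative rules (thresholds are assumptions)
rules = [
    lambda r: r["temp_c"] > 80.0,        # over-temperature exception
    lambda r: r["battery_pct"] < 10,     # device-health exception
]

readings = [
    {"temp_c": 21.5, "battery_pct": 88},
    {"temp_c": 85.2, "battery_pct": 87},
    {"temp_c": 22.0, "battery_pct": 9},
]

forwarded = governor(readings, rules)    # 2 of the 3 readings pass
```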

Net Activity

Much play is made of the possibility that we will all be extracting golden nuggets from web server logs sometime in the near future. I don’t want to get into the business value argument here, but would like to describe a way of getting Big Data to shed the excess web-server-log bloat.

Fig.3 – Web Server Activity Logging and the CE Data Governor

This exhibit shows where the Data Governor is placed in the capture and logging of interactive internet activity.

The Data Governor acts as a virtual device written to by standard and customised log writers, and functions as a data exception engine.

Based on exception rules and triggers the Data Governor reduces the flow of data from internet activity logging.

It also increases data velocity and shortens time to analysis.

Greater speed and significantly reduced data volumes may lead to more effective and focused analysis and quicker time to action.
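The “virtual device” idea can be sketched as a file-like sink that log writers write into, which keeps only the exception lines. The log-line layout here is an assumed variant of the combined log format with a trailing response-time field, and the rules (server errors and slow responses) are illustrative:

```python
# Sketch of the Data Governor as a virtual logging device: a write()-able
# sink that discards routine web-server log lines and keeps only
# exceptions (here, HTTP 5xx responses and requests slower than 2000 ms).

import re

class LogGovernor:
    # matches: ... "GET /path HTTP/1.1" <status> <bytes> <millis>
    LINE = re.compile(r'" (?P<status>\d{3}) (?P<bytes>\d+) (?P<ms>\d+)$')

    def __init__(self):
        self.kept = []                # the reduced log, ready for analysis

    def write(self, line):
        m = self.LINE.search(line.rstrip())
        if m and (int(m.group("status")) >= 500 or int(m.group("ms")) > 2000):
            self.kept.append(line.rstrip())

sink = LogGovernor()
sink.write('1.2.3.4 - - [.] "GET /a HTTP/1.1" 200 512 35\n')
sink.write('1.2.3.4 - - [.] "GET /b HTTP/1.1" 503 0 12\n')
sink.write('1.2.3.4 - - [.] "GET /c HTTP/1.1" 200 128 2400\n')
# only the 503 and the slow request survive the governor
```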

Signal Data

Signal data can be a continuous stream of data originating from devices such as temperature and proximity sensors. By its nature, it can generate high volumes of data at high velocity: it can add lots of data, and very quickly.

Fig.4 – Signal Data and the CE Data Governor

This exhibit shows where the Data Governor is placed in the stream of continuous signal data.

The Data Governor acts as an in-line data-exception engine.

Based on exception rules and triggers the Data Governor reduces the flow of signal data.

It also increases data velocity and shortens time to analysis.

Greater speed and significantly reduced data volumes may lead to more effective and focused analysis and quicker time to action.
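One common exception rule for continuous signals is a deadband: a sample is passed on only when it moves more than a set tolerance away from the last sample emitted. A minimal sketch, with invented values:

```python
# Sketch of an in-line exception engine for continuous signal data using
# a "deadband" rule: pass a sample downstream only when it differs from
# the last emitted sample by more than the tolerance.

def deadband(samples, tolerance):
    emitted = []
    last = None
    for s in samples:
        if last is None or abs(s - last) > tolerance:
            emitted.append(s)         # significant change: pass it on
            last = s                  # remember the last emitted value
    return emitted

stream = [20.0, 20.1, 20.05, 21.5, 21.6, 25.0, 24.9]
kept = deadband(stream, tolerance=1.0)   # [20.0, 21.5, 25.0]
```

Seven samples collapse to three while every movement larger than the tolerance is preserved, which is the fidelity-preserving reduction the approach is after.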

Machine Data

“Machine-generated data is information which was automatically created from a computer process, application, or other machine without the intervention of a human.” (Link / Wikipedia)

Fig.5 – Machine Data and the CE Data Governor

This exhibit shows where the Data Governor is placed in the stream of continuous machine generated data.

The Data Governor acts as an in-line data analysis and exception engine.

Exception data is stored locally and periodically transferred to an analysis centre.

Analysis of the totality of data of the same class and origin can be used to drive ANN (artificial neural network) and statistical analysis, which can in turn support, for example, the automatic and semi-automatic generation of preventive maintenance rules.

Greater speed and significantly reduced data volumes may lead to more effective and focused analysis and quicker time to proactivity.
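The store-locally-and-transfer-periodically pattern described above might look like this. The exception rule, batch size and event shape are assumptions for illustration:

```python
# Sketch of the machine-data pattern: exception events are stored locally
# and shipped to the analysis centre in periodic batches rather than
# streamed record by record.

class BatchingGovernor:
    def __init__(self, is_exception, batch_size, transfer):
        self.is_exception = is_exception
        self.batch_size = batch_size
        self.transfer = transfer      # callable that ships one batch
        self.local = []               # local exception store

    def ingest(self, event):
        if self.is_exception(event):
            self.local.append(event)
        if len(self.local) >= self.batch_size:
            self.transfer(list(self.local))   # periodic transfer
            self.local.clear()

shipped = []                          # stands in for the analysis centre
gov = BatchingGovernor(lambda e: e["error_code"] != 0,
                       batch_size=2, transfer=shipped.append)

for code in [0, 7, 0, 0, 3, 9]:
    gov.ingest({"error_code": code})
# the two exceptions (7, 3) fill and ship the first batch;
# the third (9) waits locally for the next transfer
```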

Other Applications of the Data Governor

The options are not endless and the prizes are not rich beyond the dreams of avarice, but there are some exciting possibilities out there, including applications in the trading, plant monitoring, sport and climate change ‘spaces’.

Fig.6 – Other Applications in the Big Data ‘space’ and the CE Data Governor

Summary

To wrap up, this is what the CE Data Governor approach looks like at a high level of abstraction:

Data is generated, captured, created or invented.

It is stored on a real or virtual device.

The Data Governor (in all its configurations) acts as a data discrimination and data exception manager and ensures that significant data is passed on.

Significant data is used for ‘business purposes’ and to potentially refine the rules of the CE Data Governor.
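Those four steps, including the feedback of significant data into the governor's own rules, can be sketched end to end. The refinement heuristic here is a deliberately toy assumption (raise the threshold when exceptions become frequent), not a prescription:

```python
# High-level sketch of the summary: the governor passes on only
# significant data, and that data is then used to refine its rules.

class AdaptiveGovernor:
    def __init__(self, threshold):
        self.threshold = threshold

    def filter(self, values):
        """Discriminate: pass on only the significant values."""
        return [v for v in values if v > self.threshold]

    def refine(self, significant):
        # Toy refinement rule: if exceptions are frequent, raise the bar.
        if len(significant) > 3:
            self.threshold *= 1.1

gov = AdaptiveGovernor(threshold=50.0)
significant = gov.filter([10, 60, 55, 70, 90, 40, 65])   # 5 values pass
gov.refine(significant)                                  # bar raised above 50
```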

To summarise the drivers:

We should only generate data that is required, that has value, and that has a business purpose – whether management oriented, business oriented or technical in nature.

We should filter Big Data, early and often.

We should store, transmit and analyse Big Data only when there is a real business imperative that prompts us to do so.