Data alright?

We may be wrong to focus on our data being right, when we should be more concerned that it is the right data. There is a congruence here with efficiency and effectiveness - doing the thing right and doing the right thing. We can put lots of effort into honing our processes and interactions so that they are efficient, but this can be to no avail if we are not doing the right thing. All very obvious you might think, but the reality is many people have conceptual problems with that abstract stuff we call data and will ask people like me to check if their data is right and miss out on the more important question of whether they hold the right data. I recently examined a database for an application that had been in use for some years. The quality of the data on most measures was excellent. The curious incident was what was not there, amongst which were email addresses and mobile phone numbers.

The data protection principles tell us that data should be right in both ways. The right data comes first in principle 3: "Personal data shall be adequate, relevant and not excessive...", or as my wife might specify it: "enough, but not too much". Principle 4 tells us the data should be right: "Personal data shall be accurate and, where necessary, kept up to date". In some ways we are struggling with terminology, as we don't have words that necessarily sum up our different dimensions of rightness with a clarity similar to efficient and effective, but I'm going to use pertinence to talk about the right data, and accuracy to talk about the data being right.

We can measure, quantify and report on both efficiency and data completeness and accuracy, and plot ways to fill any gaps in the existing record. The questions of effectiveness and whether we are recording the pertinent data are much more open and difficult to quantify. Whilst there may be some things that are definitely wrong, it is difficult to be certain that we ever have it as good as it could be, but we can be pretty certain that the world will move on and invalidate both our accuracy and pertinence. Focusing on pertinence more than accuracy, we move from the certainty of analysing the data quality to business analysis where we will need to take account of the dynamism of data and the business, and differing frames of reference for different users.
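To show how mechanical the measurable side is, here is a minimal Python sketch of a completeness report. It assumes the contact records are available as simple dictionaries; the field names and sample records are hypothetical and not taken from any real system.

```python
def completeness(records, fields):
    """Return the fraction of records holding a non-blank value for each field."""
    total = len(records)
    report = {}
    for field in fields:
        # Treat missing keys, None and whitespace-only strings as "not populated".
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical sample data, for illustration only.
contacts = [
    {"name": "A Smith", "email": "a.smith@example.com", "mobile": ""},
    {"name": "B Jones", "email": None, "mobile": ""},
    {"name": "C Brown", "email": "c.brown@example.com"},
    {"name": "D Green", "email": "  ", "mobile": "07700 900123"},
]

report = completeness(contacts, ["name", "email", "mobile"])
for field, share in report.items():
    print(f"{field}: {share:.0%} populated")
```

A report like this only tells you how well the fields you already have are filled. Whether the list of fields is the right one is the pertinence question, and no query will answer that for you.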

Within our companies, what is considered pertinent is going to vary by department. A contact record with just an email address may be considered a valid lead, and if you measure your sales or marketing effort on new leads then you will probably have lots of these. In the business world we may be able to extract some information from the domain, but the marketer tasked with deriving meaningful insight from this data has a lot to do, and would prefer all fields to be populated with accurate data. An operations person might be uninterested in the age of your contact, but would like the address and postal code to be correct. They may also be interested in any references stored to other systems, so they can link the data to the corresponding data elsewhere. An accounts person might be interested in something as basic as whether this is a new or existing customer, and the high level of duplicates in many systems shows this is not as simple a question as it sounds. Of course, you can try to enforce some mandatory fields, but the side effect of this can just be bad data as users enter something to get past the mandatory field that they see as stopping them doing their job. Mostly, bad data is worse than no data, as we may otherwise make assumptions based on that bad data.

Even in well controlled systems we may have problems where we try to categorise data. Too few categories and we lose precision in our insight. Too many and we similarly lose precision as the users will pick either the first one or an overall bucket. Enabling users to add their own categories may lead to chaos, as once you are beyond 30 or 40 categories users will add their own rather than try to find the closest match. I've seen many thousands of job functions in a system. Our categorisation, like our overall data, needs to be pertinent.

The world is dynamic, and our data will decay over time, both in accuracy and pertinence. General accuracy will probably decay faster than pertinence, but the way we reach customers and they find us keeps developing. You will need to review pertinence regularly. As with general accuracy, do not be afraid to throw away the old stuff. Not only will this help you see the wood for the trees, it is a data protection requirement. You may need to learn to tolerate some ambiguity, particularly as regards completeness. This can be difficult for those of us who have grown up with fully populated data, as well as making our query writing and reporting rather more complex, but the reality of the modern world is much more to start with minimal data and then update and enhance it when and where you can. This can be quite difficult, but at least it will be effective if you have focused on the right data before striving to get the data right.

John Davis 10 Jan 18.

That's another fine (data) mess you got me into.

As part of my recent audit work I have mused with colleagues as to how an organisation populated with intelligent, committed, conscientious and well-intentioned individuals can get itself into such a fine mess with its data and/or systems. Generally, things have improved as new methodologies have emerged and been honed. Nevertheless, there is still plenty of scope to go wrong, and I am going to contend that mostly this boils down to old-fashioned management issues rather than anything inherently technical. My particular worry is that managers are not taking responsibility.

I have been struck how IT departments are now viewed as a cost and have been pared down. Where development work is still done in house it may be done as sprints under an agile methodology. This is fine, and a good way to get the development done with involvement of all parties based on quick prototyping. The danger is that issues that come to light after a sprint may be left to fester, particularly if they seem small. Big bugs are easy to spot and should be caught in testing, and if not will still get fixed quickly when found. It is the little bugs or missing functionality that are more insidious. They may just work away, gradually corrupting your data. That incomplete lookup will cause wrong categorisation. The missing field will mean that another user field is doubled up with a comma; or was it a semicolon? Such problems may cause the users to need to run a little manual fix, or more likely to keep a log of some sort on a shared spreadsheet. The resolution could be a small fix but doesn't fit into a sprint anywhere. IT have moved on. The close link between system and developer, possibly intermediated by a business analyst, has gone. Nobody is responsible.

The problems run deeper into our organisation populated with intelligent, committed, conscientious, well-intentioned and thoroughly nice individuals. Being well educated and experienced they are all managers. It will say so in their job title. They may manage external relationships or if they have any direct reports they are managing other managers. It is like an army with officers but no sergeants. Supervisors seem to have been managed away. Nobody is supervising. With them has gone the quality control of data at a record level - the data that the rest of the organisation runs on.

Eventually the organisation will realise its systems/data are not fit for purpose and will investigate. Surprisingly often the perceived fix will be to build or buy a new system. There is an optimistic belief that somehow new systems will cure organisational/management problems, even where the old system was a bespoke internal development. I could go into the reasons why this may seem a good idea to management, but that may send the cynicism way off the scale for one short blog.

But don't worry. These are not issues you need to fret about; it is not your responsibility.

John Davis 7 Aug 17.

Latest Analysis Secrets.

Courtesy of Rudyard Kipling

I keep six honest serving men
They taught me all I know
Their names are WHAT and WHY and WHEN
and HOW and WHERE and WHO.

John Davis 27 Nov 14.

The importance of the old school tie in the internet age.

There was an excellent "Schumpeter" article in the 18th October edition of the Economist on how many of the predictions of the impact of the internet have been wrong. Not just slightly wrong, but completely out of phase with reality. Schumpeter looked at these three predictions:

1. "On the internet nobody knows you're a dog". Actually they do. They probably know your age, gender, tastes, where you shop and go on holiday, who your friends are, your sexual preferences and where you are now. Most of us leave a ready trail, happily post details of our life to public websites, and are content to swap data about ourselves for free in return for internet services.

2. Death of Distance. It seems that sales of online products spread by physical and then cultural proximity. Distance is still very important in the real world and the real shops seem very full of real people this time of year. Whilst some real shops now offer a "click and collect" service, some virtual ones, such as Screwfix, have opened real stores. With the ability to tell where potential customers are, even when mobile, distance is now, if anything, rather more important.

3. Disintermediation. Whilst some industries may have suffered disruption to their intermediaries it is easy to see many others where intermediaries have developed and flourished. So, while many insurance brokers may have been replaced by comparison websites, estate agents and recruitment businesses have not.

It would be possible to write books on the failure of each of these three predictions, but it is the lack of disintermediation that I find particularly interesting. Having talked to various recruitment agencies recently in my search for contract IT work I can say that they seem to perform two functions. One is a simple pattern-matching exercise to match candidate expertise to job specification. They have lots of CVs to process, so this seems to be a strict exercise with no room for thought outside the box. It could be done by a machine. The second stage is to call and validate what the filtered set of candidates have said they can do. I have done some technical interviews. Typically they don't last long, with the candidate soon apologising for putting some acronym on their CV when actually they have only had the most fleeting exposure to it.

It seems then that intermediaries continue to thrive, and new forms of intermediation will grow, for two reasons. Firstly, the virtual world is just too big. Many people would not know where to start searching, and even if they did they may not have the time and knowledge to sift through all the data that would be returned. Secondly, that world is full of companies and people who might not be all that they first seem. Who can you trust? Most people would know that online reviews are open to abuse, and would treat them as only a rough guide. They may not have realised until recently that those price comparison sites will act as a broker, taking commission and quite possibly not showing them the best deal if more commission is available on other deals. Who guards the guards?

So how do people cope? They cope by relying on brands. They cope by relying on personal networks: the old school tie, colleagues and local contacts (we're back to the importance of proximity again). There has been disruption, but not the huge disruption that was forecast. Instead, the internet has increased barriers to entry and reinforced old behaviours. I forecast the importance of brands 15 years ago, but didn't realise the importance of contacts, otherwise I might have gone to more old school and college dinners and joined LinkedIn earlier.

John Davis 13 Nov 14.

Data Issues going around in Circles

Whizzing round the Olympic velodrome last weekend with my family was great fun, if slightly terrifying as you go up the banking. Not surprisingly data was the last thing on my mind! But at the end we received certificates which were all misnamed. This is pertinent for a couple of reasons. Firstly, it shows there is a manual interface in place. Somebody had copied our names from a screen (our online booking) or printout onto a piece of paper, and passed that paper to someone else to create the certificates on another system. That interface was obviously seriously error prone. Secondly, my name is John Davis. It is quite common (I blame my parents), and most clerical workers would know that it could be John or Jon, and Davis might be Davies. My son, Guy Davis, became Gille Davies.

So the great experience was tainted by a certificate that was wrong for every single member of the family. No one had thought through the consequences of the data quality issues or taken a bit more care. We won't be framing it and putting it on our wall, or tweeting a picture of the certificate or sharing on Facebook - losing the Velodrome valuable free marketing. These issues with missing interfaces and poor data quality are the same ones that I came across when I started working in computing and business analysis in the 1980s. Things may have improved for a while, but now that companies have some applications in the cloud we have returned to a situation of many silos, and often some imperfect interfaces between them.

A focus on data quality is an issue that will run on. This is partly an issue with supervision and the reinforcement of the importance of getting data correct before you can rely on it to drive your business (It needs that great MBA stalwart, "Senior Management Commitment"). This will get no easier as the workforce is infiltrated by a younger generation who are accustomed to txting and spreading their focus across a number of apps on a number of devices, with a commensurate loss of attention to detail (am I sounding old now?). Nevertheless we can do better, taking out scribbled manual interfaces by calling APIs, using our systems to do more validation and checking at the point of original data capture, and reconciling data between silos and against other sources. Not only is this more efficient, companies get happy customers as it shows they care about the little things - our names, and who we are!
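Reconciling between silos need not be elaborate. As a rough sketch in Python, assume each system can export a mapping of booking reference to customer name. The references, the sample names and the `reconcile` helper are all hypothetical, made up for illustration, and not how the Velodrome's systems actually work.

```python
def reconcile(source, target):
    """Compare two {reference: name} mappings and list the entries that disagree.

    Returns (reference, source_name, target_name) tuples; target_name is None
    when the reference is missing from the target system altogether.
    """
    def norm(name):
        # Ignore differences in case and spacing, but nothing else.
        return " ".join(name.split()).casefold()

    issues = []
    for ref, name in source.items():
        other = target.get(ref)
        if other is None:
            issues.append((ref, name, None))
        elif norm(name) != norm(other):
            issues.append((ref, name, other))
    return issues


# Hypothetical exports from a booking system and a certificate system.
bookings = {"B001": "John Davis", "B002": "Guy Davis", "B003": "Ann Example"}
certificates = {"B001": "john  davis", "B002": "Gille Davies"}

issues = reconcile(bookings, certificates)
for ref, booked, printed in issues:
    print(ref, booked, "->", printed)
```

The comparison is deliberately strict: Davis and Davies are different names, so the check flags the mismatch for a person to resolve rather than guessing which one is right.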

If you need help with your processes, data, or interfaces, then give me a call. I'm the red and black blur.