Data Quality, B2B, Supply Chain, Enterprise Architecture and Donkeys!

I know the title sets up a long, potentially arduous and probably cumbersome discussion, but I’m not planning on tackling everything here…just some food for thought and discussion.

I don’t know about you, but I’m really excited about the idea of 3D printing. While I can’t say “Tea, Earl Grey, hot” and expect such a cup to be ready for me quickly, I like the idea of being able to print something useful when needed. If nothing else it seems like a great toy – though I fully understand it is far beyond that stage already. Reading this Wired article about Solidoodle makes me understand how far it has come – $500 personal 3D printers are here! My first two thoughts were 1) what is this going to do to global supply chains for basic, reusable products and 2) what are the Intellectual Property (IP), patent and trademark issues when someone takes an image off the web and prints it?

Let’s start with the second question. We have seen vigorous defenses of copyrights for entertainment titles – music and movies, for instance – against groups that would otherwise circumvent those copyrights. And these battles have been fought in a variety of different settings in which titles might be replicated, duplicated, shared and backed up. What happens with 3D printing will be very interesting to see. In the Solidoodle (I just love that name!) article they discuss a 3D Yoda model that they printed. What does Lucasfilm think of this, I wonder? A more mundane question: is the bottle opener they print in 15 minutes using $.15 worth of material something of their own design, or did they copy it from something they purchased? I see lawyers having a field day in the 3D printing realm – if they could just know each individual who actually printed something that came from someone else’s IP.

Given our litigious society, I think the following scenario might even play out. Since who downloaded what can already be tracked today, perhaps 3D printing applications will need to do this and report back. Or, if just downloading isn’t considered an issue, perhaps the $500 printer may need an additional piece of software that shares information about anything it prints with an IP database someplace. The image can be compared to images of known copyrighted, patented or trademarked work, and the users of such technology can then be billed for the use of the IP. I feel slimy just thinking about this, but it strikes me that there will be a huge battle in this arena in the not too distant future. For a good, less sinister read on this whole area, try 3D Printing and Intellectual Property by Max Maheu.

The other thing that strikes me is that, extrapolated out many years and surviving the IP battles, I could see supply chains for mundane replicable goods being replaced by supply chains for 3D printer supplies. You need to replace your bottle opener? Just print it. If you run out of filament printing stock, just order more. If you don’t have a printer, go to the local library and print it there. What this means is that the extended supply chains for replicable goods may shrink significantly. Your home office will become a mini-manufacturing plant for mundane items, and the costs – if the $.15 bottle opener is any indication – will be relatively low.

If the extended supply chain for these products from China and various developing nations significantly contracts, those production resources will be freed up to focus on other more complex products. In many ways this seems to be a good thing as it will bring efficiency to manufacturing replicable items (and perhaps more complex things down the road), reduce environmental costs associated with transportation of all those products, and help focus factories on what could be more high margin products.

Still, on the environmental front, you have to ship the filament, so there are transportation costs there too. Thus, it might be appropriate to ask where the filament is created and shipped from. We might also ask about the non-supply-chain environmental impact of these 3D printers and their filaments. What is the makeup of the filament used in this production process, and what are the up-front and long-term environmental and human consequences of these types of materials and the products made from them? And, as Joris Peels ponders, will the availability of cheap, quickly printed goods lead to a preponderance of the “Impulse 3D print”, where products become throw-away items and thus create considerable waste?

I am excited about the technology but cautiously optimistic about the environmental aspects. Regarding supply chains, I think local production for appropriate products can be a very useful thing: it can reduce dependence on foreign manufacturing and brittle supply chains, and perhaps spur new ideas – and products – like we’ve never seen before as consumers become their own designers and entrepreneurs, trying to print and hone their own ideas. And this latter idea – the use of 3D printing for creativity, not just replication – is the thing that I think holds the most promise in this new technology. I am hopeful.

For the better part of the last two decades I have advised my clients and employers to push hard on their suppliers to adopt either EDI (Electronic Data Interchange) or, in certain more recent cases, XML, instead of trading via phone, fax, email or snail mail. The costs – both in time and in bad data – are too great to continue executing business transactions in a completely manual way. I remember years ago, when the web first made it viable, I also started recommending that when suppliers would not trade electronically, companies should push the use of web portals so my clients could off-load the data entry – and responsibility for bad data – onto their suppliers. Unfortunately many companies (not my clients, mind you) abandoned electronic trading altogether when their IT shops found supporting portals was easier than supporting electronic trading…but that is a completely different story.

The thing here is the idea of having your supplier handle the entry and validation of data. They have a vested interest in making sure the invoice information they provide you is complete and accurate, or they might not get paid. In the same way, I see the social networks, and other companies that are aggregating information on all their users, leveraging the same model…but it is even better for them.

Every one of us must fill out web forms when signing up for almost any service online. Information we share includes name, address, email address, age and many other personal tidbits. That information becomes the basis of the “consumer as a product” business these companies have adopted, where our virtual selves are sold to the highest bidder – either directly for cash or indirectly through targeted advertising and other routes. In a March 11th column for the Huffington Post, former Federal Trade Commissioner Pamela Jones Harbour wrote “To Google, users of its products are not consumers, they are the product.”

Imagine having a “self-building” product. It is self-aware and concerned about itself, so it naturally wants to make sure the information about it is accurate, and on a daily basis it makes itself better. From a data quality standpoint, there are few better ways to make sure your data is accurate: make sure the products – er, those that enter the data – have a vested interest in it. But the data quality aspect of social data is even better! For the most part, the data collected from online searches, click-throughs, web browsing and everything else you do online – whether collected overtly or covertly – will pretty much be accurate. It is nearly impossible for it not to be.

Social data, then, is a data manager’s dream. The real challenge is what to do with all the data. With few data quality issues, data managers responsible for social data are left to work with their business counterparts to figure out the best ways to exploit the data they collect. Which leads me to an old quote from Google’s ex-CEO Eric Schmidt when speaking with The Atlantic: “Google policy is to get right up to the creepy line and not cross it.” With Chloe Albanesius at PCMag.com reporting that at least one recent company defector thinks Google+ has ruined Google, perhaps they’ve stuck their nose just across that line after all. Quality Big Data can be a scary thing.

My friend Cliff recently approached me with a problem. His organization has tasked him and his team with analyzing, amongst other things, the depth of their bad data problems in advance of replacing their financial systems. Initial indications are that their data is not just bad, it is very, very bad. His question? When is it ok to leave the data behind and not port it over to the new system? When should he just “let it go”?

In looking at his problem, it is obvious that many problems stem from decisions made long before the current financial system was implemented. In fact, at first glance, it looks like decisions made as long as 20 years ago may be impacting the current system and threaten to render the new system useless from the get-go. If you’ve been around long enough you know that storage used to be very costly so fields were sized to just the right length for then current use. In Cliff’s case, certain practices were adopted that saved space at the time but led to – with the help of additional bad practices – worse problems later on.

When we sat down to look at the issues, some didn’t look quite as bad as they initially appeared. In fact, certain scenarios can be fixed in a properly managed migration. For example, for some reason bank routing numbers are stored in an integer field. This means that leading zeros are dropped. To work around this, scripts were written to take the leading zeros and attach them to the end of the routing number before storing it in the field. Though I haven’t seen it, I’ve got to assume that any downstream use of that field includes the reverse procedure in order to create a legitimate bank routing number. Of course, when a real bank routing number and a re-combined number end up being the same, there are problems. He hasn’t yet identified whether this is a problem. If not, then migrating this data should be relatively easy.
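To make the hazard concrete, here is a minimal sketch of the workaround as I understand it from the description above (my reconstruction, not Cliff’s actual script), showing how rotating leading zeros to the end can collide with a genuine routing number:

```python
def encode_for_int_field(routing: str) -> int:
    """Rotate any leading zeros to the end so a 9-digit routing number
    'survives' storage in an integer field (a lossy workaround)."""
    stripped = routing.lstrip("0")
    zeros = "0" * (len(routing) - len(stripped))
    return int(stripped + zeros)

# A routing number with a leading zero...
print(encode_for_int_field("011000015"))  # 110000150
# ...stores identically to a different routing number with none:
print(encode_for_int_field("110000150"))  # 110000150
```

The decode step can’t tell whether trailing zeros were original or rotated, which is exactly the collision Cliff needs to test for before declaring this field safe to migrate.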

Another example is the long-ago decision to limit shipping reference numbers to 10 digits. There are two challenges here. The first is that many shipping numbers they generate or receive today are 13 digits. The second is that they generate many small package shipments in sequence, so the last 3 digits often really, really matter. When reference numbers grew beyond the 10 digits the original programmers thought would be enough, a bright soul decided to take a rarely used 3-digit reference field and use it to store the remaining digits. Probably not a bad solution when they had few significant sequential shipments. However, since the current system has no way to natively report on the two fields in combination – and for some reason no one was able to write a report to do so – every shipment must be manually tracked or referenced with special look-ups each time they need to check on one. Once again, this problem should probably be fixable by combining the fields when migrating the data to their new system, although certain dependencies may make it a bit more difficult.
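The migration-time fix can be sketched simply – assuming (and this is my assumption, since the field semantics aren’t spelled out) that the overflow field holds the final three digits and may have lost its own leading zeros:

```python
def combine_reference(main_field: str, overflow_field: str) -> str:
    """Recombine a 13-digit shipping reference split across a 10-digit
    main field and a repurposed 3-digit overflow field."""
    if overflow_field and overflow_field.strip():
        # zfill guards against an integer-typed overflow field dropping zeros
        return main_field + overflow_field.strip().zfill(3)
    return main_field  # an older 10-digit reference; nothing to combine

print(combine_reference("1234567890", "45"))  # 1234567890045
print(combine_reference("9876543210", ""))    # 9876543210
```

The “certain dependencies” caveat is real: any downstream report or interface that reads the raw 3-digit field would need the same treatment during the migration.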

Unfortunately, there are so many manual aspects to the current system that 10 years of fat-fingered data entry have led to a kind of data neuropathy – the data is so damaged the organization has trouble sensing it. This numbness becomes organizationally painful in day-to-day functioning.

Early on in my data quality life, another friend told me “Missing data is easy. Bad data is hard.” He meant that if the data was missing, you knew what you had to do to fix it. You had to find the data or accept it was not going to be there. But bad data? Heck, anything you look at could be bad – and how would you know? That’s difficult.

So, this is Cliff’s challenge. The data isn’t missing, it is bad. But how bad? The two scenarios above were the easy fixes – or could be. The rest of the system is just as messed up or worse. Since this is a financial system, it is hard to imagine getting rid of anything. Yet bringing anything forward could corrupt the entire new system. And trying to clean all the data looks to be an impossible task. Another friend was familiar with an organization that recently faced a similar situation. For them, the answer was to keep the old system running as long as needed for reference purposes – it will be around for years to come. They stood up the new system and migrated only a limited, select, known-good set of data to help jump-start it.

This approach sounds reasonable on the surface, but there may be years of manual cross-referencing between the systems – and the users of the old system will need to be suspicious of everything they see in that system. Still, they have a new, pristine system that, with the right data firewalls, may stay relatively clean for years to come. How did they know when enough was enough? How did they know when to let their bad data go?

This past week a Georgia high school student sent a text message that was supposed to read “gunna be at west hall today”. Unfortunately, the auto-correct feature on his smartphone changed the first word to “gunman”. To make matters worse, he evidently fat-fingered the address he sent the text to, so the person receiving it didn’t know who he was and took the appropriate cautionary step of contacting the police. The result? A high school was shut down for a few hours until things were straightened out.

Or we could discuss the fat-fingering of the address. This actually is of most interest to me. I’ve researched keystroke error rates in the computer space before and concluded the average typist fat-fingers a keystroke about 5% of the time, while professionals might do so 1%-2% of the time. The thing about these percentages is they can be very deceiving: a 5% error rate can explode in the wrong situations. For instance, imagine a 10-character field that you fill in 10 times at a 5% keystroke error rate – that’s 100 keystrokes and 5 errors. The best effective error rate you could achieve is 10%, which happens when all 5 errors land in a single field (5 of its 10 characters wrong) and the other 9 fields are correct. 1 in 10 fields being incorrect is a 10% error rate.

At worst, your effective error rate would be 50%. This occurs if you make one keystroke error in each of 5 fields, since you are typing 100 characters at a 5% error rate. If 5 of 10 fields each have 1 error, your overall error rate across the fields is 50%. Corporate data can go really bad really quickly in this scenario. Of course, most systems limit these types of issues through check boxes, drop-down lists and other validations. But sometimes data entry can’t be avoided – entering customer contact information is one such situation. Get 5 in 10 email addresses wrong and there goes your email marketing campaign. Even if you are careful, error rates can be significant. I read a study some time ago indicating that careful typists often catch their mistakes, so professionals usually average only 1 in 300 unfound errors. That’s much better, but it can still translate into problems for your enterprise. Using the same 10-character field, 1 error in 300 keystrokes equates to 1 in 30 fields being erroneous. That’s over 3% – still pretty bad, but I know some organizations that would love to have data quality problems with only 3% of their data.
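The same arithmetic can be checked under a slightly different model – treating keystroke errors as independent events rather than a fixed budget of 5 errors (my simplification, not the article’s) – and the numbers land in the same neighborhood:

```python
def field_error_rate(keystroke_error_rate: float, field_length: int) -> float:
    """Probability that a field contains at least one keystroke error,
    assuming errors strike each keystroke independently."""
    return 1 - (1 - keystroke_error_rate) ** field_length

# 5% per keystroke, 10-character field: about 40% of fields contain at
# least one error - squarely between the 10% best case and 50% worst case.
print(round(field_error_rate(0.05, 10), 3))   # 0.401

# A careful professional at 1 unfound error per 300 keystrokes: just over
# 3% of 10-character fields still end up wrong, matching the 1-in-30 figure.
print(round(field_error_rate(1 / 300, 10), 3))  # 0.033
```

Either way you model it, per-keystroke error rates that sound small compound into field-level error rates that are anything but.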

So this, now, brings me back to where we started. I’ve got to believe that without autocorrect, keystroke error rates on smartphones are significantly higher than they otherwise would be. Typing on those tiny keyboards is always a pain for me. I’ve found on my phone that the right place to touch in order to get the letter I want is just to the left of the letter – not right on it. I’m constantly back-spacing and retyping. Perhaps this is why there are so many bizarre autocorrect examples in the world (no, I’m not saying I’m responsible for all of them – just that other people must have similar challenges to mine, don’t they?). I miss the Blackberry keypad!

I think organizations should be very wary of leveraging smart phones for serious business apps that require data entry, unless they have extremely strong user data validation methodologies. Because I expect few developers to be so vigilant in their app development, I believe the future could bring some very unfortunate results from using business apps on smart phones. While DYAC entries sure can be funny (warning, they can also be quite vulgar) the same errors in business transactions could be catastrophic for your enterprise. Imagine things going terribly wrong and facing a lawsuit and having to use the “damnyouautocorrect” defense. You might end up in a much worse situation than a couple hours of high school lockdown.

In his blog yesterday “Destination Unknown”, Garry Ure continues an ongoing discussion regarding the nature of Data Quality. Is it a journey or a destination? Garry eventually suggests that there is no “final destination” but rather a series of journeys that “become smoother, quicker and more pleasant for those travelling.”

I have followed the conversation and am convinced that DQ should be a series of destinations where short journeys occur on the way to those destinations. The reason is simple. If we make it about one big destination or one big journey, we are not aligning our efforts with business goals. We are pursuing DQ for the sake of DQ and it will become the “job for life” mentioned throughout this ongoing conversation. Yet, that life might be a short one when lack of realized business value kills funding for DQ initiatives.

If we are to align our DQ initiatives with business goals, as I suggest is an imperative in my last post “Data Quality – How Much is Enough?”, we must identify specific projects that have tangible business benefits (directly to the bottom line – at least to begin with) that are quickly realized. This means we are looking at less of a smooth journey and more of a sprint to a destination – to tackle a specific problem and show results in a short amount of time. Most likely we’ll have a series of these sprints to destinations with little time to enjoy the journey.

Once we establish that we can produce value to the organization, we may be given more latitude to address more complex or long-term problems. But this will only happen if we once again show the value to the organization in terms that the business people understand.

While comprehensive data quality initiatives are things we as practitioners want to see – in fact we build our world view around them – most enterprises (not all, mind you) are less interested in big initiatives and more interested in finite, specific, short projects that show results. If we can get a series of these lined up, we can think of them more in terms of an overall comprehensive plan if we like – even a journey. But most functional business staff will think of them in terms of the specific projects that affect them. Do you think this is the enterprise DQ equivalent of “think global, act local”?

I read Henrik Liliendahl’s blog post today on “Turning a Blind Eye to Data Quality”. I believe Henrik and those that commented on the post have some very, very good points. For data quality professionals, the question might be “How much is enough?” when it comes to data quality. And the answer to that question really depends on the nature of your business and how the leaders in your organization view the value that data quality can bring them. The question we will most often be asked is “how does DQ help my bottom line?” If we as data quality professionals can’t tie DQ initiatives directly to bottom line impact, it will be hard to get serious attention. And believe me, we want serious attention or our own value to organizations will be questioned.

This means we probably need to change the conversation away from DQ as a means to its own end and towards a conversation about how selected projects can have a positive impact on the bottom line. That conversation may be happening within many organizations at levels above our pay grades. Our first goal should be to ask our direct managers if the conversation is going on and how can we help in real, meaningful, tactical, financially relevant ways. If that conversation is not happening, we need to ask what can we do to get the conversation going in the same real, meaningful, tactical, financially relevant ways. We must abandon the mythical goal of a single version of the truth for all attributes. Our goal needs to be about making the business more successful in incremental tangible ways and thus making ourselves more successful in incremental tangible ways. After all, as Henrik points out, our businesses are being successful today despite bad data.

In comments to Henrik’s post, Ira Warren Whiteside mentioned that “As with everything else in order to convince an executive to ‘fix’ something it has to be really easy to do, not involve a lot of collaboration and be cheap”. I would argue with this perspective. I think that the decision to go forward with the project should be easy, not necessarily the project itself. This means clear, unquestionable value to the business (most likely directly to the bottom line) is needed. And I think such a project should show impact quite quickly. You can’t have a 12, 24 or 48 month project without value being realized within, say, the first six months or so. Thus, it is best if the initiative can be absorbed in bite-sized chunks so initial benefits can be quickly realized to help reinforce a culture of DQ. Don’t wait too long to deliver or you will lose your audience and your chances for any future projects will diminish – greatly.

In the retail supply chain, suppliers end up paying, on average, 2% of gross sales in penalties to their retail customers. This is 2% taken directly from the bottom line, usually tied back to inaccuracies in delivering orders – problems with being “on time”, “right quantity”, “right product”, “right location”, “broken products”. Each supplier has people (often a whole team) focused on resolving these issues. There are two key ways they tackle these problems. The first is to identify the dollar amount below which it is too costly to address the problem – in short, it costs them more to fix it than to just pay the penalty. The second is deep root cause analysis of those issues that are too costly not to fix – either in aggregate (i.e., the problem occurs frequently) or as stand-alone problems.

I think DQ practitioners could learn a lot from how these retail supply chain problems are addressed. The first step is to identify what is ostensibly “noise” – the data that costs too much to fix given its low impact on the business. The second is to take the bad data that is too costly to ignore and identify initiatives to resolve those DQ problems – and tie them back to tangible business benefits. The challenge is easy with “perfect order” delivery problems in the retail supply chain: companies know that the penalties directly impact the bottom line. Fix a problem and the resulting money goes straight back to the bottom line.
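A minimal sketch of that triage rule, with entirely made-up figures (real ones would come from your compliance team), might look like this:

```python
def worth_fixing(annual_penalty_cost: float, cost_to_fix: float) -> bool:
    """Fix the root cause only when the recurring penalty outweighs the fix."""
    return annual_penalty_cost > cost_to_fix

# Illustrative issues only: (name, annual penalty cost, cost to fix)
issues = [
    ("wrong ship-to location", 120_000, 40_000),
    ("occasional mislabeled carton", 2_000, 15_000),
]

to_fix = [name for name, penalty, cost in issues
          if worth_fixing(penalty, cost)]
print(to_fix)  # ['wrong ship-to location']
```

Everything below the threshold is “noise” you consciously pay for; everything above it becomes a candidate DQ project with a built-in business case.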

I’m not so sure it’s that easy for DQ professionals, at least not with all our DQ problems. But I’ve been around long enough to know that there is some low-hanging fruit in just about every business. As a hint, if your company or customers are suppliers to retailers, you might start your quest for a promising DQ project with your compliance team, since supply chain problems can often be tied back to DQ problems. Imagine that!

Does size matter? Should it to you?

When it comes to treats, my donkeys don’t seem to care how big the cookie is – just that they are getting something. Thus, I keep my donkeys lean and my costs low by breaking their special horse treats in two before letting the boys scarf them down. Likewise, I break our dog treats in 2 or 3 before dispersing to the pack. Polly in particular has benefited from a lean diet supported by pieces of biscuit as opposed to the whole thing. Of course, I could just purchase smaller treats, and sometimes I do. But what about your company? When it comes to your EDI and B2B programs, do you go for the small provider or the big one? For you, does vendor size matter?

In recent articles in EC Connexion and the VCF Report, I outlined my thoughts on the pending GXS/Inovis merger. In the former I focused more on the B2B practitioner and what it might mean for them. In the latter I leveraged some of the first and provided a higher-level view focused on the business and managerial reader. In both I pointed out that what you get out of the merger depends on how you and your company react. There is enormous potential in the merger, but companies will only benefit if they hold the new company accountable for bringing the right mix of solutions to the table – a mix the merged company will have access to, but might not fully exploit.

However, this merged company will be the best positioned of all extant B2B players to provide full end-to-end services on a global basis. In fact, others who claim to be global will – when you look under the hood – have only one or two people in emerging markets like China, India and Brazil, whereas the merged company will have significant resources in those locations. In this case, size does matter. If you have global operations – whether in your own enterprise or among the companies in your extended supply chain (supplier, supplier’s supplier, customer, customer’s customer) – you must consider the ability of your B2B partner to help you manage that ever-changing supply chain as you move production from country to country, shift from far offshore to a mix of far- and near-shore manufacturing, and change your mix of carriers as production changes.

Successfully implementing, maintaining and managing a global business requires that your partners be there – and that they have been there, doing what you need done, for a while. You need them to be smarter than you when it comes to new countries and new regions. There is no use partnering if the partner can’t bring something to the table. There isn’t another B2B player with the global reach within their own organization that the GXS/Inovis merger will bring. Much of that comes from GXS, but with the addition of Inovis’ unique solution offerings, the global capabilities of the merged company will be significant.

No matter the size of your business, if you do business globally, size should matter to you. There are other fine players in the space – Tie Commerce is one smaller player with strength in Europe, the US and Brazil – but if you truly need round-the-world visibility, accountability, presence and capability, your partner’s size will matter. Don’t underestimate the knowledge that doing business in a country can bring, or what being able to do business in nearly 20 languages can get you. Legal, business and technical knowledge are all there for the leveraging. The question is, if size does matter and you choose the merged company, will you demand that they leverage their size (both global presence and overall solution set) to help you do business better with enhanced global visibility, data quality and synchronization, compliance management and full, uninterrupted end-to-end automation? Or will you just ask them to do what you’ve always done – automate purchase orders, invoices and ship notices – and call it a day? If that’s all you do, your stakeholders won’t be happy and, when it comes to your B2B partner, size really won’t matter after all.