Wednesday, 30 July 2014

Business Glossaries – the pointy end of metadata management

A recent thread on LinkedIn raised the issue of implementing a business glossary
(in particular, in relation to using IBM’s Business Glossary tool).

I generally try to avoid commenting on any particular vendor’s products in this
blog – as regular readers will know, I’m much more concerned with all the joys
and frustrations of the human aspects of Information Management! However, the
LinkedIn discussion raised some interesting questions on the topic of “business
glossaries” and metadata management more generally, and I think these are worth
exploring and summarising.

The key purpose of a “business glossary” is that of
human communication and collaboration – to exchange business-level
understanding and interpretation of informational terms. A business glossary
(sometimes referred to as a “business dictionary”, “business metadata”, “business vocabulary” or “business
lexicon”) collects terminology that expresses business concepts in the
language of the end-user, with the aim of collating one consistent set of terms
that are commonly understood by the user community. (That’s hard enough!)

Getting people to agree on words & definitions is difficult.
Unfortunately, life isn't that clean-cut. Even just collecting as many
words/terms/phrases/acronyms as you can grab and examining the different
uses/definitions/conflicts can be time-consuming.

When I was at UNSW we ended up collecting over 1500
terms, which one way or another resolved to about 400 groupable items (e.g. there
were six different ways of establishing whether or not we counted someone as a
"student").
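
As a rough illustration of that "collect, then group" step, here is a minimal Python sketch. Everything in it is invented for the example (the terms, the source systems, the definitions and the `CONCEPT_OF` mapping are hypothetical, not the UNSW glossary); it only shows how many collected terms can resolve to fewer groupable items, and how a group with competing definitions can be flagged for discussion.

```python
from collections import defaultdict

# Hypothetical collected entries: (term, source system, definition as stated).
COLLECTED = [
    ("Student", "Registration", "Anyone enrolled in at least one course this term"),
    ("Enrolled student", "Finance", "Anyone with a fee liability this term"),
    ("Staff member", "HR", "Anyone holding a current appointment"),
    ("Employee", "Payroll", "Anyone paid in the last pay run"),
]

# Hypothetical mapping from each collected term to the concept it belongs to.
# In practice this mapping is the outcome of discussion, not something computed.
CONCEPT_OF = {
    "Student": "student",
    "Enrolled student": "student",
    "Staff member": "staff",
    "Employee": "staff",
}

def group_by_concept(entries, concept_of):
    """Group collected glossary entries under the concept each term maps to."""
    groups = defaultdict(list)
    for term, source, definition in entries:
        groups[concept_of[term]].append((term, source, definition))
    return dict(groups)

groups = group_by_concept(COLLECTED, CONCEPT_OF)

# A concept is "contested" when its entries carry more than one distinct definition.
contested = sorted(
    concept
    for concept, items in groups.items()
    if len({definition for _, _, definition in items}) > 1
)

print(len(COLLECTED), "terms resolve to", len(groups), "groupable items")
print("needs resolving:", contested)
```

The code does the easy part; agreeing the mapping and choosing between the competing definitions is the human work.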

Resolving all of those contentions, discrepancies and
ambiguities in one go was way too hard for most people to get their heads
around (let alone show any interest!). So we focussed on one subject area - in
this case, Staff/HR data definitions - which was a high priority as we were in
the process of re-implementing our HR admin system.

One question that was asked very often was “How
many staff do we have?” This invariably led to much wailing and gnashing of
teeth as people scurried around, frantically trying to answer the question,
only to come up with multiple different answers, none of which corresponded
with any other answer.

Of course, the challenge was that there was no agreed
understanding of what anyone meant by “How many”, “staff” and “have”!
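
To make that ambiguity concrete, here is a minimal Python sketch. The records, field names and three counting rules are all assumptions invented for illustration (they are not UNSW's actual data or definitions); the point is only that each reasonable reading of the question produces a different number from the same data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Appointment:
    person_id: str  # hypothetical identifier for the individual
    fte: float      # fraction of a full-time load for this appointment
    paid: bool      # False for unpaid (e.g. honorary) appointments

# Invented sample data: one row per appointment, not per person.
APPOINTMENTS = [
    Appointment("p1", 1.0, True),
    Appointment("p2", 0.5, True),
    Appointment("p2", 0.25, True),  # p2 holds two part-time appointments
    Appointment("p3", 0.0, False),  # unpaid appointment
]

def count_appointments(rows):
    """'How many' = number of appointment rows."""
    return len(rows)

def count_people(rows):
    """'How many' = number of distinct individuals."""
    return len({row.person_id for row in rows})

def count_paid_fte(rows):
    """'How many' = full-time-equivalent load across paid appointments only."""
    return sum(row.fte for row in rows if row.paid)

print("appointments:", count_appointments(APPOINTMENTS))
print("people:", count_people(APPOINTMENTS))
print("paid FTE:", count_paid_fte(APPOINTMENTS))
```

None of the three functions is wrong; they answer three different questions that happen to share the same wording.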

At UNSW, we had a working party of 5 full-time
team members, supported by a part-time stakeholder group of approximately 30
nominated business representatives. (For some comments about the
"consultation culture" at the university, see my interview for DataQualityPro.)

We settled upon approximately 80 agreeable
terms, which also included some significant re-thinking of business concepts in
some cases e.g. we had to split the generic idea of someone having a
"contract" into the concepts of "employment status"
(permanent vs temporary), "employment type" (fully employed vs fixed
term vs contract vs casual), "payment arrangement" (paid vs emeritus
vs conjoint/volunteer) etc.
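
One way to picture that split is as three independent attributes rather than one overloaded "contract" field. The Python sketch below is an assumption-laden illustration: the value names come from the lists above, but the class names, the `StaffClassification` structure and the sample record are hypothetical and are not how the UNSW system was actually built.

```python
from dataclasses import dataclass
from enum import Enum

class EmploymentStatus(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"

class EmploymentType(Enum):
    FULLY_EMPLOYED = "fully employed"
    FIXED_TERM = "fixed term"
    CONTRACT = "contract"
    CASUAL = "casual"

class PaymentArrangement(Enum):
    PAID = "paid"
    EMERITUS = "emeritus"
    CONJOINT_VOLUNTEER = "conjoint/volunteer"

@dataclass(frozen=True)
class StaffClassification:
    """Three separate concepts that a single 'contract' field used to blur together."""
    status: EmploymentStatus
    employment_type: EmploymentType
    payment: PaymentArrangement

# Hypothetical example: a paid casual on a temporary footing.
example = StaffClassification(
    status=EmploymentStatus.TEMPORARY,
    employment_type=EmploymentType.CASUAL,
    payment=PaymentArrangement.PAID,
)
print(example.status.value, example.employment_type.value, example.payment.value)
```

Keeping the three value sets apart means that "contract" can no longer be used to mean three things at once, which is the kind of re-thinking the glossary work forced.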

Just for the "Staff/HR" subject area,
it took over six months to get the definition of terms resolved. We then had to
start on the process of cleaning the actual data, ready for migration to the
new system... Whatever you're doing, patience is a virtue!

It’s also worth noting that the issues of identifying,
validating and communicating this common business language are very different
from (though related to) the more detailed questions of data
modelling, integration, traceability, integrity and auditability
enforcement which might be considered the realm of “technical metadata” (and I’m
using the word “technical” very advisedly here to mean any aspect of metadata
management that isn’t immediately end-user facing!).

By the way, I’m not suggesting that “business metadata” and “technical metadata”
are separate – indeed, it’s vital that they integrate and correlate top-down and
bottom-up. Together, these will form the
core body of knowledge that defines the existence of the organization. However,
in order to make things manageable, it is useful to think of them as different
views of the same thing, dependent upon role and purpose.

In my experience, if you want to be successful in implementing an Information
Management environment, it is absolutely vital to address the human, cultural
and societal factors that make for a successful outcome. Build organisational
capability and resilience for Information Management as a set of foundational
disciplines. Think about the accountabilities, responsibilities and process
controls that are required.

Hint - enter into a project naively, and you will fail.
Don't buy the tools unless you are prepared to deal with the human factors.

6 comments:

Great post, Alan on a topic around which I have built my 'new' information management architecture. I say 'new' because I went through several data management exercises exactly like the one you describe at a university college in Calgary. I hope my comments here will build on your post.

In the course of building a data dictionary, we discovered four things that helped enormously:

1. Decouple the common definition of a term (like "Student") from the applications that used it.
2. Force people to be very explicit about the qualifiers they might use with the term. For "Student" these would include many status values like: foreign, part-time, credit free, etc.
3. Get consensus from the group on exactly what we were counting or referring to. In the case of "Student" we were counting individual people.
4. Ask the group where the set we were interested in might overlap with other sets. For example, the set of 'People in the role of "Student"' might overlap with, say, the set of 'People in the role of "Staff"'.

Someone in another discussion group said they use a dictionary to establish the decoupled definition and I often use the same technique. It's amazing how many fights stop at that border. When you define a term, in this case a Role, as universally as possible, modifiers will not change that definition.

Likewise, discovering what people assume are the acceptable modifiers is always interesting and almost always rewarding. "Student" at Mount Royal University and "Student" at UC Berkeley are likely two different things; just as "Student" in the Finance system is different than "Student" in the Registration System (they shouldn't be, but there you are). I call this the dialects challenge. Once you know that one is speaking money while the other is speaking bums in seats, it's much easier to have that conversation.

Speaking of bums, if you assume (that word again) that everyone who is a student has a bum, then counting bums (as opposed to counting People) is acceptable. For your readers now imagining 'bumless' people I apologize, but stranger assumptions have been made when it comes to statistics. For example, counting rows instead of individuals. Surface the assumptions and again things become clearer.

Finally, the overlap question points to the answer to #3. If we are counting Persons in different Roles, and it's ok for a person to occupy more than one then you will have overlap.
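
The overlap point can be sketched with plain sets. The names below are invented purely for illustration; the only claim is that adding up role memberships counts anyone holding two roles twice, while counting distinct people does not.

```python
# Hypothetical people in each role; "chloe" is both a student and a staff member.
students = {"ana", "ben", "chloe"}
staff = {"chloe", "dev"}

both_roles = students & staff     # people occupying more than one role
everyone = students | staff       # distinct people across both roles

role_memberships = len(students) + len(staff)  # counts chloe twice
distinct_people = len(everyone)                # counts chloe once

print("in both roles:", sorted(both_roles))
print("role memberships:", role_memberships)
print("distinct people:", distinct_people)
```

Which of the two totals is "right" depends entirely on whether the group agreed to count Persons or Roles.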

Just a few thoughts on a Saturday morning. I'm going out to cut the lawn now. Until next time!

It's a well-written post, one with which I agree. However, please don't overlook the value to be gained by trying to arrange an entity's terminology in a structured manner, i.e. - a glossary, one which may be published to a wide audience. Creating structured definitions of terms forces stakeholders to explain their rationale more carefully and in the context of other terms rather than in isolation. The latter is when mistakes are made.

During discussions about terminology, we should not lose sight of the business (or technical) rationale that drives each term's existence. For example, there are many geographical attributes at any company, so which one is most appropriate for responding to your particular business question? In the HR world, there are many attributes which seem to overlap, e.g. - worker life cycle status (employment status), worker contract type (employment type), FT-PT classification, FTE, management level, exemption status, pay grade. I have repeatedly seen business people mix them up by awkwardly combining their value sets into one picklist. A glossary helps show "lay people" why there should be two attributes, not one.

This is an excellent post! I'm working on a series of "best practices" posts for a user group I'm involved with, and if it's well received I'll broaden the audience. When it comes to glossaries, some of the points you make are part of what I'll be saying. I think you imply the need for collaboration and governance, which are critical. I think your point about "naivety" is also excellent -- I've seen something of a tendency to believe that glossaries can be centrally mandated on a "one and done" basis ... which certainly ain't so! Both John and Jim make some great points about the relationship between terminology, organization, and technology. One of the efforts I'm engaged in is an attempt to articulate a model that embodies some of that thinking. We need to move these practices in more "standard" directions!

Thanks Jim & Ian. We get into all sorts of other areas quite quickly - Data Modelling, Requirements Gathering, Governance & decision rights etc. The skill of the information practitioner is to help the business group (and IT) navigate these - ideally, without getting into too much detail of exactly what/how we're doing it!

I've also got some useful techniques (available in my coaching/training packs but not part of my blog, as yet) which explore the different layers within the overall information model (business/logical/physical) as well as illustrating the metadata management processes & "grey area" relationships between glossaries, taxonomies, hierarchies, business classification schemes, data models, reference data sets etc. All vital stuff, but not necessarily the type of thing you'd expose to a user group! (as I've previously found to my cost...)

Elsewhere on the blog are some of my thoughts on various related topics - and see my "Tube Map" page for some ideas of how each Information Management discipline relates.

About Me

Alan D. Duncan is Research Director for Business Analytics at Gartner Inc and an evangelist for information and analytics as enablers of better business outcomes.
Formerly a member of the advisory board to QFire Software and Director of Data Governance at UNSW Australia (The University of New South Wales), he was named by Information-Management.com in their 2012 list of “Top 12 Data Governance gurus you should be following on Twitter”.
As husband to his ever-forgiving wife (Kylie) and father to two increasingly bemused kids (Ollie and Isla), Alan is reminded every day to aspire to adequacy. To date, he hasn’t achieved it.