8 Comments

I've attached a document that hopefully will help at our next meeting.

The first part of it will hopefully help with what we were discussing at the last meeting, to describe the relationships between category - category item - category set; code - code item - code list; level - classification item - classification scheme.

The second part contains two suggestions for consideration - mostly to start discussions at the meeting!

I missed the discussion about this issue in the last meeting as I had to leave, but I kind of agree with the document Helen posted, and I wanted to add an example of how I understand it, following how these objects are defined in GSIM 1.0.

From the original post by Klas, the question has arisen of whether we need all of these objects or whether we could do without the “Item” ones. To me it looks as if Category Items and Code Items are naming conventions or identifiers, while Category and Code contain the meaning. Could we have it all in the same object? Here’s my example:

Category vs Category Item:

Let’s consider the Concept System: “Geographical entities in Spain”. Then take the following two Category Sets:

Both (“Male” and “Albacete”) use the same Code Item in one of the instances, but the actual Codes mean different things, as they take their meaning from their respective Categories.

Note: This is not exactly how it works in practice (“Male” is assigned the Code “1” rather than “01”), but for the purpose of the example I cheated a bit to show that the same Code Item can have a different Code meaning depending on the Category.
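The point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Category` and `Code` and their fields are hypothetical, not the GSIM UML, and the value "01" is the simplified one used in the example rather than a real assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """Carries the meaning (hypothetical class, for illustration only)."""
    name: str


@dataclass(frozen=True)
class Code:
    """A bare designation together with the Category it designates."""
    value: str
    category: Category


# Same designation "01", as in the simplified example above.
male = Code("01", Category("Male"))
albacete = Code("01", Category("Albacete"))

# The identifiers coincide ...
print(male.value == albacete.value)  # True
# ... but the Codes differ, because each takes its meaning from its Category.
print(male == albacete)              # False
```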

Code vs Code Item – Part 1

Dan made the point at the previous meeting that Code is a designation. Therefore the definition of a Code (as per the UML for the Specification layer) might be “F (Female)” where this definition of the Code might be read as “F designates the category ‘Female’”. The code in this definition is not just “The letter F”, the definition includes what is designated by “F”.

In Section II of the Specification Layer (the plain English documentation) para 83 says:

A Code List is also a type of Concept System. It is used for creating a group of Codes and their associated Categories. It can consist of one or more Code Items. A Code designates a Category providing representation to the meaning from the Category. For example in "F - female", the Code is F and the Category is Female.

The example here reads to me as saying the Code is “F” and not “F (Female)”.

I interpret Dan as saying the UML is correct and the example in Para 83, technically, is not.

So, let’s assume the UML is correct.

I still want to be clear from a business perspective whether I would want to manage Code separately from Code Item.

I can’t think of a particularly “official statistics” example off the top of my head, so let’s take “F (Ford)” as a Code. (F is the New York Stock Exchange ticker symbol for Ford.)

I could have a number of different Code Lists with “F (Ford)” as a Code Item, eg
• Code List: US Fortune 500 companies
• Code List: Manufacturers of Pick-Up Trucks (worldwide)
• Code List: Companies that sell automobiles in the USA

“F (Ford)” might be a Code Item in all three Code Lists. Each instance would be a different Code Item but would be the same Code. (In the case of the Motor Vehicle Census in Australia it is both a different Code, 1999999 (Ford) and a different Code Item.)

Other Codes might be used in a Code Item in only one, or two, of the three Code Lists.
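As a rough sketch of that arrangement, assuming hypothetical `Code`, `CodeItem` and `CodeList` classes (none of these names or fields come from the GSIM specification, and the second Code is invented purely for illustration):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """A designation together with the Category it designates."""
    value: str
    category: str


@dataclass(frozen=True)
class CodeItem:
    """A Code as a member of one particular Code List."""
    code: Code
    code_list: str


@dataclass
class CodeList:
    name: str
    items: list = field(default_factory=list)

    def add(self, code: Code) -> CodeItem:
        item = CodeItem(code, self.name)
        self.items.append(item)
        return item


ford = Code("F", "Ford")
other = Code("ZZZ", "Hypothetical Co")  # invented, for illustration only

fortune = CodeList("US Fortune 500 companies")
pickups = CodeList("Manufacturers of Pick-Up Trucks (worldwide)")
sellers = CodeList("Companies that sell automobiles in the USA")

# One Code, three Code Items (one per Code List).
ford_items = [cl.add(ford) for cl in (fortune, pickups, sellers)]
pickups.add(other)  # a Code used in only one of the three lists


def lists_using(code, code_lists):
    """Names of the Code Lists that contain a Code Item for this Code."""
    return [cl.name for cl in code_lists if any(i.code == code for i in cl.items)]


all_lists = [fortune, pickups, sellers]
print(len(set(ford_items)))           # number of distinct Code Items for Ford
print(lists_using(ford, all_lists))   # every list that uses "F (Ford)"
print(lists_using(other, all_lists))  # only the pick-up trucks list
```

Note that `lists_using` only needs the Code to be comparable; whether that justifies managing Code as a business object in its own right is exactly the open question.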

I care about Ford as a Category, and I care about “F (Ford)” as a Code Item when I need to include it in a Code List but I am still not sure why I want to manage the Code “F (Ford)” as a separate business object.

If I particularly wanted to have recommended designations for categories then that could be an attribute of the Category itself, eg if you want to designate “Ford” by a code then “F” is recommended, based on the NYSE designation.

In general, I see managing attributes as “cheaper” than managing additional objects.

(I am not convinced there is even a business reason for this to be captured as an attribute - but if there is a business reason to capture it at all then capturing it as an attribute would be my preference at the moment.)
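A minimal sketch of the attribute alternative, assuming a hypothetical `default_code` field on `Category` and a hypothetical helper `designate` (neither exists in GSIM; both are here only to make the idea concrete):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    name: str
    # Optional recommended designation; left empty if nobody needs it.
    default_code: Optional[str] = None


def designate(category: Category, override: Optional[str] = None) -> str:
    """Build a 'value (Category)' designation for use as a Code Item.

    A Code List may supply its own value; otherwise the Category's
    recommended default is used.
    """
    value = override if override is not None else category.default_code
    if value is None:
        raise ValueError(f"No code available for category {category.name!r}")
    return f"{value} ({category.name})"


ford = Category("Ford", default_code="F")

print(designate(ford))             # F (Ford)
print(designate(ford, "1999999"))  # 1999999 (Ford)
```

Here nothing is managed beyond the Category and the Code Items that use it; a Code List that prefers its own value, as in the Motor Vehicle Census case, simply overrides the default.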

Code vs Code Item – Part 2

OK, so here is an example of an edge case that comes up from time to time.

We could say 211, 220, 231 & 291 comprise “Level 1”. We could also say that the indented entries comprise “Level 2”.

If we wanted “Level 2” to be complete (provide coverage for all commercial buildings) we could say that 211, 231 & 291 are also valid for Level 2.

The modelling of GSIM currently suggests that a Node (and a Code Item is a form of Node) can belong to 0..1 Levels. Would we therefore be saying for this Code List that “211 (Retail and wholesale trade buildings)” is one Code that forms the basis of two Code Items in this Code List?

You can arbitrarily have, eg, “210 (Retail and wholesale trade buildings)” at Level 1 and “211 (Retail and wholesale trade buildings)” at Level 2 but this may cause you to store the same data twice, once coded to 210 and once coded to 211. Both 210 and 211 would be associated with the same Category.

Even if you just run with “Retail and wholesale trade buildings”, however, if it exists at two different Levels then it needs to be two different Category Items even though it is one Category?
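To make the question concrete, here is a small sketch assuming hypothetical `Code` and `CodeItem` classes in which each Code Item carries at most one Level, mirroring the 0..1 constraint on Nodes (the class and field names are assumptions, not the GSIM UML):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    value: str
    category: str


@dataclass(frozen=True)
class CodeItem:
    code: Code
    level: Optional[int]  # a Node belongs to 0..1 Levels


retail = Code("211", "Retail and wholesale trade buildings")

# For the Code to be valid at both Levels, it needs two Code Items.
items = [CodeItem(retail, level=1), CodeItem(retail, level=2)]
distinct_codes = {item.code for item in items}

print(len(items))           # 2 Code Items
print(len(distinct_codes))  # 1 underlying Code
```

Data coded to 211 is stored once; only the Code Item is duplicated across Levels.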

A code is a designation (the code value?) together with its underlying meaning. It may exist in multiple code lists. You want to know if a code is being used many times so you can rationalise its use. Do we need the distinction between code and code item when implementing? What is the business value of managing code separately? Should we have an attribute on category which is the default code?

A code item is a code as an element of a set. Is the idea to have a place to keep the information that applies to a code only as it resides in a particular code list?

A code list is a set of code items.

Isn't it more important that we know that the same category is being used, rather than the same code? We should reuse categories.

Recommendation:

Do we need the distinction between code and code item when implementing? What is the business value for managing code separately?

1) I agreed a bit too quickly to the idea of an attribute on a category. At Statistics Norway we manage standard classification versions/variants and codelists. We don't manage standard categories. However, all the categories in a standard are themselves standard.

2) Managing codes separately from categories imposes an extra burden when linking the categories to the codes, rather than just managing code items. Is this burden worthwhile if we keep in mind that most of us need to manage our categories in several languages, reusing the codes for each language, or is that just an implementation issue and not a business issue?

In terms of the recommendation on code vs code item, it seems the consensus remains that the focus should be on code item.

Your (1), however, seems to raise a question about whether the second part of the proposal - adding an attribute on category which allows you to specify a default code - is actually appropriate.

I still think it is, but maybe we need the group to clarify (eg add some likes or dislikes to the recommendation as it currently stands).

The reason I think it is fine is that, even if the attribute is added, you only populate it if desired. It sounds like Statistics Norway probably doesn't currently manage categories in their own right - only as part of standard sets/lists/schemes. This would make adding a new attribute for Category even more moot from Statistics Norway's point of view - but not from the view of some other agencies.