What is a real life example of the difference between a Classification and a Code List? GSIM documents that there is a difference between the two, but when you get right down to it, where do you draw this line?

In your opinion, what is the boundary between a Code List and a Classification?

14 Comments

In GSIM, the difference between a code list and a classification is that the categories in each level of a classification are mutually exclusive and exhaustive. Categories in each level of a code list don't have to satisfy those criteria. For instance, race and ethnicity categories don't constitute a classification since each person can fall into many of those categories at once. Therefore, the categories are not mutually exclusive. For example, my ethnicity is English, Irish, Lithuanian, Russian, Scottish, and Ukrainian.

I agree with the notion that a code list does not have to have categories that are mutually exclusive and exhaustive, whereas a statistical classification does. I can't agree with Dan's example, however. It is entirely possible to create a set of race, ancestry or ethnicity categories that are mutually exclusive. Whilst data on these concepts will frequently, but not always, refer to persons, the entities being classified are not persons. They are ethnic or ancestral groups or identities, and a person may be associated with multiple groups or identities, in the same way as a person may speak multiple languages. It doesn't follow that the English and Ukrainian languages, ethnic groups, etc. are not discrete and mutually exclusive concepts.

The point is that it is possible to create a set of categories in such a way that they are mutually exclusive and exhaustive, or in such a way that they are not mutually exclusive and/or exhaustive. For example, if I create a set of language categories that contains English, Scouse, Geordie, Cockney, Bronx, Dutch, Flemish, Afrikaans and Netherlandic, this is not a classification because the categories are neither mutually exclusive nor exhaustive of a particular population of languages - but it is a category set.
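To make the two criteria concrete, here is a minimal Python sketch that tests a category set for mutual exclusivity and exhaustiveness. The population of language varieties and the assignment of varieties to categories are assumptions made up for illustration (for instance, treating Scouse as a variety covered by English); they are not taken from GSIM or from any official language classification.

```python
from itertools import combinations


def is_mutually_exclusive(categories):
    """True if no two categories share a member."""
    return all(a.isdisjoint(b) for a, b in combinations(categories.values(), 2))


def is_exhaustive(categories, population):
    """True if every member of the population falls into some category."""
    covered = set().union(*categories.values())
    return population <= covered


# Hypothetical population of language varieties to be categorised.
population = {"Scouse", "Geordie", "Cockney", "Bronx",
              "Flemish", "Netherlandic", "Afrikaans", "French"}

# The category set from the comment: "English" overlaps its own dialects,
# "Dutch" overlaps Flemish and Netherlandic, and French is not covered.
category_set = {
    "English": {"Scouse", "Geordie", "Cockney", "Bronx"},
    "Scouse": {"Scouse"},
    "Geordie": {"Geordie"},
    "Cockney": {"Cockney"},
    "Bronx": {"Bronx"},
    "Dutch": {"Flemish", "Netherlandic"},
    "Flemish": {"Flemish"},
    "Afrikaans": {"Afrikaans"},
    "Netherlandic": {"Netherlandic"},
}

# A single-level alternative whose categories partition the population.
classification_level = {
    "English": {"Scouse", "Geordie", "Cockney", "Bronx"},
    "Dutch": {"Flemish", "Netherlandic"},
    "Afrikaans": {"Afrikaans"},
    "French": {"French"},
}
```

Under these assumptions, `category_set` fails both checks while `classification_level` passes both, which is the boundary described above: the same labels can sit in either kind of structure, and what differs is whether each level partitions the population.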

David - The race/ethnicity example was based on the practical difficulty of writing an exhaustive list, even though it may be possible in theory. My naming 4 ethnicities was intended to illustrate the fact that people consider themselves all kinds of combinations of races and ethnicities. This is especially problematic in the US. The combinations are too numerous to list. In this sense, this can only be a code list and not a classification.

Yes, code lists and classifications can share categories. First, code lists or classifications themselves have versions, so they share categories from one version to another, but that is only sharing within class. Here is an example of a code list and classification that share categories. This example is somewhat artificial, but it illustrates the issue. Suppose a technical school of higher education offers courses in mathematics, physics, engineering, and computer science, and the school only allows students to major in one or two subjects. So, the course classification is the list given above:

mathematics

physics

engineering

computer science

yet, the code list for majors is

01 - mathematics

02 - physics

03 - engineering

04 - computer science

05 - math/phys

06 - math/eng

07 - math/cs

08 - phys/eng

09 - phys/cs

10 - eng/cs

It is clear that the categories designated by the first 4 codes are the same as the categories in the course classification.
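As a rough sketch of how the two structures relate, the majors code list can be derived from the course classification by appending every pair of courses. The function name and the abbreviation table are hypothetical; the codes and labels follow the list given above.

```python
from itertools import combinations

# Course classification: four mutually exclusive categories.
courses = ["mathematics", "physics", "engineering", "computer science"]

# Abbreviations used in the combined labels (taken from the list above).
abbrev = {
    "mathematics": "math",
    "physics": "phys",
    "engineering": "eng",
    "computer science": "cs",
}


def build_major_code_list(courses):
    """Single majors first, then every unordered pair of courses."""
    labels = list(courses)
    labels += [f"{abbrev[a]}/{abbrev[b]}" for a, b in combinations(courses, 2)]
    # Two-digit codes starting at 01.
    return {f"{i:02d}": label for i, label in enumerate(labels, start=1)}


major_codes = build_major_code_list(courses)

# Categories shared between the classification and the code list.
shared = [label for label in major_codes.values() if label in courses]
```

Here `shared` holds the four course categories, designated by codes 01 to 04, while codes 05 to 10 designate combined categories that exist only in the code list.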

I have added an attachment with a real life example of the difference between a low investment, low reusability, low maintenance code list with non-mutually exclusive categories and a high investment, high reusability but low maintenance classification.

I wrote a comment giving an example use case of a code with multiple parents. Whilst the example was correct, the comment about GSIM support for this was not correct. Apologies if you have already read the comment (I have now removed it). I will write a (hopefully correct) comment as soon as I can. I'm out of the office for the rest of this week.

Here is the revised example use case of a code having multiple parents.

Hierarchical Code List

Chris Nelson

16 July 2013

Scope

The scope of this note is to give an example of a hierarchical code list where a code may have more than one parent. This is in response to the request made at the CC for the implementation group on 2 July.

GSIM Code List and Classification

Both of these inherit from a Node Set. A Node Set contains Nodes and both Code Item and Classification Item inherit from Node. A Node can be hierarchic in that it can comprise “child” or “part” Nodes: each Node may only have one parent Node (and may have none). However, a Node has a mandatory association to a Category.

So, whilst in GSIM a Code cannot have more than one parent Code, there is nothing in the model that prohibits the same Category being represented more than once in a Code List (the definition of a Classification Scheme explicitly prohibits this for Classification Items).
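The two rules just described can be sketched in Python as follows. This is an illustration under stated assumptions, not GSIM's formal model: the class names `Category` and `CodeItem`, the `add_child` method, the `repeated_categories` helper and the code values are all hypothetical, and the geography data borrows from the use case in this note.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Category:
    name: str


@dataclass
class CodeItem:
    code: str
    category: Category  # mandatory association to a Category
    parent: Optional["CodeItem"] = field(default=None, repr=False, compare=False)
    children: List["CodeItem"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "CodeItem") -> None:
        # A node may have at most one parent node.
        if child.parent is not None:
            raise ValueError(f"{child.code} already has parent {child.parent.code}")
        child.parent = self
        self.children.append(child)


def repeated_categories(items):
    """Names of categories represented by more than one item."""
    seen, repeated = set(), set()
    for item in items:
        if item.category in seen:
            repeated.add(item.category.name)
        seen.add(item.category)
    return repeated


# One Category, two Code Items, each with a single parent.
france = Category("France")
europe = CodeItem("CONT_EUR", Category("Europe"))
eu = CodeItem("TB_EU", Category("European Union"))
fr_by_continent = CodeItem("CONT_EUR.FR", france)
fr_by_bloc = CodeItem("TB_EU.FR", france)
europe.add_child(fr_by_continent)
eu.add_child(fr_by_bloc)

code_list = [europe, eu, fr_by_continent, fr_by_bloc]
```

In this sketch a code list accepts the repeated France category, whereas a classification-style validation would reject any list for which `repeated_categories` returns a non-empty set. Calling `eu.add_child(fr_by_continent)` raises `ValueError`, reflecting the single-parent rule.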

An example use case for this is given below.

Example of a Code with Multiple Parents in the same Hierarchical Code List (HCL).

The example is a Geography Code List. In this list each geographic location such as a country can be in more than one “hierarchy”. GSIM has no such object as an explicit hierarchy: hierarchical structures are defined in parent/child or whole/part relationships between one Node and other Nodes.

Example codes that could have multiple parents using countries could be:

Continent

Trading Bloc

Currency Bloc

Military Union

Any one country can be in one or more of these hierarchies. In the GSIM model the semantic of the “hierarchy” (e.g. Continent) would need to be a Category linked to Code, as this is the only way of grouping contained lower level (+child or +part) Codes.

This use case is true for data dissemination and could also be true for other processes. Like other Categories, the “parent” country in the Code List has no explicit relationship to data but can be used by an application to determine data values to which it relates (e.g. where the country is a dimension in a dimensional data set, an application can allow viewing by Continent).
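A small Python sketch of how an application might use such a code list to view country-level data by grouping. The codes, the row layout and the observation values are invented for illustration; only the idea of one country Category appearing under several parent Codes comes from the note.

```python
from collections import defaultdict

# (code, category, parent code) rows of a hypothetical geography code list.
# France appears twice: once under a continent, once under a trading bloc.
code_list = [
    ("CONT_EUR", "Europe", None),
    ("TB_EU", "European Union", None),
    ("CONT_EUR.FR", "France", "CONT_EUR"),
    ("CONT_EUR.CH", "Switzerland", "CONT_EUR"),
    ("TB_EU.FR", "France", "TB_EU"),
]

# Made-up observation values keyed by the country category.
observations = {"France": 10, "Switzerland": 5}


def roll_up(code_list, observations):
    """Sum country values under each parent grouping."""
    category_of = {code: category for code, category, _ in code_list}
    totals = defaultdict(int)
    for _, category, parent in code_list:
        if parent is not None and category in observations:
            totals[category_of[parent]] += observations[category]
    return dict(totals)


totals = roll_up(code_list, observations)
```

The data itself is held once per country. The groupings exist only in the code list, so adding a Currency Bloc or Military Union view means adding rows to the list rather than changing the data set.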

Note that in SDMX there is an object called “Hierarchical Code List” (HCL) and this is a different object from a Code List where the Code can have a parent/child hierarchy but is restricted to each Code having a maximum of one parent Code.

The HCL in SDMX does not specify “Codes” in the GSIM sense (though each “node” or “hierarchical code” has an Id), it merely references Codes from one or more code lists and places them in one or more hierarchies (the “hierarchical code” can have children). This is not dissimilar to the GSIM Code (which can have children) referencing a Category (as the country semantic (e.g. France) will be a Category). However, in the SDMX HCL it is possible to maintain the codes comprising the “hierarchy” in a code list that is different from the code list comprising the countries. The HCL brings these together. Note that the SDMX HCL also has a Hierarchy object that contains the hierarchical codes.
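The reference-based arrangement described here can be sketched in Python as below. This is not the SDMX schema or any SDMX library API: the class `HierarchicalCode`, the code list identifiers and the hierarchy names are assumptions chosen to show nodes that point at codes maintained in separate code lists.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A reference to a code: (code list id, code id).
CodeRef = Tuple[str, str]


@dataclass
class HierarchicalCode:
    id: str
    ref: CodeRef  # points at a code; does not define one
    children: List["HierarchicalCode"] = field(default_factory=list)


# Grouping codes and country codes are maintained in separate code lists.
code_lists = {
    "CL_GROUPING": {"EUR": "Europe", "EU": "European Union"},
    "CL_COUNTRY": {"FR": "France", "CH": "Switzerland"},
}

# Two hierarchies that bring the separate lists together.
hierarchies = {
    "BY_CONTINENT": HierarchicalCode("H1", ("CL_GROUPING", "EUR"), [
        HierarchicalCode("H1.1", ("CL_COUNTRY", "FR")),
        HierarchicalCode("H1.2", ("CL_COUNTRY", "CH")),
    ]),
    "BY_TRADING_BLOC": HierarchicalCode("H2", ("CL_GROUPING", "EU"), [
        HierarchicalCode("H2.1", ("CL_COUNTRY", "FR")),
    ]),
}


def resolve(ref: CodeRef) -> str:
    """Look up the label of a referenced code."""
    list_id, code_id = ref
    return code_lists[list_id][code_id]


def parents_of(ref: CodeRef) -> List[str]:
    """Labels of every hierarchy root that has the referenced code as a child."""
    return sorted(
        resolve(root.ref)
        for root in hierarchies.values()
        if any(child.ref == ref for child in root.children)
    )
```

With this layout the country code FR is defined once in its own list and referenced from two hierarchies, so `parents_of(("CL_COUNTRY", "FR"))` yields both groupings without the country code itself having more than one parent.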

A small reply on this interesting discussion. A country is not a geographical location. It is an administrative or political location. That is why a lot of regional classifications are in reality code lists. One level is a geographical classification (continent) and the sublevel is an administrative classification (country). This does not work as a classification, because one country can be part of two or more continents. A level in a hierarchical classification should be a refinement (= according to the same point of view, geographical in this example) of the parent level.

I offered to create a Hierarchical Code List using the GSIM classes that exist already. I have called this list a Complex Code List as the Code List in GSIM can already support hierarchical structures of Code Items. This may not be a good name and this is open to suggestion if this structure is deemed worthy of inclusion in a future version of GSIM.

Note that whilst this model is based on the SDMX model for Hierarchical Code List (HCL), the SDMX HCL also has optional Levels and some additional attributes to support specific use cases, but these are not included in this model. The green classes are those that exist already and the light tan colour is used for the new classes.

I think this discussion also needs to deal with the relation between category sets and statistical classifications. The code list is the easy part, since the code is simply a representation of a category.