Practical Metadata and Standards for Clinical Research

Step Three & Keeping My Promise

It is getting towards the end of the week and I am conscious of a slide I presented in Vienna at the PhUSE conference and a promise to write up progress, lessons learned and anything else I can think of that might be useful. I am also thinking it might be beneficial to some to split any posts into two parts, one written for a user and their perspective and a second looking at the technical aspects for the nerds out there. There is also a little catching up to do.

Included within the PhUSE slide deck is this slide you can see to the right to show, at a high-level, what I am trying to do. I spoke to four steps:

Build a model

Implement a simple demonstrator tool to show ideas and get thoughts straight

Re-implement using the model from step 1 with a better User Interface

Improve

So I was documenting the model and I had covered the ISO 11179 and the terminology pieces and then I went off on my tangent and developed the simple MDR tools (see the panel at the top right for the links to these posts). Then I got into step three and along came PhUSE so I wrote it all up. And now I can maybe step back a bit and write about this in a slightly more coherent manner.

So here we are, step three. Right …

Stage 18, 23rd July 2015. Gap to Saint-Jean-de-Maurienne

Oh, an aside. I was showing the previous version (the step two simple tool) to some folks and they asked whether it had a name. At the time it didn’t. So when I started step three I thought it best to name the thing. I started the work on 23rd July 2015; the Tour de France (TDF) was on. That day was a mountain stage with the riders covering 186.5km from Gap to Saint-Jean-de-Maurienne and they faced seven classified climbs on a day starting with the Col Bayard. So I decided to name the projects after the climbs that day. It took me a few attempts to learn some new technology and write a few demo apps to get the ideas straight over the following days. The application you will see is the sixth project. The sixth climb was the Col du Glandon. It’s called Glandon. Get over it (well the peloton did).

Right, enough of this naming nonsense!

Application (Slightly Nerdy)

This section will be short but jump over it if you are not interested in the nerdy bits.

The application is split into two parts: the database and the user interface. The database is a semantic database that can be either Apache Fuseki or Ontotext. I have been tending to use Fuseki as it is easier to debug than the Ontotext S4 service. However, the nice thing about the Ontotext S4 service is that it is cloud based and I don’t have to manage it. Both are free, which, as far as I am concerned, is wonderful.

The User Interface (UI) is a Ruby on Rails application using Bootstrap. The UI talks to the database via SPARQL over HTTP. The Rails app uses a Model-View-Controller (MVC) approach with the semantic queries kept within the model part. Again, Rails is a free technology stack and has a lot of resources and third-party plugins available so it is a nice choice. To be honest there are so many equivalent technologies out there it becomes overwhelming. I sensed I was overthinking it all, wanting the best, so in the end I just went for one that was well used.
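For the curious, here is a minimal sketch of what "SPARQL over HTTP from the model layer" can look like in plain Ruby. It is not the Glandon code: the class name `SparqlClient`, the endpoint URL (a local Fuseki on port 3030 with a dataset called `mdr`) and the sample result are all assumptions made up for illustration. It builds a standard SPARQL Protocol POST request and flattens the standard SPARQL JSON results format into simple hashes.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper (not from Glandon) that a Rails model could call.
class SparqlClient
  def initialize(endpoint)
    @uri = URI(endpoint)
  end

  # Build, but do not send, a SPARQL Protocol query as a URL-encoded POST.
  def build_request(query)
    request = Net::HTTP::Post.new(@uri)
    request["Accept"] = "application/sparql-results+json"
    request.set_form_data("query" => query)
    request
  end

  # Send the query and return the rows; needs a running endpoint.
  def select(query)
    use_ssl = @uri.scheme == "https"
    response = Net::HTTP.start(@uri.host, @uri.port, use_ssl: use_ssl) do |http|
      http.request(build_request(query))
    end
    raise "SPARQL query failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    self.class.parse_bindings(response.body)
  end

  # Flatten SPARQL JSON results into an array of { variable => value } hashes.
  def self.parse_bindings(json)
    JSON.parse(json).dig("results", "bindings").map do |row|
      row.transform_values { |cell| cell["value"] }
    end
  end
end

# Assumed endpoint; nothing is sent until #select is called.
client = SparqlClient.new("http://localhost:3030/mdr/query")
request = client.build_request("SELECT ?label WHERE { ?s ?p ?label } LIMIT 1")

# Made-up response body in the SPARQL JSON results format.
sample = '{"head":{"vars":["label"]},"results":{"bindings":' \
         '[{"label":{"type":"literal","value":"QSCAT"}}]}}'
rows = SparqlClient.parse_bindings(sample)
puts rows.first["label"]
```

Keeping the query text and the result parsing inside one small class like this is what lets the views and controllers stay ignorant of SPARQL altogether.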

Enough of this nerdy stuff.

ISO 11179

Overview

So ISO 11179, my lowest level building block as you can see from the previous post on this (see panel top right). In that old post I wrote up the technical stuff and what I had implemented in the semantic model. Since then I have found creating some simple diagrams detailing the model has been helpful for getting ideas straight, seeing problems, fixing bugs, writing queries, spilling my tea on and doodling space when I cannot decide how to solve the next problem I face.

So I have drawn these diagrams for each part of the model. They are not technically correct in every detail and miss out some of the information (I might add some of this later) but they allow me to see what I want to see. I thought I would publish them to see if others found them useful. Also remember that the turtle files are on github here. I have produced these diagrams for every area so will include them at the appropriate points/blogs.

The Nerdy Bit

I have tried to be faithful to 11179 but also keep it simple. For example I have allowed for one registration authority (assumed to be the sponsor company) and allowed for several other organisations so as to allow for content to be imported (e.g. from CDISC). I have implemented Registered Items (those created and managed by the sponsor) and Identified Items (imported items).

So, the pictures. Click on them to get bigger versions in a new window. I had to split the diagram into two parts to keep it readable. As an aside these are drawn with the OmniGraffle package running on a Mac (everything in these posts is done on a Mac).

ISO 11179 Model Diagram Part 1

ISO 11179 Model Diagram Part 2

The Users Bit

Simple message, don’t go here. There are some user interface pieces to access the 11179 info directly but I don’t see normal users going there. To me this makes little sense. I want you to focus on code lists and forms and the like, the business stuff. I will hide the nastiness of making it work and hang together. Have a look at this earlier post on keeping it simple and user access/skills. This is where those user roles come in.

ISO 25964

Overview

I have implemented quite a simple model (subset of ISO 25964), just two levels of Thesauri and Thesaurus Concept with the Thesauri being an ISO 11179 IdentifiedItem for version management purposes. I have allowed access from the UI at the generic level (ISO 25964) and then another at the CDISC Terminology Level.

The Nerdy Bit

As I have said, see the previous write-up for details. The new diagram is below. As you can see it is pretty simple. Nothing else to say, read the previous post for details.

ISO 25964 Model

The Users Bit

List of CDISC Terminology versions

I want the user to focus on the terminology not on how to access it. So, currently I have implemented an interface to allow access to the CDISC terminology. I have loaded the last 9 versions of the terminology (this represents all the versions created by CDISC in the “.owl” format going back to December 2013) and the interface allows you to browse them and see changes.

The first screen shot shows the various versions and the associated date of release by CDISC. The version numbers are my own internal scheme. Version 1 represents the first CDISC terminology Excel file from, if memory serves me correctly, April 2007. I have an old Excel spreadsheet and macros that compared all these old versions and it is this that has ‘version 1’ in it. So we have versions 34 to 42 with V42 being the latest September 2015 release.

A single CDISC Terminology Release and code lists therein

I can then go into a specified version and browse the contents. In this next screen shot we have the September 2015 release and the code lists listed out in a table. I can then select an individual code list, search the set or compare the entire release to an earlier release to see the entire set of changes or filter on just the new, deleted or amended entries.

Note that I can compare to any previous version, not just the preceding one. To me this brings a lot of power. A sponsor may only upgrade every six months (2 releases) or annually (3 or 4 releases). They need to see changes from the new version to the one they are currently using.
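For the nerds, the idea behind comparing any two releases can be sketched in a few lines of Ruby. This is only an illustration of the new/deleted/amended split, not how Glandon does it (the real comparison works on the semantic database). The function name `compare_releases`, the identifiers `CL1` to `CL4` and their attributes are all made up; each release is assumed to be a hash keyed by code list identifier.

```ruby
# Compare two releases, each a hash of identifier => attributes.
# Returns the identifiers that are new, deleted or amended in the newer one.
def compare_releases(older, newer)
  shared = older.keys & newer.keys
  {
    new: (newer.keys - older.keys).sort,
    deleted: (older.keys - newer.keys).sort,
    amended: shared.reject { |id| older[id] == newer[id] }.sort
  }
end

# Made-up data: the release in use today and the latest release.
# They do not need to be adjacent releases.
in_use = {
  "CL1" => { synonym: "Alpha" },
  "CL2" => { synonym: "Beta" },
  "CL3" => { synonym: "Gamma" }
}
latest = {
  "CL1" => { synonym: "Alpha (revised)" },
  "CL3" => { synonym: "Gamma" },
  "CL4" => { synonym: "Delta" }
}

changes = compare_releases(in_use, latest)
changes.each { |kind, ids| puts "#{kind}: #{ids.join(', ')}" }
```

With this made-up data, CL4 comes out as new, CL2 as deleted and CL1 as amended, while the unchanged CL3 is not reported. Because the function only looks at the two releases handed to it, skipping intermediate releases needs no special handling.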

Having done that I could choose to look at a single code list and see the entire history. Here I have chosen the QSCAT code list to see the history for the code list and the associated code list items. I can see the Synonym was modified in June 2014 and precisely what change took place. I can do the same for the code list items, the second screen shot being an example of an item with two changes since December 2013.

The history for a specified code list allowing access to detail about changes

The history for a specified code list item allowing access to detail about changes

One other feature that I implemented is a global compare across all versions, to see all of the changes since the first version stored. Currently I have done this rather badly as it takes too long to execute, but it does get there in the end; I think a better SPARQL query will solve this. The screen shot shows the resulting output.

Comparison of all versions of the CDISC Terminology

Note the small bug where I have failed to output the submission value which actually would make it much more useful! But the idea is to show what has been modified (the yellow pencils) and what has been inserted or deleted (the red) and then click to see the detail of those changes. I find it useful to see this view simply to gauge the extent of changes in a release. It is also useful to see which code lists are suffering more changes than others (currently EGTESTCD & EGTEST seem to be suffering in this respect).
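Again for the nerds, the grid in that screen shot can be thought of as one row per item and one cell per release, each cell saying what happened to the item relative to the preceding release. Below is a small in-memory Ruby sketch of that idea. It is an assumption-laden illustration, not the SPARQL-based implementation: the names `classify` and `change_grid`, the version numbers and the items `A`, `B` and `C` are all made up.

```ruby
# Decide what happened to one item between two consecutive releases.
def classify(before, after)
  return :absent if before.nil? && after.nil?
  return :inserted if before.nil?
  return :deleted if after.nil?

  before == after ? :unchanged : :modified
end

# releases: hash of version => { identifier => content }, in release order.
# Returns identifier => { version => status }, one status per release
# after the first.
def change_grid(releases)
  ids = releases.values.flat_map(&:keys).uniq.sort
  ids.to_h do |id|
    statuses = releases.each_cons(2).map do |(_, prev), (version, curr)|
      [version, classify(prev[id], curr[id])]
    end
    [id, statuses.to_h]
  end
end

# Made-up releases, oldest first.
releases = {
  40 => { "A" => "x", "B" => "y" },
  41 => { "A" => "x2", "B" => "y" },
  42 => { "A" => "x2", "C" => "z" }
}

change_grid(releases).each do |id, row|
  puts "#{id}: " + row.map { |version, status| "V#{version}=#{status}" }.join(" ")
end
```

In this made-up data A is modified in V41, B is deleted in V42 and C is inserted in V42. Counting the cells in a row that are not `:unchanged` or `:absent` gives a crude measure of which items are suffering the most change.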

There are many ways to exploit the power of having all the versions in an electronic form where they can be easily queried and used. The implementation really only scratches the surface but it shows what can be done. The power is in the database; it is a question of how best to access it and what the users need to do their day jobs.

Next Steps

Since I implemented the above I have been working on Biomedical Concepts (Research Concepts) and Forms based upon those concepts. Information on this will form the basis for the next two blog posts.