Forge 14.4 technical overview

Introduction

Last Friday (November 25th) we published a new Forge release, Forge 14.4 for DSTU2 – FHIR DevDays 2016 edition. You can download the new Forge release for free from www.simplifier.net. The previous Forge release (13.2) dates from June 10th, so clearly it’s been a while since we’ve published an update. In this blog post I’ll elaborate on the new features.

Profiles on profiles

Profiles created in Forge 13.2 and earlier always express a set of constraints on a FHIR core datatype or resource. However the FHIR specification also allows you to author so-called profiles-on-profiles or derived profiles. A derived profile represents a set of constraints on another, existing (non-core) base profile. This allows you to define a hierarchy of profiles that are based on each other.

In a typical situation, a national standards organization publishes a set of official national FHIR profiles defining some common country-wide constraints. Organizations within this country create and publish their own profiles that are derived from the official national profiles. For example, the Dutch HL7 affiliate publishes a national Dutch Patient profile with a specific constraint on Patient.identifier for the Dutch social security number (BSN). Dutch organizations can now author and publish their own, more specialized profiles with additional organizational constraints based on (derived from) the national profiles. Because different organizational profiles all conform to a common set of national profiles, they provide a certain well-defined level of interoperability within the country.

Hierarchy of Profiles

The rules for derived profiles are similar to the rules for profiles on core datatypes or resources. A profile is valid if all resources that conform to the profile also conform to the associated base profile. This implies that a profile is only allowed constrain the base profile. A profile is not allowed to relax/remove constraints defined by the base profile.

Note: In a sense, we can interpret a profile as a filter or predicate on the resources space, cf. a WHERE clause in a SQL expression. The number of constraints that a profile expresses is inversely proportional to the total amount of conforming resources. As derived profiles become more and more specific, the total set of conforming resources becomes smaller and smaller. The extreme case being a profile that matches no resources at all; such a profile may very well be valid according to the FHIR spec, albeit meaningless. Similar to the SQL predicate WHERE 1=0 which is perfectly valid but will always evaluate to an empty set.

In Forge release 14.4, you can now create and edit profiles based on other (non-core) profiles. The command New Derived Profile creates a new derived profile based on an existing base profile on disk. The initial implementation is still somewhat limited and provides support for fairly simple base profiles (e.g. without slicing). Future updates will add support for more advanced scenario’s such as reslicing.

Resolving resources

Profiles created in previous Forge releases until version 13.2 were always based on a core datatype or resource definition.The Forge application directory contains an archive “specification.zip” that contains the official definitions of all FHIR core datatypes and resources. Upon application startup, Forge scans the ZIP archive and loads all the available definitions. When creating a new profile, Forge extracts the selected core base definition from the ZIP file and initializes a new profile from the selected base.

In the new 14.4 release we’ve added support to create and edit profiles based on other existing profiles. Forge now needs to resolve the custom base profile from a custom location, as it can no longer be resolved from the default ZIP archive that contains only the core definitions.

A profile can also express constraints on element types via external profiles. For example, a Patient profile may introduce a custom Identifier profile on the Patient.identifier element. In this case, Forge also needs to resolve the specified external profile in order to dynamically generate user interface components for profile child elements and verify the specified constraints.

The same holds for extension elements in a profile. Each extension element is mapped to an external extension definition. Forge needs to resolve the specified extension definitions in order to generate UI and verify constraints.

Forge should be able to resolve external profiles from the containing (working) folder on disk, from a separate external directory, from a FHIR server or from the FHIR registry Simplifier.net. To implement this, Forge requires a generic and flexible system to resolve external (user defined) profiles on the fly, separate from the standard core datatype and resource definitions.

Fortunately (and not coincidentally…) the FHIR .NET API library provides a flexible and efficient resolving mechanism based on the generic IResourceResolver interface, as explained by Ewout in his recent article Validation – and other new features in the .NET API stack. The library exposes default implementations that support resolving resources from disk and online. The library also exposes convenient decorators for caching and resolving from multiple sources based on priority. Forge 14.4 now leverages the FHIR .NET API to implement a flexible multi-step mechanism for resolving external (conformance) resources.

Resource Resolvers

Upon opening an existing profile in Forge 14.4, the application creates an associated resolver that is bound to the containing folder on disk (including subfolders). The resolver provides cached access to all external profiles in the same folder structure. Future updates will introduce support for the configuration of additional sources, e.g. on local disk or online.

Note: for now, please keep related profiles together in a common folder and Forge should find them. Be aware that the resource resolvers cannot resolve profiles with duplicates, i.e. multiple profiles in the same folder that share a common canonical URI. When Forge detects duplicate profiles, it will display an error message and list the conflicting files. In that case, manually delete/move all the obsolete duplicates from the folder and retry.

The current Session Explorer in Forge provides a conceptual/logical view of resources and resource containers. However the new resource resolving approach introduces a new folder-based workflow that is fundamentally different from the earlier approach and does not fit well with the current Session Explorer. Therefore the Session Explorer will soon be phased out in favor of a new folder-based approach (cf. Visual Studio Solution Explorer).

Snapshot generation

One of the largest changes in the Forge 14.4 release involves a complete new implementation of the logic involving the generation of snapshot and differential components. Let’s take a closer look at these concepts.

Snapshot Spectacles

A FHIR profile defines a structured set of elements. Each element definition has a set of properties. The combination of all the element definitions and associated properties uniquely defines a specific datatype or resource structure. This holds for both a core datatype or resource definition, as well as for a (derived) profile. FHIR defines this complete representation as the so-called “snapshot” component. The official name might be a bit misleading in that it suggests a captured state at a specific moment in time; while this is not incorrect, the key aspect of the snapshot component is that it intends to provide a complete and full representation of a certain data structure. An application that receives a profile snapshot has all the available information and does not need to resolve any external profiles.

A snapshot component can be quite large (~ thousands of lines of XML/JSON). And for a derived profile, the snapshot component contains a lot of redundant information: complete definitions of all the unconstrained elements and properties that are inherited from the base profile. Provided an application has access to the base resource, then it is possible to (re-)generate all the inherited redundant information. So we can also represent a profile by only describing the actual profile constraints and omitting all the information that is implicitly inherited from the base profile. Such a sparse representation of profile constraints is defined by FHIR as the “differential” component. The differential component only expresses the actual constraints with respect to the associated base profile. Generally, the differential component is (much) smaller than the snapshot component. Also the differential provides a convenient serialized representation for profile authors to browse and inspect, contrary to the snapshot component that contains a lot of redundant noise which makes it difficult to navigate.

The FHIR specification requires that a profile should contain either a snapshot component or a differential component or both (invariant sdf-6). A serialized profile with only a differential component has a smaller size and is therefore cheaper to persist and transmit. However this puts the burden on the receiving side to (re-)generate the snapshot component from the associated base profile. A serialized profile that contains a snapshot component is much larger but also self-sufficient; it contains all the available information and the receiver does not need access to any external profiles.

Forge releases up until version 13.2 contained custom application logic to generate the snapshot component from the differential. The old logic was fairly simple but useful as long as Forge was limited to profiles on core datatypes and resources. However in order to implement support for profiles on profiles (a.k.a. derived profiles), the existing snapshot generation logic was no longer sufficiently capable and needed to become a lot smarter. As we now had to completely re-implement snapshot generation anyway, we decided to move the logic out of Forge and into the open source FHIR .NET API library.

The implementation of full-fledged snapshot generation proved to be quite an undertaking. During the process we discovered that some low-level aspects were not yet clearly and/or fully defined by the FHIR specification. So we cooperated with the FHIR core team and profiling community to fill in the blanks and complete the missing parts of the specification. Especially Chris Grenz‘s contributions proved to be invaluable – Kudos! Working closely together with Chris, who works at Mayo Clinic, we managed to define how to unambiguously express some remaining advanced scenario’s and subsequently to align both our implementations.

Unfortunately the snapshot component is not completely deterministic, e.g. element id’s may vary, list order may vary etc. So FHIR cannot and does not define a “canonical” snapshot format that would allow us to perform a byte-wise or node-by-node comparison of two snapshot versions. Therefore a snapshot comparison algorithm needs some custom logic to handle special FHIR rules.

Although snapshot generation certainly isn’t the sexiest topic within the FHIR spec, it is a crucial aspect for any system dealing with profiles (including validation). And feature-complete snapshot generation proved to be quite challenging to implement. For example, in order to generate snapshots for core type definitions, the algorithm must be able to detect and handle recursive type relations, e.g. Element.id and Extension.extension. Also the correct implementation of different scenario’s involving (re-)slicing proved to be non-trivial. So we’re quite pleased that this fundamental part of the FHIR specification is now clearly defined and completely implemented (barring some unknown unknowns…). If you’re interested in the implementation (you are definitely a FHIR nerd like us and) you can inspect the Hl7.Fhir.Specification component source on Github. Fortunately usage is very simple:

The snapshot generator uses the specified IResourceResolver instance to resolve references to external profiles and/or extension definitions. If this argument equals null, then the generator automatically falls back to the set of FHIR core datatype and resource definitions. The optional settings parameter allows the caller to specify some custom configuration options for the generator.

Simplifier.net is not yet capable of automatically (re-)generating the snapshot component. Currently, when you publish a profile, you have to explicitly submit the snapshot component in order for Simplifier.net to render the complete structure. Note that if you publish a profile to Simplifier from within Forge, the application automatically handles this. Once the new release of the FHIR .NET API is ready and stable, we will update Simplifier.net and integrate the new snapshot generator. From then on, Simplifier will be able to accept profiles with only a differential component and automatically (re-)generate the snapshot. Our team is also working on implementing a dynamic differential rendering in Simplifier, so the user can activate a filtered view of only the differential profile constraints.

Differential generation

We’ve seen how snapshot generation transforms a sparse tree of constraints into a complete representation of the target structure. This functionality is now provided by the FHIR .NET API library, in the form of an atomic operation (cf. pipeline). Of course, there also exists an inverse transformation that generates a sparse differential component from a complete snapshot representation of a structure. The differential representation only contains nodes that are constrained with respect to the base resource. Unconstrained nodes are excluded from the differential.

Differential Structure

Forge generates the differential representation on-the-fly and keeps it in sync with the profile while it is being edited. This approach is completely different from the snapshot generation. Each element definition and property in a user profile is associated with the corresponding node in the underlying base profile. This allows Forge to determine which profile nodes are actually constrained (not empty and not equal to base). To no surprise, the process to find the corresponding nodes in the base profile turns out to be quite involved, esp. for slicing.

Originally, up until version 13.2, Forge contained custom logic to resolve corresponding nodes in the base profile. The original logic was limited to handling only fairly simple profiles, esp. core datatype and resource definitions and was not very efficient on memory usage. As the application developed further, we were slowly stretching the limits of the original algorithm. Users started reporting unpredictable behavior (i.e. differential was not always properly synchronized) that was hard/impossible to fix. Eventually this was no longer maintainable. So we decided to also completely rewrite the differential generation logic.

Fortunately, the new snapshot generator logic in the FHIR .NET API library already needs to resolve corresponding nodes in the base profile anyway (because each differential constraint is merged onto the corresponding base node). So Forge does no longer have to perform that same complex exercise itself. Instead, Forge now receives all the base profile associations as a useful by-product of the snapshot generator. This is quite efficient and increases the loading speed of a profile. Also we no longer need to maintain and harmonize separate but similar implementations in Forge and the .NET API library, as all common and reusable logic has now been moved into the API.

Since each node in the loaded profile is now associated with the corresponding node in the base profile, it is now trivial to determine which nodes are actually constrained with respect to the base resource. Only constrained nodes are actually included in the differential. Unconstrained element properties (either empty or equal to base) are set to null in the underlying PoCo. Also complex nodes and element definitions having all child properties equal to null are cleared in the PoCo and their state is automatically re-evaluated whenever a child node has changed. This way, all unconstrained nodes are effectively excluded from the differential component. The whole process is quite efficient as after a change, only the affected nodes are re-evaluated.

Generate Differential

The serialized xml representation is (re-)generated in the background and rendered whenever the user activates the Xml tab. This ensures that the displayed xml is always up to date without impeding the application performance. The Forge UI only displays the differential, by design, in order to save memory. By default, profiles are also saved to disk with only the differential component. The Options menu provides a configuration setting to toggle the serialization of snapshots components when saving profiles to disk. Needles to say that profiles with snapshots are significantly larger, therefore in Forge the snapshot option is disabled by default.

Conclusion

This article is way too long… hats off if you managed to read and process all this information. And I still feel like I just scratched the surface. Time permitting, maybe I’ll author another article about the specifics of the snapshot generator implementation, for those few FHIR die hards that care (the usual suspects).

Sparse Tree

Looking back to 2016, Firely has invested a lot of time and effort in (re-)designing and (re-)implementing the fundamental infrastructural components of a FHIR ecosystem, thereby laying the groundwork for a plethora of useful productivity-enhancing features that we can now start building and enhancing in 2017. I must confess that I really enjoyed working on some of the hard core abstract logic. But the nerdy stuff can also be quite demanding, time consuming and eventually goes at the expense of basic social behavior and interpersonal relations… So I’ve learned a long time ago that, even though I take joy in higher abstract thinking, I shouldn’t bury myself in it for all too long, as there is a personal price to pay. Now the Forge user interface desperately needs some love – as well as my estranged girlfriend, who’s totally fed up with me rambling on and on about snapshots and such. So I am eagerly looking forward to next year, when I can focus again on improving usability, workflow and my personal love life. But first let’s celebrate the holidays, decorate a tree, spend some quality time with family and friends and hopefully regain some of those long lost social skills.