Semantics of openEHR and ISO 13606 extracts

Versioning Semantics

Although in most clinical situations it is the latest versions of Compositions that are sent to a receiver, there are requirements for various amounts of version-related information to be included, as described in the Requirements section. At a minimum, each Composition always includes the audit trail corresponding to the particular version it represents. In some cases, historical versions of a logical Composition are needed for medico-legal reasons. The receiver system may even need to reconstruct a complete facsimile of the versioned object, logically identical to its form at the source (though most likely stored in a different versioning implementation). The openEHR Extract specification defines the simplest means of satisfying these needs: include all Compositions in their whole form, including where they are successive versions of a single logical Composition such as "family history", as illustrated in the figure below. The main justification for this is that no assumptions should be made about the ability of sender or receiver systems to represent or efficiently process versions. Whole Compositions can always be processed by even the simplest systems.
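A minimal Python sketch of what "whole form" means for the Extract contents follows. It assumes a simplified model in which Composition content is flattened to a node path -> value dictionary. The class names AuditDetails, CompositionVersion and LogicalComposition are hypothetical illustrations, not classes from the openEHR reference model. Each version carries its complete content and exactly one audit record, so a receiver needs no versioning logic to read any of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuditDetails:
    """Hypothetical, simplified audit record; each version carries exactly one."""
    committer: str
    time_committed: str
    change_type: str  # e.g. "creation" or "modification"


@dataclass(frozen=True)
class CompositionVersion:
    """One version of a logical Composition, carried in its whole form."""
    version_id: str
    audit: AuditDetails
    content: dict[str, str]  # whole content: node path -> value (hypothetical flattening)


@dataclass
class LogicalComposition:
    """Hypothetical: a logical Composition (e.g. "family history") as whole versions."""
    name: str
    versions: list[CompositionVersion] = field(default_factory=list)

    def latest(self) -> CompositionVersion:
        return self.versions[-1]


# Two successive versions: version 2 repeats the unchanged "father" entry in full.
history = LogicalComposition("family history")
history.versions.append(CompositionVersion(
    "1", AuditDetails("Dr A", "2008-01-10", "creation"),
    {"father": "diabetes"}))
history.versions.append(CompositionVersion(
    "2", AuditDetails("Dr B", "2008-03-02", "modification"),
    {"father": "diabetes", "mother": "hypertension"}))
```

The redundancy (unchanged content is repeated in every version) is the price paid for making no assumptions about the receiver's versioning capability.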

Figure 1. Successive Composition versions in a logical Transaction

It is assumed that any system wanting to determine such things as who was responsible for changing a certain fragment of a Composition, when some part of a Composition came into being, or the differences between two particular versions of a Composition must have version control capability locally. This usually means having an implementation of a version control model such as the one described in the openEHR Common Reference Model, which can perform efficient versioning, differencing and so on. Supplying Compositions in their full form ensures that no assumption is made about what such an implementation might be.
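Because whole Compositions arrive at the receiver, a system with local version handling can compute differences itself. The following is a minimal sketch, assuming (hypothetically) that each version has been flattened to a node path -> value dictionary. A real receiver would use its own versioning model, such as one based on the openEHR Common Reference Model. The function name diff_versions is illustrative only.

```python
def diff_versions(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two whole versions (node path -> value) and classify the changes."""
    added = sorted(path for path in new if path not in old)
    deleted = sorted(path for path in old if path not in new)
    modified = sorted(path for path in new if path in old and new[path] != old[path])
    return {"added": added, "deleted": deleted, "modified": modified}


v1 = {"father": "diabetes", "smoker": "no"}
v2 = {"father": "type 2 diabetes", "mother": "hypertension"}
changes = diff_versions(v1, v2)
# changes == {"added": ["mother"], "deleted": ["smoker"], "modified": ["father"]}
```

The sender never has to produce this information. Any receiver that wants it derives it from the whole versions it was given.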

This approach departs from the ISO 13606-1:2008 EHR Extract standard, which defines Compositions so as to include revision history information on every node of the structure. The 13606 specification does not state whether its 'Composition' is to be understood as a copy of a Composition from an EHR or as a 'cumulative diff' of Composition versions in an EHR. Analysis shows that only the latter makes sense. The Composition is the unit of creation and modification, so there is logically only one audit trail for each version, and per-node revision history in a copy of a single version would carry no additional information. Even the 100th version has only one audit trail associated with it.

This raises the question of whether the openEHR Extract should use a 'diff' form of Compositions, conforming to the ISO standard. That approach was not chosen, for the following reasons:

it implies that senders can generate 'diff' information structures and that receivers can process them, i.e. it makes more assumptions than necessary about the sophistication of systems;
the ISO specification appears to be in error with respect to deletions: the sending of logical deletions does not appear to be handled properly;
the sending of deletions is not normally desired, and may be illegal (e.g. in Europe there are EC directives preventing the sending of statements corrected by clinicians or patients).

It is worth considering just how complex cumulative difference information would be. The following figure illustrates the structure generated by accumulating only three changes across successive versions of a Composition. Persistent Compositions, which are likely to undergo large numbers of changes, would generate far more complex structures.

Figure 2. Generation of Cumulative Difference Form
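The accumulation can be sketched in a few lines of Python. This is a hypothetical, flat illustration of a cumulative difference form, not the ISO 13606 data structure. It assumes content flattened to node path -> value, and the names Revision and cumulative_diff are illustrative only. Every node keeps its own list of (version, value) revisions, with None recording a deletion.

```python
from typing import Optional

# Hypothetical revision record: (version_id, value); None records a deletion.
Revision = tuple[str, Optional[str]]


def cumulative_diff(versions: list[tuple[str, dict[str, str]]]) -> dict[str, list[Revision]]:
    """Fold successive whole versions into one structure with per-node revision history."""
    history: dict[str, list[Revision]] = {}
    previous: dict[str, str] = {}
    for version_id, content in versions:
        for path in sorted(set(previous) | set(content)):
            old, new = previous.get(path), content.get(path)
            if old != new:  # node created, modified or deleted in this version
                history.setdefault(path, []).append((version_id, new))
        previous = content
    return history


versions = [
    ("1", {"father": "diabetes"}),                                    # creation
    ("2", {"father": "diabetes", "mother": "hypertension"}),          # change 1: addition
    ("3", {"father": "type 2 diabetes", "mother": "hypertension"}),   # change 2: modification
    ("4", {"father": "type 2 diabetes"}),                             # change 3: deletion
]
history = cumulative_diff(versions)
```

Even in this flat sketch, the deleted value "hypertension" still travels with the structure, which is exactly the deletion problem noted above. Real Compositions are trees, so structural insertions, moves and deletions would also have to be tracked, making the form considerably more complex than shown here.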

In conclusion, while sending a difference form of Compositions is not out of the question in future, when EHR systems are routinely capable of sophisticated version handling, it is currently considered too complex, and the controls over sending deleted information have not been sufficiently well described.

Creation Semantics

The following describes an algorithm that guarantees the correct contents of an EHR Extract. The inputs to the algorithm are:

the list of EHR Compositions required in the extract (the "primary" Composition set);
optionally a folder structure in which the Compositions are to be structured in the extract;
the include_multimedia flag indicating whether DV_MULTIMEDIA content is to be included inline or not;
the follow_links attribute, indicating to what depth DV_LINK references emanating from Compositions should be followed, with the Compositions containing the link targets also being included in the extract.
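The four inputs can be gathered into a single request object. The following is a minimal sketch in which the class name ExtractRequest, the field names and the default values (no multimedia, follow_links = 0 meaning "do not follow links") are assumptions for illustration only. The one check it performs is that follow_links is a non-negative depth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractRequest:
    """Hypothetical container for the inputs of the Extract creation algorithm."""
    primary_composition_ids: tuple[str, ...]   # the "primary" Composition set
    folder_structure: Optional[dict] = None    # optional folder layout for the extract
    include_multimedia: bool = False           # inline DV_MULTIMEDIA content? (assumed default)
    follow_links: int = 0                      # DV_LINK depth to follow; 0 = none (assumed)

    def __post_init__(self) -> None:
        if self.follow_links < 0:
            raise ValueError("follow_links must be a non-negative depth")


request = ExtractRequest(("c1", "c2"), include_multimedia=True, follow_links=2)
```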

The algorithm is as follows.

Create a new EHR_EXTRACT including the folder structure;
Create a demographics EXTRACT_CHAPTER and write the relevant PARTY objects into it;
For each Composition in the primary set, do:
- create an X_VERSIONED_COMPOSITION and set is_primary = True;
- for each instance of OBJECT_REF encountered (e.g. PARTY_REF), obtain the target of the reference from the relevant service and copy it into the appropriate chapter (e.g. the demographics or access_groups table), keyed by OBJECT_REF.id;
- copy/serialise the Composition into the appropriate place in the folder structure, rewriting its OBJECT_REFs so that namespace = "local";
- for each instance of DV_MULTIMEDIA encountered, include or exclude the content referred to by the uri or data attributes, according to the include_multimedia flag;
- according to the value of follow_links, for each instance of DV_LINK encountered (only from/to Archetyped entities):
  - follow the links recursively; for each link, create an X_VERSIONED_COMPOSITION, set is_primary = False, write the path, and write the target Composition into the extract if it is not already there;
  - create the DV_LINK objects so that their paths refer correctly to the Compositions in the Extract.
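The main steps of the algorithm can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation. Composition, XVersionedComposition and build_extract are hypothetical, flattened stand-ins, and the extract is modelled as a plain dictionary with a Composition chapter and a demographics chapter. The folder structure, rewriting of DV_LINK paths and access control objects are omitted. The "follow the links recursively" step is written here as a level-by-level (breadth-first) loop rather than recursion, so the depth limit from follow_links is easy to enforce and each linked Composition is added at the shallowest depth at which it is reached.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Composition:
    """Hypothetical, flattened stand-in for a source EHR Composition."""
    uid: str
    party_refs: list[str] = field(default_factory=list)         # ids of OBJECT_REFs such as PARTY_REF
    link_targets: list[str] = field(default_factory=list)       # uids of Compositions reached by DV_LINK
    multimedia: dict[str, bytes] = field(default_factory=dict)  # DV_MULTIMEDIA uri -> data


@dataclass
class XVersionedComposition:
    """Hypothetical, minimal stand-in for X_VERSIONED_COMPOSITION."""
    uid: str
    is_primary: bool
    party_refs: list[tuple[str, str]]       # (namespace, id), rewritten to "local"
    multimedia: dict[str, Optional[bytes]]  # None: content excluded, uri kept


def build_extract(primary_ids: list[str], ehr: dict[str, Composition],
                  parties: dict[str, str], include_multimedia: bool = False,
                  follow_links: int = 0) -> dict:
    # Hypothetical EHR_EXTRACT layout: a Composition chapter and a demographics chapter.
    extract: dict = {"compositions": {}, "demographics": {}}

    def add(uid: str, is_primary: bool) -> None:
        comp = ehr[uid]
        for ref_id in comp.party_refs:
            # Copy the referenced PARTY into the demographics chapter, keyed by OBJECT_REF.id.
            extract["demographics"][ref_id] = parties[ref_id]
        extract["compositions"][uid] = XVersionedComposition(
            uid=uid,
            is_primary=is_primary,
            party_refs=[("local", ref_id) for ref_id in comp.party_refs],
            multimedia={uri: (data if include_multimedia else None)
                        for uri, data in comp.multimedia.items()},
        )

    for uid in primary_ids:
        add(uid, is_primary=True)

    # Follow DV_LINKs level by level, up to follow_links levels deep.
    frontier = list(primary_ids)
    for _ in range(follow_links):
        next_frontier = []
        for uid in frontier:
            for target in ehr[uid].link_targets:
                if target not in extract["compositions"]:  # skip Compositions already present
                    add(target, is_primary=False)
                    next_frontier.append(target)
        frontier = next_frontier
    return extract


ehr = {
    "c1": Composition("c1", party_refs=["p1"], link_targets=["c2"],
                      multimedia={"scan.png": b"\x89PNG"}),
    "c2": Composition("c2", party_refs=["p2"], link_targets=["c3"]),
    "c3": Composition("c3"),
}
parties = {"p1": "Dr A", "p2": "Patient X"}
extract = build_extract(["c1"], ehr, parties, include_multimedia=False, follow_links=1)
```

With follow_links = 1, "c2" is pulled in as a non-primary Composition but "c3", two links away, is not. Because multimedia is excluded, "scan.png" is kept only as a reference.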

TBD: define how Access_control objects are handled.