Change Control Package

Overview

As described in the Architecture Overview document, formal version control and change management are used in openEHR to support the construction of EHR and other repositories requiring the properties of consistency, indelibility, traceability and distributed sharing. The change_control package supplies the formal specification of these features in openEHR.

common.change_control Package illustrates the openEHR model of a Versioned object, and its constituent Versions. In this model, an instance of the class VERSIONED_OBJECT<T> provides the versioning facilities for one versioned item and is often referred to as a 'version container'. Although any kind of data can be versioned according to the model presented here, the use of versioning in openEHR is limited to 'toplevel structures', such as EHR Compositions and Party objects in a demographic system.

Figure 1. common.change_control Package

Version control structures illustrates a single VERSIONED_OBJECT containing a number of VERSIONs. Although the figure implies physical containment of Versions by a Versioned object, this is only one possible implementation. Other implementations (e.g. using orthodox relational structures) might use references, separate compressed copies, or any other mechanism.

Figure 2. Version control structures

Basic Semantics

Typing

The classes VERSIONED_OBJECT<T>, VERSION<T>, ORIGINAL_VERSION<T> and IMPORTED_VERSION<T> are generic classes, with the generic parameter type T being the type of the data. This ensures that all versions in a given VERSIONED_OBJECT are of the same type, such as COMPOSITION, FOLDER, or PARTY and that the version container itself is properly typed.

Versioned Objects

Each VERSIONED_OBJECT has a unique identifier recorded in the uid attribute (a HIER_OBJECT_ID typically containing a GUID), and a reference to the owning object (e.g. the owning EHR) in the owner_id attribute (this is typically also a GUID). The latter helps ensure that in storage systems, Versioned objects are always correctly allocated to their enclosing repository, such as an EHR.

The data in a VERSIONED_OBJECT are in the form of a collection of instances of the two VERSION<T> subtypes, and are available only via the functional interface of VERSIONED_OBJECT. How the representation of this collection is implemented inside the VERSIONED_OBJECT is not defined by this specification, only the form of any given version is. Implementations of VERSIONED_OBJECT might range from the simple (all versions stored as full copies in a list) to a sophisticated compressed versioning approach as used in software file version control and some object databases. (The persistent data format of implementations of VERSIONED_OBJECT developed by different organisations will in general be incompatible. For purposes of sharing, an interoperable expression of VERSIONED_OBJECT is defined by the X_VERSIONED_OBJECT class in the EHR Extract IM.)

Version and its Subtypes

Within a Versioned object, each version is an instance of a subtype of the class VERSION<T>. The abstract VERSION class defines the generic notion of a version containing some data, that has been committed to the repository as a member of a Contribution. Accordingly, it records the Contribution in the contribution attribute and the audit in commit_audit. A Version also knows its position in the version tree within the container. It has a version identifier, uid, and knows on which version in the tree it was based (i.e. what version was checked out to create the current version), preceding_version_id (Void if it is the first version). Both of these identifiers are globally unique (see support.identification package). These properties are abstract in the VERSION class, since they are defined as being stored or computed respectively in its subtypes.

All Versions in a given version container have a uid that includes the uid of the container; in other words, the uid of a Version is its container’s uid plus further version identification for that particular version with respect to others in the same container. The VERSION.owner_id function extracts the uid field of the owning VERSIONED_OBJECT from the uid of the VERSION.

The VERSION class has two subtypes. The first, ORIGINAL_VERSION<T>, represents a Version created with original content (stored form of data property) at the time of creation (including from nonopenEHR local feeder systems), and potentially attested (signed). It includes as attributes the current version (uid) and the preceding version (preceding_version_uid). It also knows the lifecycle state of its content. If it was the result of a merge (see Version Merging) of versions other than the preceding version, the identifiers of these versions will be recorded in the attribute other_input_version_uids. All instances of VERSION<T> in non-distributed openEHR systems will be instances of ORIGINAL_VERSION<T>. The ORIGINAL_VERSION is also the unit of copying in a distributed environment.

The second subtype is IMPORTED_VERSION<T>, and acts as a wrapper of an ORIGINAL_VERSION<T>. It has its own contribution and commit_audit (inherited from VERSION<T>), and contains the original version being imported in its item attribute. Its uid and preceding_version are defined as functions, returning the corresponding attribute values from the wrapped ORIGINAL_VERSION object (in other words, an IMPORTED_VERSION does not have its own version identifier distinct from the version it is wrapping). The semantics of importing are described below in Copying. Instance view of versioned data illustrates typical arrangements of ORIGINAL_VERSION and IMPORTED_VERSION objects within VERSIONED_OBJECTs, in turn within an EHR (if this is an EHR system), ultimately within an identified system. The two VERSIONED_OBJECTs are shown representing "medications" and "problem list", to give some idea of correspondence of versioning structures to logical data. Star icons represent digital signatures.

Figure 3. Instance view of versioned data

The "Virtual Version Tree"

An underlying design concept of the versioning model defined here is known as the 'virtual version tree'. The idea is simple in the abstract. Information is committed to a repository (such as an EHR) in lumps, each lump being the 'data' of one Version. Each Version has its place within a version tree, which in turn is maintained inside a Versioned object. The virtual version tree concept means that any given Versioned object may have numerous copies in various systems, and that the creation of versions in each is done in such a way that all versions so created are in fact compatible with the 'virtual' version tree resulting from the superimposition of the version trees of all copies. This is achieved using simple rules for version identification, described below, and is done to facilitate data sharing. Two very common scenarios are served by the virtual version tree concept:

longitudinal data that stands as a proxy for the state or situation of the patient such as "Medications" or "Problem list" (persistent Compositions in openEHR) is created and maintained in one or more care delivery organisations, and shared across a larger number of organisations;
some EHRs in an EHR server in one location are mirrored into one or more other EHR servers (e.g. at care providers where the relevant patients are also treated); the mirroring process requires asynchronous synchronisation between servers to work seamlessly, regardless of the location, time, or author of any data created.

The uid attribute of the class VERSIONED_OBJECT<T> is in fact the Uid of the virtual version tree for a given logical item (such as the "problem list" of a certain patient) - that is to say, the uid will be the same in all copies of the same Versioned object in a distributed system.

The versioning scheme used in openEHR guarantees that no matter where data are created or copied, there are no inconsistencies due to sharing, and that logical copies are explicitly represented. This is achieved by the design of Version identifiers.

Contributions

Since a versioned repository (i.e. a collection of VERSIONED_OBJECTs) is by definition indelible, all logical changes including deletions, additions, modifications (including error corrections and content changes), importing and attestations of existing items, are achieved by physically committing new Versions, or for attestations, new Attestation objects to existing Versions. Each logical type of change is achieved as follows:

addition of new item: a new VERSIONED_OBJECT is created with a first ORIGINAL_VERSION whose data is the new item; the ORIGINAL_VERSION.commit_audit.change_type is set to the code 249|creation|;
deletion of existing item: a new ORIGINAL_VERSION whose data attribute is set to Void is added to an existing VERSIONED_OBJECT; the ORIGINAL_VERSION.commit_audit.change_type is set to the code 523|deleted|;
modification of existing item: a new ORIGINAL_VERSION whose data contains the updated form of the item content is added to an existing VERSIONED_OBJECT;
- if the change is logically a correction (e.g. of wrongly entered data), the ORIGINAL_VERSION.commit_audit.change_type is set to the code 250|amendment|;
- if the change is logically a change, addition etc to the content, the ORIGINAL_VERSION.commit_audit.change_type is set to the code 251|modification|;
import of item: a new IMPORTED_VERSION is created, incorporating the received ORIGINAL_VERSION; the IMPORTED_VERSION.commit_audit.change_type is set to the code for 249|creation|.
attestation of item: a new ATTESTATION is added to the attestations list of an existing ORIGINAL_VERSION; the ATTESTATION.commit_audit.change_type is set to the code 666|attestation|.

In a typical application situation, one or more of the above changes may be committed to a repository as a Contribution. For example during a patient encounter, the following might occur:

addition: a new Composition is created recording the Observations (e.g. physical examination), etc that are made during the Encounter;
modification: the Composition containing the current medications list is updated, due to a prescription being given during the encounter.

These two changes together constitute a logical change-set, and would typically be included in the one Contribution. In general, there might be any combination of the logical change types in a single commit by an application, corresponding to a single real-world business event, such as a GP Encounter, although attestations, deletions and corrections will usually be the only change within a Contribution. In every case, regardless of the combination, a CONTRIBUTION object will be created, listing the affected VERSION objects, and including its own audit object, whose change_type attribute captures the aggregate of the changes in the Compositions making up its versions. This may sometimes be approximate, and is not expected to be used as a computable value. Typical values for CONTRIBUTION.audit.change_type:

251|modification|: this accommodates cases where there is a mixture of creation, deletion, modification that constitute a change of content;
250|amendment|: corresponds to a mixture of amendments and deletions that logically constitute a correction to the content;
666|attestation|: used when the only changes are attestation of one or more of the member versions;
any code: when all member versions have the same change type, that change type may be used for the Contribution as well.

The list of all Contribution objects for a version repository (such as an EHR) provides a complete history of the change-sets made to the repository and is the basis for performing 'rollback' to access previous informational states of the EHR. Conversely, each Version object contains a reference to the Contribution that caused it to be created.

Committal and Audits

Audits are recorded in the form of instances of the class AUDIT_DETAILS (common.generic package), which defines a set of attributes which form an audit trail, namely system_id, committer, time_committed, change_type, and description or its subtype ATTESTATION, which adds a number of other attributes (see below). When an ORIGINAL_VERSION instance is created locally, the commit_audit attribute contains an audit object recording the local act of committal. However, if the Version being committed does not correspond to local data creation, but instead contains a copy of an ORIGINAL_VERSION originally created and commited elsewhere, it is committed locally as an instance of the IMPORTED_VERSION class. Both the contribution and commit_audit of the latter object correspond to the local act of committal, while the knowledge of the original Contribution and committal are retained inside the wrapped ORIGINAL_VERSION instance. Original versions can be copied any number of times; in each system into which they are imported, an IMPORTED_VERSION is created as a wrapper.

This simple scheme ensures that the audit from initial creation - which is the clinically meaningful audit - is preserved no matter how many times the Version is copied to other systems; it also ensures that from the point of view of the version container, the local commit audit and Contribution always correspond to the local act of committal.

The CONTRIBUTION class also contains an audit attribute. Whenever a CONTRIBUTION is committed, this attribute captures to the time, place and committer of the committal act; these three attributes (system_id, committer, time_committed of AUDIT_DETAILS) should be copied into the corresponding attributes of the commit_audit of each VERSION included in the CONTRIBUTION. This is done to enable sharing of versioned entities independently of which Contributions they were part of.

The time_committed attribute in both the Contribution and Version audits should reflect the time of committal to an EHR server, i.e. the time of availability to other users in the same system. It should therefore be computed on the server in implementations where the data are created in a separate client context.

In terms of database management, Contributions are similar to nested transactions. An attempt to commit a Contribution should only succeed if each Version and/or Attestation in the Contribution is committed successfully.

Digital Signature

At the time of committal of a Version, a digital signature of the object can be made. In this process, a Version object (an ORIGINAL_VERSION or IMPORTED_VERSION) is serialised into canonical form which is then hashed to produce a digest. If public key or equivalent infrastructure is in place so that users are able to sign content, a digital signature can be created from the hash, using the user’s private key. Either way, the result is then radix-64 encoded to create an ASCII string so as to remove or reduce potential problems with subsequent communication. The openPGP standard ensures that the trasformations and algorithms used to create the signature are indicated within it.

The signature can serve two purposes. If only the hashing step is done, the digest acts as a data integrity check, indicating if the data have been tampered with after creation. If the signing step is carried out, it authenticates the user as the author of the content to readers of the content. In a versioned EHR system, it also acts as a non-repudiation measure, since the signature is stored permanently with the data. To circumvent hacking of the data, public notarisation of the signature can be used. The signature, if present, is generated according to the openPGP standard (IETF RFC 4880), following the process shown below.

Figure 4. Version signature (using openPGP)

The serialisation process works by the simple rule of serialising the entire Version object (note that the signature attribute will be Void at this point) into an agreed XML, ODIN or other text format, then applying the subsequent transformations to the serialised data, then writing the digest result back into the signature attribute. If the object to be serialised is an IMPORTED_VERSION, the process is the same - all attributes of the object are serialised and then used to generate a signature. The result will be that the IMPORTED_VERSION instance will carry its own signature which signifies the act of importing and making available locally an ORIGINAL_VERSION from another system.

To Be Determined: The exact serialisation is not yet defined by openEHR, but ODIN might be preferred since it has an unambiguous encoding of object structures, whereas different XML libraries can generate different XML from the same objects.

It should be noted that the signing process here creates a signature of a logical form of the content, not a particular graphical or other directly human interpretable view. Usually the relationship between the data and what is seen on the screen is assumed to be 1:1 in a reliable system. If however the equivalent of a signature of a screen image or other literal form of the data are needed, then the Attestation form of the commit_audit is needed. This is described below.

One of the most important uses of signatures in openEHR data is likely to be within EHR Extracts, since they can provide an assurance authenticity and integrity of the data to a receiver who has no knowledge of the quality of the processes used in the originating system.

Attestation

The ORIGINAL_VERSION.attestations attribute allows attestations to be associated with the data in an original version. Attestations are treated in openEHR as a kind of audit with additional attributes, and are described in detail in the common.generic package section of this specification. Any number of attestations to be associated with each Version in a Versioned object. Attestations can be added at any time after committal of the content being attested. They can be used as required by enter prise processes or legislation, and indicate by whom and when the item in question was attested. A digital "proof" is also required, although no assumption is made about the form of such proof.

Attestations may be used in different ways as follows.

Signing content at committal: for some reason, the information being committed needs to be digitally signed. It may be that sensitive information is to be added to the EHR, e.g. recording the fact of sectioning of a patient under the mental health act, diagnosis of a fatal disease etc, or simply something which the user wants to sign. In this case, ORIGINAL_VERSION.commit_audit is of type ATTESTATION rather than AUDIT_DETAILS.
Marking content for review and signing: data entered and committed by a data-entry person e.g. a secretary, transcriptionist or student need to be reviewed and signed by a senior clinician. Similarly to the above case, this will cause ORIGINAL_VERSION.commit_audit to be of type ATTESTATION, but in this case, the Attestation will have its is_pending flag set True to indicate that attestation is required.
Post-committal signing: data committed with an Attestation in the is_pending state is reviewed and signed at a later point in tme by an appropriate member of staff. This action will cause an ATTESTATION to be added to the ORIGINAL_VERSION.attestations list.

Normally, Attestations refer to the entire version to which they are attached. However, it is possible for an ATTESTATION instance to refer to some finer-grained item within the data of the version, such as a single ENTRY within a COMPOSITION.

When subsequent Versions are added, the existing Attestations can not be assumed to be valid for the new Version, since the nature of an attestation is that it records the witnessing of exactly the content displayed at the time of witnessing.

Versioning Semantics

Version Lifecycle

Content in Original versions has a lifecycle state associated with it, modelled using the ORIGINAL_VERSION.lifecycle_state attribute, which is coded from the openEHR Terminology 'version lifecycle state' group. The possible values are 532|complete|, 553|incomplete| and 523|deleted|. Usually content will be committed in the complete state. However, in some circumstances, e.g. because the author has run out of time or due to an emergency, it may be committed as incomplete meaning that it is either incomplete or at least unreviewed. In hospitals this is a common occurrence. Unfinished Compositions cannot be saved locally on the client machine, since this represents a security risk (a small client-side database would be much easier to hack into than a secure server). They must therefore be persisted on the server, either in the actual EHR, or in a 'holding bay' which was recognised as not being part of the EHR proper. Either way, the author would have to explicitly retrieve the Composition(s) and after further work or review, 'promote' them into the EHR as 'active' Compositions; alternatively, they might decide to throw them away.

Going from incomplete to complete states almost always corresponds to a change in content, and corresponds to a new VERSION regardless. This modelling approach allows such content to exist on the EHR system, but to be flagged as incomplete when viewed by a user.

Systems are responsible for implementing checks to find 'old' Versions in incomplete state, and bring them to the relevant user’s notice, or automatically deleting them or progressing them to complete as appropriate.

The following diagram shows the formal state machine of the ORIGINAL_VERSION.lifecycle_state attribute.

Figure 5. ORIGINAL_VERSION lifecycle_state state machine

Logical Deletion

Within the lifecycle described above, deletion of existing top-level content items (i.e. the entire data contents of a Version) is somewhat of a special case in openEHR and in EHRs in general. Medicolegal and traceability requirements mean that information cannot be literally removed, since it must always be possible to revert back to a previous state of the record in which the deleted information is intact. Accordingly, information can only ever be logically deleted. This is achieved by the following procedure in the Version container in question:

create a new Version in the normal way;
delete its data (which will by default be a copy of the data of the previous Version);
set the lifecycle_state value to the code for "deleted"
commit in the normal way.

Logical deletion can be used for various reasons, including patient direction to remove material, and in the situation where information about a different patient has been incorrectly committed to a record, and has to be removed.

Version Identification

The version identification scheme described here is adapted from the work of Hnìtynka and Plášil [Hnìtynka_2004]. VERSION objects are identified by a uid attribute, which is a three-part identifier consisting of the attributes object_id, creating_system_id and version_tree_id (see support.identification package in the Support IM), in which the object_id part is a copy of the uid of the owning VERSIONED_OBJECT version container.

The following figure illustrates the scheme. The VERSIONED_OBJECT uid value - here, "1234" - is used as the object_id() part of the uid of every contained VERSION (note that an OBJECT_VERSION_ID’s value is a String, with various accessor functions like object_id() used to extract the logical pieces). Accordingly, each of those versions will have a uid following the pattern "1234:system_id:version_tree_id". The use of the version_tree_id and system_id parts of the identifier are explained in the sections below. The function VERSION.owner_id() is provided to enable a caller to easily obtain the 'owning version container' identifier.

Figure 6. Version identification system

The following figure provides an example of multiple VERSIONs within a VERSIONED_OBJECT, where one of the versions has been merged from another system. This highlights the identifiers; details of original and merged versions are described below.

Figure 7. Version identification Example

Local Versioning

The version_tree_id attribute of VERSION.uid identifies a version of an item with respect to other versions in the same tree. The requirements of the identifier are the same as for typical versioning systems in use in software configuration management, and are as follows:

to encode the relationship between versions in the version id, that is to say, version identifiers are constructed such that given a series of identifiers, the relative positions in the tree can be determined;
to allow for branches, so that variants of a particular node can be created; e.g. due to translation, or for training purposes.

A suitable scheme satisfying the above requirements for health information is the simplest possible, i.e. a single number representing the version. Version identifiers thus start at 1 and continue by single increments. The succession of version identifiers formed by changes over time is known as the "trunk" of the version tree.

To support branching, a further pair of numbers is added. The first number identifies the branch (e.g. the 1st branch, 2nd branch etc from that trunk node), while the second identifies the version. Both of these numbers also start at '1'. The result of this is that version numbers like '1.1.1' (first version of first branch from trunk node 1), '2.3.3' (3rd version of 3rd branch from trunk node 2) are possible. Inside openEHR systems where sharing with other systems does not occur, it is expected that branched versioning will be used rarely; translation is likely to be the only reason (for example if a Portuguese translation of an English language version of a Composition is made).

Distributed Versioning

However, in a distributed environment where copying and subsequent modification can occur, there are more requirements of the version identification scheme, as follows:

it must be possible for an item to be copied and for local modifications then to be made without causing version clashes;
it must be possible to send more recent versions from the original system to a target system that has already received earlier versions, and for these versions to be distinguishable from versions in the receiving system, including the previously imported versions - this enables the receiving system to know how and where to commit the received versions;
it must be guaranteed that any version of any object is uniquely identified globally, no matter whether it is a locally created trunk version, a locally created branch version or a version containing changes made to a copied version.

To satisfy these needs, two modifications are made to the identification scheme. The first is the addition of the creating_system_id attribute of VERSION.uid, representing the system where the version was created. This is a machine processable identifier, such as a reverse internet address or GUID.

Whenever a new ORIGINAL_VERSION in a particular VERSIONED_OBJECT (with a particular uid) is created locally, the VERSION.uid.creating_system_id is set to the identifier of the local system; if the version was imported, creating_system_id will already have been set to the identifier of the system of original creation.

The second modification is to require branching version identifiers to be used when local modifications are made to versions copied from elsewhere; this ensures that the modifications now being made in the target system are considered in a global sense as logical branches or variants rather than trunk versions which are made in the originating system. It also allows later trunk versions from the originating system to be copied at some future time to the target system without version identifier clashes. In summary, this scheme uses the tuple {object_id, version_tree_id, creating_system_id} to globally uniquely identify any openEHR VERSION object.

Semantics in Distributed Systems

Copying

The Copy Operation

In openEHR, the smallest unit of copying of content between systems that satisfies traceability requirements is the ORIGINAL_VERSION. In order to copy a OBSERVATION or even an COMPOSITION somewhere else and retain versioning capability, its enclosing ORIGINAL_VERSION object must be sent. When the type of content is a COMPOSITION for example, an ORIGINAL_VERSION<COMPOSITION> object is sent. At the receiving system various steps will occur depending on whether:

any items for the EHR in question have ever been copied before;
a copied EHR exists in the destination system for the subject of care, but no copies of the particular item in question have even been made (e.g. it is the first time Family History has been copied);
an EHR exists, and previous copies have been made for the item in question;
there is a duplicate EHR for the subject of care (i.e. created by new data entry rather than by automatic copying).

In the first situation, there is not even an EHR (i.e. repository of Versioned objects for the patient in question) in the target system. A new one has to be created. As mentioned in the EHR IM specification, the newly created EHR should re-use the EHR identifier from the source system. This establishes the new EHR as an intentional clone of the source EHR (or more correctly, part of the family of EHRs making up the virtual EHR for that patient).

If it is the first time any version of the item logically identified by its ORIGINAL_VERSION.uid.object_id (i.e. the uid of its original VERSIONED_OBJECT, common to all Versions in the same container) was received from the originating system, a new VERSIONED_OBJECT<T> (e.g. VERSIONED_OBJECT<COMPOSITION>) is created, with its uid set to the same value as the received VERSION.uid.object_id. This establishes the newly created VERSIONED_OBJECT as being a logical clone of the one from which the received ORIGINAL_VERSION was copied. If some version of the item had already been received, this step will have already occurred, and the requisite VERSIONED_OBJECT would already exist.

An IMPORTED_VERSION instance is then created, its item set to the received ORIGINAL_VERSION, and it is committed in the normal way (i.e. as part of a Contribution). The IMPORTED_VERSION commit_audit and contribution attributes record the local act of committal. In this operation, the ORIGINAL_VERSION instance is never modified - it remains a faithful copy of its original, no matter how many systems it may be copied through.

Subsequent Local Modifications

In most cases, the received information will remain as is for the duration. However, in some cases, users at the receiver system might want to make modifications as well. This is likely to happen in the case of information items representing things like medication lists and allergies. When new versions are added locally to a copied object, the local system id is recorded in the uid.creating_system_id attribute, while branching numbering is used in the uid.version_tree_id.

These copying scenarios are illustrated in the figure below. On the left hand-side of the figure, a version container (i.e. an instance of VERSIONED_OBJECT) with uid = "1" is shown; the first Version has uid.creating_system_id = "sysA"; uid.version_tree_id = "1". Further local trunk and branch versions are also shown.

Figure 8. Distributed versioning

When the first ORIGINAL_VERSION is copied (copy #1) to system B, it is committed as an IMPORTED_VERSION to a VERSIONED_OBJECT which is a clone of the original. Subsequent copies (copy #2 and copy #3) can be made of later versions from system A to system B, with the effect that the version tree can be recreated inside system B (if required; there is of course no obligation to do anything with the received information). Users in system B an also make modifications to the received Version copies; these modifications are shown in grey, as branched versions with uid.creating_system_id = "sysB". Independently, users in system B will of course be creating other content locally, e.g. as shown on the right-hand side, where a Versioned object with uid= "2" has been created. Two places are indicated on the diagram where identification clashes could have occurred, but are prevented due to the use of the 3-part unique Version identifier scheme.

Two rules are required to make this system work, as follows:

branch versions from the original systems that are copied to another system cannot be copied without their corresponding preceding versions on the same branch (if any) and trunk versions also being copied;
no system should create a new Versioned object (with a new uid) without first determining that it does not already have one with the same uid. This should happen automatically if GUIDs are being used (and the generating software is reliable); checks may have to be made if ISO Oids are being used.

An important consequence of the way IMPORTED_VERSION is modelled is that in the Version containers resulting from copy operations, the commit times always reflect the local (more recent) act of committal, not the original committal of the information to the container where it was created. This ensures that a query for the state of a Version container at earlier commit times correctly returns what information existed at that time in that container, rather than giving the illusion that recently copied Versions were there earlier than the time of local committal (as would occur if the original commit time of the ORIGINAL_VERSION object was used for comparison purposes in such queries). Accordingly, such a query over an entire EHR or other versioned information repository always returns the state of the repository available to users at that time, regardless of how many later merges or copies were carried out. This is a key requirement for supporting medico-legal and historical investigations of stored information.

Version Merging

One of the most common operations in distributed versioned environments, particularly in healthcare, is that content created in one system is imported into another system, modifications are created locally there which are then sent back the first system. This information pathway corresponds to scenarios such as the patient being referred from primary care into a hospital, and later being discharged into primary (or other care).

The usual need when the first system receives changes made to the data by the second system is to merge them back into the trunk of the version tree. Logically a 'merge' is the operation of using two versions of the same content to create a third version. How the source versions are used will vary based on the semantics of the information; it could be that the either is simply taken in its totality and the other discarded, or some mixture might be created of the two in a process of editing by the user. In many cases in health, such as where the content is a medication or problem list, the user in the original system will review the received content and create a new trunk version locally using that content, since it will be deemed to be the most accurate available in the clinical computing environment. This scenario is illustrated below.

Figure 9. Version merging

In this figure, versions 1 and 2 of the content (e.g. a medication list) from Versioned object with uid=1 are copied from system A (e.g. a GP) to system B (e.g. a hospital). In system B, changes are made to version 2, creating a branch (as an instance of IMPORTED_VERSION<T>) as required by the rules described above. These changes (modified medication list) are then imported back into system A. The system A user performs a merge operation to create a new trunk version 3, using the sysB::2.1.2 and sysA::2 content; most likely, he simply reviews the two input versions and uses the sysB::2.1.2 content unchanged (the result is that system A now has an up-to-date medication list for the patient, including medications orginally recorded at system A, as well as additions recorded at system B). The new Version is an instance of ORIGINAL_VERSION<T>, with its other_input_version_ids attribute set to include the OBJECT_VERSION_ID representing sysB::2.1.2 (it does not need to include sysA::2, since this is already known in the preceding_version_uid).

If in system A a modification had been done to the sysA::2 version, creating sysA::3, in parallel with the system B changes, then a conflict situation is likely when the merge attempt is made. This may need to be resolved by a human user, for whom an automated merge attempt could be presented on the screen as a starting point, much as current source code control tools do today.

Disjoint Merging

An unintended but not uncommon situation is when distinct Version containers are created for the same real-world entity. For example, separate EHRs can be created for the one patient, due to patient identification errors or other procedural or administrative problems. Each record is likely to contain some logically duplicated basic information, as well as information unique to that record, e.g. contributed by different hospital departments. Within the one EHR, unintentionally distinct Version containers might be created for the same logical item, such as the patient’s problem list. These erroneous situations are eventually detected, and need to be rectified. Logically what is required is to merge the two records (each potentially consisting of numerous Version containers) into one, as shown below.

Figure 10. Disjoint merging

The merge procedure is as follows:

decide which record is to remain active (for merging purposes, this will be the "target", the other the "source");
for all Version containers in the source record…
- if there is a logical equivalent in the target record (for EHRs, there will typically only be equivalents for persistent and possibly administrative Compositions), perform a disjoint merge in the target Version container by:
  - creating a new trunk version in the target Version container;
- if there is no logical equivalent, do the following:
  - create a new target Version container;
  - create its first trunk Version;
- in both cases, continue as follows:
  - set the data in the new trunk Version to be a copy of the data from the most recent trunk Version from the source container;
  - set other_input_version_uids to include the uid of the source Version being merged (this uid will contain the uid of the Version container being logically deleted);
  - for any branches on the most recent trunk Version in the source container, create corresponding branches on the newly created trunk Version in the target, include the corresponding content and set the other_input_version_uids in the target in the same way as above;
  - add a new trunk Version to the source container, with the data set to Void, and lifecycle_state set to deleted.

As for copying and merging, an important consequence of this procedure is that the resulting record (i.e. the target of the merge procedure) continues to correctly represent previous states of the repository, regardless of how many recent merges have occurred.

Moving Version Containers

It will not be uncommon that whole VERSIONED_OBJECTS need to be moved to another system, e.g. due to a move of a complete patient record (due to the patient moving), or re-organisation of EHR data centres. The semantics of a move are different from those of copying: with a move, there is no longer a source instance after the operation; the destination instance becomes the primary instance.

When the move is effected, the identifier of the system in which the VERSIONED_OBJECT now exists will usually be different from what it was before. As a consequence, subsequent versions of the content created in a moved version container will now have the uid.creating_system_id set to the id of the new system. This creates another variation on the version lineage, one in which the uid.creating_system_id value can change in the trunk line, as shown in below.

Figure 11. Moving a version container

Class Descriptions

Unresolved include directive in modules/common/pages/change_control_package.adoc - include::../UML/classes/versioned_object.adoc[]

Unresolved include directive in modules/common/pages/change_control_package.adoc - include::../UML/classes/version.adoc[]

Unresolved include directive in modules/common/pages/change_control_package.adoc - include::../UML/classes/original_version.adoc[]

Unresolved include directive in modules/common/pages/change_control_package.adoc - include::../UML/classes/imported_version.adoc[]

Unresolved include directive in modules/common/pages/change_control_package.adoc - include::../UML/classes/contribution.adoc[]