Quantity Package
Overview
The data_types.quantity package is illustrated below. Dates and Times are found in the
next section.
Requirements
Ordinal Values
Medicine is one domain in which symbols representing relative magnitudes are commonly used, without exact values being known. The main purpose is usually to classify patients into groups for which different decisions might be made. Thus, while approximate ranges (technically speaking, "fuzzy intervals") might be stated (such as for a urinalysis), concrete values are not of interest, only categories are. Take for example the characterisation of pain as being "mild", "medium", "severe", or the reflex response to tendon percussion as "-", "+/-", "+", "++", "+++", "++++". There may be no way to quantify such values with scientific precision, because they reflect a subjective experience of the patient or an informal judgement by the clinician. However, they are understood as being ordered, e.g. "++" is 'greater than' "+".
Similarly, even though the symbolic values for haemolysed blood in a urinalysis have approximate ranges stated for them, as shown below, these 'values' are not usable in the same way as true quantities.
-
"neg", "trace" (10 cells/μl)
-
"small" (<25 cells/μl)
-
"moderate" (<80 cells/μl)
-
"large" (>200 cells/μl)
A second requirement for ordinal values is that in many cases there is a need to associate integer values with the symbols, in order to facilitate ordered comparison, and also to enable longitudinal comparison across results of the same kind (e.g. pain, protein). Integer values may be negative, 0 and positive, typically to allow the 0 value to correspond to a neutral value in a range.
| An argument sometimes put forward for recording all ordinals in a more precise way is that comparisons might want to be made between the values quoted by two laboratories for the same symbol (e.g. "moderate"). There are a number of counter-arguments. Firstly, such comparisons are a poor attempt at normalisation, an activity which is the business of pathologists, not EHR users. Secondly, the symbolic values are often arrived at by the tester making a judgement of colour on a strip, which while an adequate (and cost-effective) approach for classifying, is not a valid means of quantifying a value. Lastly, in most cases, if a quantified point value or range is desired, or available, then it will be used, meaning that the appropriate quantitative data type can be used rather than the ordinal type. |
Countable Things
A common kind of data value in medicine is the dimensionless countable quantity, e.g. "number of doses: 2", "number of previous pregnancies: 1", "number of tablets: 3". Values of this type are always integral. Countable values need to be convertible to real numbers for statistical purposes, for example for a study of average number of pregnancies per couple.
Some countable entities such as tablets are divisible into major fractions, typically halves and occasionally quarters.
Dimensioned Quantities
The most common kind of quantity is a measured, dimensioned quantity. Anything which is measurable (rather than countable) involves a number of data aspects, namely:
-
a magnitude whose value is a real number;
-
the physical property being measured, with the appropriate units;
-
a concept of precision, i.e. to what number of decimal places the value is recorded;
-
a concept of accuracy, i.e. the known or assumed error in the measurement due to instrumentation or human judgement.
Examples of dimensioned quantities include:
-
systolic BP: 110 mmHg
-
height: 178 cm
-
rate of asthma attacks: 7 /week
-
weight loss: 2.5 kg
Ratios and Proportions
A common quantitative type in science and medicine is the proportion, or ratio, which is used in situations like the following:
-
1:128 (a titer);
-
Na:K concentration ratio (unitary denominator);
-
albumin:creatinine ratio;
-
% e.g. red cell distribution width (RDW) which is the width of a distribution of RBC widths.
In general ratios have real number values, even if many examples appear to be integer ratios. Proportions with unitary denominator and % (denominator = 100) are common.
Formulations
A concept superficially similar to proportions and ratios is formulations of materials, such as a solid in a liquid e.g.:
-
250 mg / 500 ml (solute/solvent)
Although a single solute/single solvent formulation appears to have the same form as a ratio, the general form is for any number of substances to be mixed together, usually according to a particular procedure. Formulations are therefore not candidates for direct modelling as fine-grained quantities, but instead are constructed by archetyping a higher-level structure, each leaf element of which contains the required kind of Quantity.
Quantity Ranges
Quantity ranges are ubiquitous in science and medicine, and may be defined for any kind of measured phenomenon. Examples include:
-
healthy weight range, e.g. 48kg - 60kg
-
normal range for urinalysis in pregnancy - protein, e.g. "nil" - "trace"
Reference Ranges
A reference range is a quantity range attached to a measured value, and is common for laboratory result values. The typical form of a reference range found in a pathology result indicates what is considered the 'normal' range for a measured value. Examples of reference ranges:
-
normal range for serum Na is 135 - 145 mmol/L.
-
desirable total cholesterol: < 5.5 mmol/L (strictly this probably should be 2.0 - 5.5 mmol/L, but is not usually quoted this way as low cholesterol is not considered a problem.)
Ranges can also be quoted for drug administrations, in which case they are usually thought of as the 'therapeutic' range. For example, the anticonvulsant drug Carbamazepine has a therapeutic range of 20 - 40 μmol/L. In some cases, there are multiple ranges associated with a drug; for example, Salicylate has a therapeutic range of 1.0 - 2.5 mmol/L and a toxic range > 3.6 mmol/L.
Various examples occur in which multiple ranges may be stated, including the following.
-
The administration recommendations for drugs which depend on the particular patient state. For example, the therapeutic range of Cyclosporin (an immunosuppressant) is a function of time post-transplant for the affected organ, e.g. kidney: < 6 months: 250 - 350 μg/L, > 6 months: 100 - 200 μg/L.
-
Normal ranges for blood IgG, IgA, IgM which vary significantly with the age in months from birth.
-
Progesterone and pituitary hormones have ranges which are different for different phases of the menstrual cycle and for menopause. This may result in 4 or 5 ranges given for one result. Only one will apply to any particular patient - but the exact phase of the cycle may be unknown - so the ranges may need to be associated with the value with no 'normal' range.
Where there are multiple ranges, the important question is: which range information is relevant to the actual data being recorded for the patient? In theory, only the range corresponding to the particular patient situation should be used, i.e. the range which applies after taking into account sex, age, smoking status, "professional athlete", organ transplanted, etc. In most cases, this is a single "normal" range, or a pair of ranges, typically "therapeutic" and "critical". However, practical factors complicate things. Firstly, data is sometimes supplied from pathology labs along with some or all of the applicable reference ranges, even though only some could possibly apply. This is particularly the case if the laboratory has no other data on the patient, and cannot evaluate which range applies. The requirement for faithfulness of recording might be extended to reference data supplied by laboratories, regardless of how irrelevant or arbitrarily chosen the reference data is, meaning that such data has to be stored in the record anyway. Secondly, there may be circumstances in which physicians want a number of reference ranges, even while knowing that only one range is applicable to the datum. Ranges above and below the relevant one might be useful to a physician wishing to determine how far out of range the datum is.
Normal Range and Status in Laboratory data
It is quite common for laboratories to include a normal range with each measured value, and/or a normal 'status', which indicates where the value lies with respect to the normal range. The latter will commonly take the form of markers like "HHH" (critically high), "HH" (abnormally high), "H" (borderline high), "L", "LL", "LLL" in HL7v2 messaging, although other schemes are undoubtedly used.
Design
Basic Semantics
In order to make sense of the requirements in a systematic way, a proper typology for quantities is
needed. The most basic characteristic of all values typically called 'quantities' is that they are
ordered, meaning that the operator "<" (less-than) is defined between any two values in the domain.
An ancestor class for all quantities called DV_ORDERED is accordingly defined. This type is subtyped
into ordinals and true quantities, represented by the classes DV_ORDINAL and DV_QUANTIFIED
respectively. DV_ORDINAL represents data values whose exact numeric values are not known, and
which use symbolic renderings instead, such as "+", "++", "+++", or "mild", "medium", "severe".
Each symbol can be assigned any integer value, providing a basis for computable comparison. In contrast,
instances of DV_QUANTIFIED and all its subtypes have precise numeric magnitudes.
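As a rough illustration of ordering symbols by their assigned integers, the following Python sketch uses a hypothetical `Ordinal` class. It is not the openEHR DV_ORDINAL definition, and the integer values given to the reflex symbols are assumptions made for the example.

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class Ordinal:
    """Hypothetical stand-in for DV_ORDINAL: an integer value plus a symbol."""
    value: int    # integer assigned to the symbol; used only for ordering
    symbol: str   # symbolic rendering, e.g. "+" or "mild"

    def __lt__(self, other: "Ordinal") -> bool:
        # Comparison uses the integer value, never the symbol text.
        return self.value < other.value


# Assumed value assignment (0..5) for the tendon reflex scale.
REFLEX = {
    s: Ordinal(i, s)
    for i, s in enumerate(["-", "+/-", "+", "++", "+++", "++++"])
}

strongest = max(REFLEX.values())   # the ordinal with the highest integer value
```

With this assignment, `REFLEX["++"] > REFLEX["+"]` holds even though the symbols themselves have no numeric meaning.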
DV_QUANTIFIED itself introduces the concepts of magnitude and magnitude_status. The magnitude
attribute is guaranteed to be available on any DV_QUANTIFIED, carrying the effective value, regardless
of the particular subtype. The optional magnitude_status attribute can be used to provide a non-quantified
indication of accuracy, and takes the following values:
-
"=" : magnitude is a point value
-
"<" : value is < magnitude
-
">" : value is > magnitude
-
"<=" : value is <= magnitude
-
">=" : value is >= magnitude
-
"~" : value is approximately magnitude
If not present, meaning is "=".
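One possible reading of these status values, sketched in Python: the function below checks whether a candidate true value agrees with a recorded magnitude and status. The function name `is_consistent` and the `tolerance` parameter are assumptions of this sketch; the text does not say how close "approximately" is.

```python
import operator

# One comparison per magnitude_status symbol: (true value) OP (recorded magnitude).
_COMPARISONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def is_consistent(value, magnitude, status=None, tolerance=0.0):
    """Check whether a candidate true value agrees with a recorded magnitude.

    `tolerance` is an assumption of this sketch, used only for the "~" status.
    """
    status = status or "="                # an absent status means "="
    if status == "~":
        return abs(value - magnitude) <= tolerance
    if status not in _COMPARISONS:
        raise ValueError(f"unknown magnitude_status: {status!r}")
    return _COMPARISONS[status](value, magnitude)
```

For example, a result reported as "< 5.0" is consistent with a true value of 4.0 but not with 5.0.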
Logically, an accuracy attribute should also be included in DV_QUANTIFIED, but as its modelling is
different in the subtypes in a way that does not easily lend itself to a common ancestor, it is only
included in the subtypes.
The DV_QUANTIFIED class has two subtypes: DV_AMOUNT and DV_ABSOLUTE_QUANTITY. The
former corresponds to relative 'amounts' of something, either a physical property (such as mass) or
items (e.g. cigarettes). Mathematically, the '+' and '-' operators (as well as '*' and '/') are defined in
the same way as for the real numbers (or any other mathematical 'field'), with the semantics that adding
two relative quantities measuring the same thing (i.e. with the same units) produces another relative
quantity of the same kind; while the semantics of subtraction are that one relative quantity
subtracted from another generates a third.
The second subtype of DV_QUANTIFIED, DV_ABSOLUTE_QUANTITY, models quantities whose values
are absolute along a line having a defined origin. The main examples of absolute quantities are the
temporal concepts date, time and date/time. These are distinguished from relative quantities in that
the normal addition and subtraction operations don’t apply. Instead, the semantics of such operators
are based on the idea of the difference between absolute values being a relative amount. For example,
two dates can be subtracted, but the result is a duration, not another date. For this reason, the operations
add, subtract and diff are defined rather than '+' or '-'. Date/time types, as well as the relative
concept duration, are defined in [Date Time Package].
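Python's standard `datetime` module happens to follow the same absolute/relative distinction, so it can serve as an informal illustration. It is only an analogy: the variable names are invented for the example and the openEHR date/time types are not involved.

```python
from datetime import date, timedelta

admission = date(2024, 3, 1)
discharge = date(2024, 3, 15)

# diff: absolute - absolute gives a relative amount (a duration), not a date.
stay = discharge - admission

# add: absolute + relative gives another absolute value.
follow_up = discharge + timedelta(days=30)

# Adding two absolute values has no meaning and is rejected.
try:
    admission + discharge
    date_sum_allowed = True
except TypeError:
    date_sum_allowed = False
```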
Subtypes of DV_AMOUNT are DV_PROPORTION, DV_QUANTITY, DV_COUNT, and DV_DURATION (see
date_time package). The type DV_COUNT has an integer magnitude and is used to record naturally
countable things such as number of previous pregnancies, number of steps taken by a recovering
stroke victim and so on. There are no units or precision in a DV_COUNT. Countable quantities can be
used to create instances of DV_QUANTITY, such as during a statistical study which averages tobacco
consumption over a time period. Such a computation might cause the creation of DV_QUANTITY
objects representing values like {magnitude = 5.85, units = '/ week'}.
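A minimal sketch of such a computation is shown here, assuming hypothetical `Count` and `Quantity` classes that carry only the attributes mentioned in the text. The function name and the use of "/wk" (the UCUM symbol for week) as the units string are choices made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Count:
    """Hypothetical stand-in for DV_COUNT: integer magnitude, no units."""
    magnitude: int


@dataclass(frozen=True)
class Quantity:
    """Hypothetical stand-in for DV_QUANTITY: real magnitude plus units."""
    magnitude: float
    units: str


def average_per_week(weekly_counts):
    """Average a series of weekly counts into a real-valued rate."""
    return Quantity(mean(c.magnitude for c in weekly_counts), "/wk")


rate = average_per_week([Count(5), Count(7), Count(6), Count(5)])
```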
DV_QUANTITY is used to represent amounts of measurable things, and has a real number magnitude,
precision, units and accuracy. The units attribute contains the scientific unit in a parsable form
defined by the Unified Code for Units of Measure (UCUM). A valid units string always implies a
measured property, such as "force" or "pressure". The property of a Quantity can conveniently be constrained in archetypes, e.g. to "pressure", which would allow any pressure unit. Unit strings can be
compared to determine if they measure the same property (e.g. "bar" and "kPa" are both units corresponding
to the property "pressure"), which enables the is_strictly_comparable_to function defined
on DV_ORDERED to be properly specified on DV_QUANTITY.
| While these semantics will allow comparison of e.g. two pressures recorded in mbar and mmHg, or even two accelerations whose units are "m.s^-2" and "m/s^2", they provide no guarantee that this is a sensible thing to do in terms of domain semantics: comparing a blood pressure to an atmospheric pressure for example may or may not make any sense. It is not within the scope of the quantity package to express such semantics: this is up to application software which uses Quantities found in specific places in the data. |
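The property-based comparability test can be sketched as follows. The lookup table is hand-written and hypothetical; a real implementation would derive the property from the UCUM unit definitions rather than from a fixed dictionary, and the function here only mimics the idea behind is_strictly_comparable_to.

```python
# Hypothetical, hand-written lookup from unit string to measured property.
UNIT_PROPERTY = {
    "bar": "pressure",
    "kPa": "pressure",
    "mm[Hg]": "pressure",
    "kg": "mass",
    "g": "mass",
    "cm": "length",
    "m": "length",
}


def is_strictly_comparable(units_a: str, units_b: str) -> bool:
    """True if both unit strings are known and measure the same property."""
    prop_a = UNIT_PROPERTY.get(units_a)
    prop_b = UNIT_PROPERTY.get(units_b)
    # Units missing from the table are treated as not comparable.
    return prop_a is not None and prop_a == prop_b
```

Note that this only establishes that two values measure the same property; whether comparing them is clinically meaningful remains a matter for the application.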
Accuracy and Uncertainty
Theoretically it might be argued that 'accuracy' should not be included in a model for quantified values,
because it is an artifact of a measuring process and/or device, not of a quantity itself. For example,
a weight of "82 kg ±5%" can be represented in two parts. The "82 kg" is represented as a DV_QUANTITY, while the "±5%" could be included in the protocol description of the weighing
instrument, since this is where the error comes from. For practical purposes however, (in)accuracy in
a measured quantity corresponds to a range of possible values. In realistic computing in health, it is
quite likely that the accuracy will be required in computations on quantities, especially for statistical
population queries in which measurement error must be disambiguated from true correlation.
Accuracy is therefore introduced as the abstract feature accuracy of the DV_QUANTIFIED class. It is
defined concretely in the two descendants, DV_AMOUNT, where it is of type Real, and
DV_ABSOLUTE_QUANTITY, where it is of a differential type defined by subtypes. A value of 0 in
either case indicates 100% accuracy, i.e. no error in measurement. Where accuracy is not recorded in
a quantity, it is represented by a special value. In DV_AMOUNT, a value of -1 for the accuracy attribute
is used for this purpose, and the constant unknown_accuracy_value = -1 is provided within the class
to give a symbolic name for the special value. In the DV_ABSOLUTE_QUANTITY class,
accuracy_unknown is represented by a Void (i.e. null) value for the accuracy attribute. An abstract
Boolean feature accuracy_unknown is defined in the parent class DV_QUANTIFIED to provide a logical
test of accuracy being absent, and is implemented in the respective descendants by concrete functions
that check for the special values.
In addition, the class DV_AMOUNT provides a feature accuracy_is_percent: Boolean to indicate whether the
accuracy value is to be understood as a percentage or an absolute value.
When two compatible quantities are added or subtracted using the + or - operators (DV_AMOUNT
descendants) or add and subtract (DV_ABSOLUTE_QUANTITY class), accuracy behaves in the following
way:
-
if accuracies are present in both quantities, they are added in the result, for both addition and subtraction operations;
-
if either or both quantities has an unknown accuracy, the accuracy of the result is also unknown;
-
if two DV_AMOUNT descendants are added or subtracted, and only one has accuracy_is_percent = True, accuracy is expressed in the result in the form used in the larger of the two quantities.
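The three rules can be sketched in Python as follows. The `Amount` class and `combine` function are hypothetical and ignore units. The text does not say how a percentage and an absolute accuracy are to be combined, so the mixed case below (sum the absolute errors, then express the total as a percentage of the result if the larger quantity used a percentage) is an assumption of this sketch.

```python
from dataclasses import dataclass

UNKNOWN_ACCURACY = -1.0   # plays the role of unknown_accuracy_value


@dataclass(frozen=True)
class Amount:
    """Hypothetical, units-free stand-in for a DV_AMOUNT descendant."""
    magnitude: float
    accuracy: float = UNKNOWN_ACCURACY
    accuracy_is_percent: bool = False

    @property
    def accuracy_unknown(self) -> bool:
        return self.accuracy == UNKNOWN_ACCURACY

    def absolute_accuracy(self) -> float:
        """Accuracy as an absolute error, converting from percent if needed."""
        if self.accuracy_is_percent:
            return abs(self.magnitude) * self.accuracy / 100.0
        return self.accuracy


def combine(a: Amount, b: Amount, subtract: bool = False) -> Amount:
    magnitude = a.magnitude - b.magnitude if subtract else a.magnitude + b.magnitude
    # Unknown accuracy on either side makes the result's accuracy unknown.
    if a.accuracy_unknown or b.accuracy_unknown:
        return Amount(magnitude)
    # Same form on both sides: accuracies are added, for + and - alike.
    if a.accuracy_is_percent == b.accuracy_is_percent:
        return Amount(magnitude, a.accuracy + b.accuracy, a.accuracy_is_percent)
    # Mixed forms (assumed method): sum the absolute errors, then express the
    # total in the form used by the larger of the two quantities.
    total = a.absolute_accuracy() + b.absolute_accuracy()
    larger = a if abs(a.magnitude) >= abs(b.magnitude) else b
    if larger.accuracy_is_percent and magnitude != 0:
        return Amount(magnitude, 100.0 * total / abs(magnitude), True)
    return Amount(magnitude, total, False)
```

For instance, combining 80 ±5% with 20 ±1 gives an absolute error of 4 + 1 = 5 on a result of 100, which this sketch reports as ±5% because the larger operand used a percentage.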
The related notion of 'uncertainty' is understood as a subjective judgement made by the clinician, indicating that he/she is not certain of a particular statement. It is not the same as accuracy: uncertainty may apply to non-quantified values, such as subjective statements, and it is not an aspect of objective measurement processes, but of human confidence. Where the uncertainty is due to subjective memory e.g. "I think my grandfather was 56 when he died", the uncertainty is simply recorded as another value, along with the main data item being recorded. Uncertainty is therefore not directly modelled in the openEHR data types, but appears instead in particular archetypes.
Quantity Ranges
Ranges are modelled by the generic type DV_INTERVAL<T:DV_ORDERED> which enables a range of
any of the other quantity types (except ratio) to be constructed. This allows any subtype of
DV_ORDERED to occur as a range as well.
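A generic interval of this kind might look like the following Python sketch. The attribute names (`lower`, `upper`, `lower_included`, `upper_included`) and the `has` method are assumptions for illustration, with `None` standing for an open-ended bound; they are not taken from the class definitions.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")   # any type supporting "<", mirroring the DV_ORDERED bound


@dataclass(frozen=True)
class Interval(Generic[T]):
    """Hypothetical stand-in for DV_INTERVAL; None marks an open-ended bound."""
    lower: Optional[T] = None
    upper: Optional[T] = None
    lower_included: bool = True
    upper_included: bool = True

    def has(self, value: T) -> bool:
        """Return True if `value` lies inside the interval."""
        if self.lower is not None:
            if value < self.lower or (value == self.lower and not self.lower_included):
                return False
        if self.upper is not None:
            if value > self.upper or (value == self.upper and not self.upper_included):
                return False
        return True


serum_na_normal = Interval(lower=135.0, upper=145.0)                 # 135 - 145 mmol/L
desirable_cholesterol = Interval(upper=5.5, upper_included=False)    # < 5.5 mmol/L
```

Because the bounds only need to be ordered, the same class would work for any comparable value type, not just real numbers.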
Proportions
The DV_PROPORTION type is provided for representing true ratios, i.e. relative values, and consists
of numerator and denominator Real values, and a magnitude function which is computed as the result
of the numerator/denominator division. The type attribute is used to indicate the logical type of the
proportion. Supported types include:
-
percent: denominator is 100; usual presentation is "numerator %"
-
unitary: denominator is 1; usual presentation is "numerator"
-
fraction: numerator and denominator are both integer values; usual presentation is n/d, e.g. ½, ¾, 1/2, 3/4 etc.;
-
integer_fraction: numerator and denominator are both integer values; usual presentation is n/d; if numerator > denominator, display as "a b/c", i.e. the integer part followed by the remaining fraction part, e.g. 1½; this is the most likely form for expressing a number of tablets;
-
ratio: numerator and denominator can take any value; usual presentation is "numerator: denominator"
Lastly, the is_integral function indicates that the numerator and denominator are both integer values;
this is used for fractions (the fraction and integer_fraction types above) and other commonly occurring
ratios where both parts are always integer values.
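The proportion kinds and their presentation rules can be sketched in Python as follows. The class and method names (`Proportion`, `ProportionKind`, `present`) and the validation in `__post_init__` are assumptions for illustration, and the presentation logic assumes positive values.

```python
from dataclasses import dataclass
from enum import Enum


class ProportionKind(Enum):
    RATIO = "ratio"
    UNITARY = "unitary"
    PERCENT = "percent"
    FRACTION = "fraction"
    INTEGER_FRACTION = "integer_fraction"


@dataclass(frozen=True)
class Proportion:
    """Hypothetical stand-in for DV_PROPORTION."""
    numerator: float
    denominator: float
    kind: ProportionKind

    def __post_init__(self):
        if self.denominator == 0:
            raise ValueError("denominator must not be zero")
        if self.kind is ProportionKind.PERCENT and self.denominator != 100:
            raise ValueError("percent requires denominator 100")
        if self.kind is ProportionKind.UNITARY and self.denominator != 1:
            raise ValueError("unitary requires denominator 1")
        fractions = (ProportionKind.FRACTION, ProportionKind.INTEGER_FRACTION)
        if self.kind in fractions and not self.is_integral():
            raise ValueError("fraction kinds require integer parts")

    @property
    def magnitude(self) -> float:
        # Computed, not stored: numerator divided by denominator.
        return self.numerator / self.denominator

    def is_integral(self) -> bool:
        return (float(self.numerator).is_integer()
                and float(self.denominator).is_integer())

    def present(self) -> str:
        n, d = self.numerator, self.denominator
        if self.kind is ProportionKind.PERCENT:
            return f"{n:g} %"
        if self.kind is ProportionKind.UNITARY:
            return f"{n:g}"
        if self.kind is ProportionKind.RATIO:
            return f"{n:g}:{d:g}"
        n, d = int(n), int(d)
        if self.kind is ProportionKind.INTEGER_FRACTION and n > d:
            # Split into the integer part and the remaining fraction.
            whole, rest = divmod(n, d)
            return f"{whole} {rest}/{d}" if rest else f"{whole}"
        return f"{n}/{d}"
```

Under these assumptions, three half-tablets recorded as an integer_fraction with numerator 3 and denominator 2 would be presented as "1 1/2" and have a magnitude of 1.5.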
Normal and Reference Ranges
Normal range for any of the quantity types (i.e. any instance of a subtype of DV_ORDERED) can be
included via the attribute DV_ORDERED.normal_range, of type REFERENCE_RANGE. Other reference
ranges (e.g. sub-critical, critical etc) can be included via the attribute
DV_ORDERED.other_reference_ranges. The separation of normal and other reference range attributes
is used because the former constitute the vast majority of ranges quoted for quantitative data.
Normal status can be included via the attribute DV_ORDERED.normal_status, which takes the form of
a DV_ORDINAL, whose symbol attribute is coded according to the openEHR terminology group "normal
status", and takes values "HHH" (critically high), "HH" (abnormally high), "H" (borderline
high), "N" (normal), "L" … "LLL".
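How a consumer might use these attributes is sketched below in Python. The `Result` class and the `is_normal` function are hypothetical. The precedence shown (use the normal range when present, otherwise fall back to the status marker) is an assumption of this sketch rather than something stated above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Result:
    """Hypothetical measured value carrying optional normal-range information."""
    magnitude: float
    normal_range: Optional[Tuple[float, float]] = None   # inclusive (low, high)
    normal_status: Optional[str] = None                  # "HHH" ... "N" ... "LLL"


def is_normal(result: Result) -> Optional[bool]:
    """True or False when decidable, None when neither range nor status is present."""
    if result.normal_range is not None:
        low, high = result.normal_range
        return low <= result.magnitude <= high
    if result.normal_status is not None:
        return result.normal_status == "N"
    return None
```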
Recording Time
Time can be recorded in two ways. Absolute times in the social time domain, such as dates and time
of day are recorded using the types in the date_time package. Fine-grained 'time', which is a duration
rather than a time, is recorded using a DV_QUANTITY with units = 's' or another temporal unit
('h', 'ms', 'ns' etc).
Class Descriptions
This section covers the classes DV_ORDERED, DV_INTERVAL, REFERENCE_RANGE, DV_ORDINAL, DV_QUANTIFIED, DV_AMOUNT, DV_QUANTITY, DV_COUNT, DV_PROPORTION, PROPORTION_KIND and DV_ABSOLUTE_QUANTITY; their detailed descriptions are not included here.
Syntaxes
Units Syntax
The BNF syntax specification of the units string, adapted from UCUM, is as follows:
units = '/' exp_units | units '.' exp_units | units '/' exp_units | exp_units ;
exp_units = unit_group exponent | unit_group ;
unit_group = PREFIX annot_unit | annot_unit | '(' exp_units ')' | factor ;
annot_unit = UNIT_NAME [ '{' ANNOTATION '}' ] | '{' ANNOTATION '}' ;
factor = Integer ;
exponent = [ SIGN ] Integer ;
PREFIX = 'Y' |'Z' | 'E' | 'P' | 'T' | 'G' | 'M' | 'k' | 'h' | 'da' | 'd' | 'c' | 'm' | 'μ' | 'n' | 'p' | 'f' | 'a' | 'z' | 'y' ;
UNIT_NAME = ? [a-zA-Z_%]+ ?; (* replace regex with values from unit tables *)
ANNOTATION = ? [a-zA-Z'.]+ ?; (* replace regex with values from unit tables *)
SUFFIX = ? [a-zA-Z0-9'_]+ ?; (* replace regex with values from unit tables *)
SIGN = '+' | '-' ;
Integer = ? [0-9]+ ?; (* regex *)
This proposal is comprehensive, covering all useful unit systems, including SI, various imperial, customary measures, and some obscure measures, as well as clinically specific additions. Metric prefixes, meaning-changing textual suffixes (e.g. "[Hg]" in "mm[Hg]") and non-meaning-changing annotations (e.g. "kg {total}") are recognised. With this syntax, units can be simply expressed in strings such as:
"kg/m^2", "m.s^-1", "km/h", "mm[Hg]"
and so on.
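To give a feel for how such strings decompose, the following Python sketch splits a units string into (unit, exponent) terms. It is deliberately much simpler than the grammar above: it follows the example strings (with "^" introducing an exponent), treats "." as multiplication and "/" as division, and does not handle parentheses, prefixes, or annotations. The function name `parse_units` is invented for the example.

```python
import re

# An optional separator, a unit name, and an optional "^exponent".
_TERM = re.compile(r"([./]?)([^./^]+)(?:\^([+-]?\d+))?")


def parse_units(units: str):
    """Split a simple units string into a list of (unit, exponent) pairs."""
    terms = []
    pos = 0
    while pos < len(units):
        match = _TERM.match(units, pos)
        if match is None:
            raise ValueError(f"cannot parse units at position {pos}: {units!r}")
        separator, name, exponent_text = match.groups()
        exponent = int(exponent_text) if exponent_text else 1
        if separator == "/":
            exponent = -exponent      # a unit after "/" is a divisor
        terms.append((name, exponent))
        pos = match.end()
    return terms
```

Under these simplifications, "kg/m^2" decomposes into kg to the power 1 and m to the power -2, while "mm[Hg]" stays a single term.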