Leaf Data
All ODIN data eventually devolve to instances of the primitive types String , Integer , Real , Double , String , Character , various date/time types, lists or intervals of these types, and a few special types. ODIN does not use type or attribute names for instances of primitive types, only manifest values, making it possible to assume as little as possible about type names and structures of the primitive types. In all the following examples, the manifest data values are assumed to appear immediately inside a leaf pair of angle brackets, i.e.
some_attribute = <manifest value>
Primitive Types
Character Data
Characters are shown in a number of ways. In the literal form, a character is shown in single quotes, as follows:
'a'
Characters outside the low ASCII (0-127) range must be UTF-8 encoded, with a small number of backslash-quoted ASCII characters allowed, as described in the section [File Encoding].
String Data
All strings are enclosed in double quotes, as follows:
"this is a string"
Quotes are encoded using ISO/IEC 10646 codes, e.g. :
"this is a much longer string, what one might call a "phrase"."
Line extension of strings is done simply by including returns in the string. The exact contents of the string are computed as being the characters between the double quote characters, with the removal of white space leaders up to the left-most character of the first line of the string. This has the effect of allowing the inclusion of multi-line strings in ODIN texts, in their most natural human-readable form, e.g.:
text = <"And now the STORM-BLAST came, and he
Was tyrannous and strong :
He struck with his o'ertaking wings,
And chased us south along.">
String data can be used to contain almost any other kind of data, which is intended to be parsed as some other formalism. Characters outside the low ASCII (0-127) range must be UTF-8 encoded, with a small number of backslash-quoted ASCII characters allowed, as described in section [File Encoding].
Integer Data
Integers are represented simply as numbers, e.g.:
25
300000
29e6
Commas or periods for breaking long numbers are not allowed, since they confuse the use of commas used to denote list items (See Lists of Built-in Types below).
Real Data
Real numbers are assumed whenever a decimal is detected in a number, e.g.:
25.0
3.1415926
6.023e23
Commas or periods for breaking long numbers are not allowed. Only periods may be used to separate the decimal part of a number; unfortunately, the European use of the comma for this purpose conflicts with the use of the comma to distinguish list items (See Lists of Built-in Types below).
Dates and Times
Complete Date/Times
In ODIN, full and partial dates, times and durations can be expressed. All full dates, times and durations are expressed using a subset of ISO 8601. The openEHR Foundation Types specification provides a full explanation of the ISO 8601 semantics supported in openEHR.
In ODIN, the use of ISO 8601 allows extended form only (i.e. ':' and '-' must be used). The ISO 8601 method of representing partial dates consisting of a single year number, and partial times consisting of hours only are not supported, since they are ambiguous. See below for partial forms.
Durations are expressed using a string which starts with 'P', and is followed by a list of periods, each appended by a single letter designator: 'Y' for years, "M' for months, 'W' for weeks, 'D' for days, 'H' for hours, 'M' for minutes, and 'S' for seconds. The literal 'T' separates the YMWD part from the HMS part, ensuring that months and minutes can be distinguished. Examples of date/time data include:
1919-01-23 -- birthdate of Django Reinhardt
16:35:04,5 -- rise of Venus in Sydney on 24 Jul 2003
2001-05-12T07:35:20+1000 -- timestamp on an email received from Australia
P22DT4H15M0S -- period of 22 days, 4 hours, 15 minutes
Partial Date/Times
Two ways of expressing partial (i.e. incomplete) date/times are supported in ODIN. The ISO 8601 incomplete formats are supported in extended form only (i.e. with '-' and ':' separators) for all patterns that are unambiguous on their own. Dates consisting of only the year, and times consisting of only the hour are not supported, since both of these syntactically look like integers. The supported ISO 8601 patterns are as follows:
yyyy-MM -- a date with no days
hh:mm -- a time with no seconds
yyyy-MM-ddThh:mm -- a date/time with no seconds
yyyy-MM-ddThh -- a date/time, no minutes or seconds
To deal with the limitations of ISO 8601 partial patterns in a context-free parsing environment, a second form of pattern is supported in ODIN, based on ISO 8601. In this form, '?' characters are substituted for missing digits. Valid partial dates follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.1.4.3):
yyyy-MM-?? -- date with unknown day in month
yyyy-??-?? -- date with unknown month and day
Valid partial times follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.2.2.3):
hh:mm:?? -- time with unknown seconds
hh:??:?? -- time with unknown minutes and seconds
Valid date/times follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.3.3):
yyyy-MM-ddThh:mm:?? -- date/time with unknown seconds
yyyy-MM-ddThh:??:?? -- date/time with unknown minutes and seconds
Intervals of Ordered Primitive Types
Intervals of any ordered primitive type, i.e., Integer, Real, Date, Time, Date_time and Duration, can be stated using the following uniform syntax, where N, M are instances of any of the ordered types:
|N..M| -- the two-sided range N >= x <= M;
|>N..M| -- the two-sided range N > x <= M;
|N..<M| -- the two-sided range N <= x <M;
|>N..<M| -- the two-sided range N > x <M;
|<N| -- the one-sided range x < N;
|>N| -- the one-sided range x > N;
|>=N| -- the one-sided range x >= N;
|<=N| -- the one-sided range x <= N;
|N +/-M| -- interval of N ±M.
|N±M| -- interval of N ±M.
The allowable values for N and M include any value in the range of the relevant type.
| for the plus/minus form (N ± M format), spaces are not significant either side of the '±' sign or the equivalent '+/-' string. |
Examples of this syntax include:
|0..5| -- integer interval
|0.0..1000.0| -- real interval
|0.0..<1000.0| -- real interval 0.0 <= x < 1000.0
|08:02..09:10| -- interval of time
|>=1939-02-01| -- open-ended interval of dates
|5.0 ±0.5| -- 4.5 ±5.5
|5.0 +/-0.5| -- 4.5 ±5.5
|>=0| -- >= 0
Other Built-in Types
URIs
URI can be expressed as ODIN data in the usual way found on the web, and follow the standard syntax from IETF RFC3986. Examples of URIs in ODIN:
http://openEHR.org/home
ftp://get.this.file.com?file=cats.doc#section_5
http://www.mozilla.org/products/firefox/upgrade/?application=thunderbird
Encoding of special characters in URIs follows the IETF RFC 3986, as described in the section [File Encoding].
Coded Terms
Coded terms are ubiquitous in medical and clinical information, and are likely to become so in most other industries, as ontologically-based information systems and the 'semantic web' emerge. The logical structure of a coded term is simple: it consists of an identifier of a terminology (with optional version), and an identifier of a code within that terminology. The ODIN string representation is of the following form:
[terminology_id::code]
where terminology_id is an alphanumeric name, optionally followed by a version in parentheses, and code is a string. The allowed characters in each part are described in the grammar.
Examples from clinical data:
[icd10AM::F60.1] -- from ICD10AM
[snomed_ct::2004950] -- from snomed-ct
[snomed_ct(3.1)::2004950] -- from snomed-ct v 3.1
Lists of Built-in Types
Data of any primitive type can occur singly or in lists, which are shown as comma-separated lists of items, all of the same type, such as in the following examples:
"cyan", "magenta", "yellow", "black" -- printer's colours
1, 1, 2, 3, 5 -- first 5 fibonacci numbers
08:02, 08:35, 09:10 -- set of train times
No assumption is made in the syntax about whether a list represents a set, a list or some other kind of sequence - such semantics must be taken from an underlying information model.
Lists which happen to have only one datum are indicated by using a comma followed by a list continuation marker of three dots, i.e. "…", e.g.:
"en", ... -- languages
"icd10", ... -- terminologies
[at0200], ...
White space may be freely used or avoided in lists, i.e. the following two lists are identical:
1,1,2,3
1, 1, 2,3