Thursday, October 13, 2016

Units and ODM 2.0

A few of us have already started thinking about the requirements for a CDISC ODM 2.0 standard. Integration with healthcare is one of the main topics. Support for RESTful web services and an additional JSON implementation are surely on the list.

One of the main problems with the current version, ODM 1.3, is the way units of measure are handled. Ten years ago, when ODM 1.3 was developed, we were not yet aware of UCUM, nor of LOINC and other coding systems in healthcare. At that time, we were only just starting to experiment with extracting information from electronic health records (EHRs). ODM was very case report form (CRF) centric, without much consideration of how one could (automatically) populate a CRF from an EHR or a hospital information system (HIS).

The way units of measure are implemented in ODM is very simple: one just defines a list of units of measure, and then references them later. For example:


This defines "inches" and "centimeters". What exactly these mean (e.g. that "centimeter" means 1/100 of a meter and that the latter is an SI unit) is not included, nor is any conversion information (e.g. that 1 inch is 2.54 cm). It does not even state what the property is (in this case "length").
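To make this concrete, here is a minimal sketch in Python. The embedded ODM fragment is an illustration with hypothetical OIDs (`MU.IN`, `MU.CM`), not the snippet originally shown in this post; the element names are meant to follow ODM 1.3. It shows that all a program can recover from such a definition is an identifier, a name and a symbol.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Illustrative ODM 1.3-style fragment; the OIDs are hypothetical.
ODM_UNITS = """
<BasicDefinitions xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <MeasurementUnit OID="MU.IN" Name="inches">
    <Symbol><TranslatedText xml:lang="en">in</TranslatedText></Symbol>
  </MeasurementUnit>
  <MeasurementUnit OID="MU.CM" Name="centimeters">
    <Symbol><TranslatedText xml:lang="en">cm</TranslatedText></Symbol>
  </MeasurementUnit>
</BasicDefinitions>
"""

def list_units(xml_text):
    """Return {OID: (Name, symbol)} for every MeasurementUnit element."""
    root = ET.fromstring(xml_text)
    units = {}
    for mu in root.findall("odm:MeasurementUnit", NS):
        symbol = mu.findtext("odm:Symbol/odm:TranslatedText", namespaces=NS)
        units[mu.get("OID")] = (mu.get("Name"), symbol)
    return units

units = list_units(ODM_UNITS)
print(units)
```

Nothing in the result says that both units measure a length, or how to convert one into the other.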

In the definition of the data point (an ItemDef), these are then referenced, e.g. by:



This states that the height can be expressed either in a unit "inches" or in a unit "centimeter", whatever these may mean: a machine will not really understand. Nor will a machine understand that this is about body height. For that we need to add some semantic information, like a LOINC code. Currently, this can be done using the "Alias" child element:


(P.S.  some elements have been collapsed for clarity)
Also note the "SDTM annotation", stating that this data point will later go into VSORRES when VSTESTCD has the value "HEIGHT".
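As a rough sketch of how software could read such a definition: the ItemDef fragment below is hand-written for illustration. Its OIDs, the Alias "Context" values ("LOINC", "SDTM") and the wording of the SDTM annotation are assumptions, not the original snippet.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Illustrative fragment; OIDs and Alias Context values are hypothetical.
ODM_ITEM = """
<ItemDef xmlns="http://www.cdisc.org/ns/odm/v1.3"
         OID="IT.HEIGHT" Name="Height" DataType="float">
  <MeasurementUnitRef MeasurementUnitOID="MU.IN"/>
  <MeasurementUnitRef MeasurementUnitOID="MU.CM"/>
  <Alias Context="LOINC" Name="8302-2"/>
  <Alias Context="SDTM" Name="VSORRES where VSTESTCD=HEIGHT"/>
</ItemDef>
"""

def describe_item(xml_text):
    """Return the referenced unit OIDs and the aliases keyed by Context."""
    item = ET.fromstring(xml_text)
    unit_oids = [ref.get("MeasurementUnitOID")
                 for ref in item.findall("odm:MeasurementUnitRef", NS)]
    aliases = {alias.get("Context"): alias.get("Name")
               for alias in item.findall("odm:Alias", NS)}
    return unit_oids, aliases

unit_oids, aliases = describe_item(ODM_ITEM)
print(unit_oids, aliases.get("LOINC"))
```

The lookup by `Context="LOINC"` only works because this fragment happens to use that string; a reader of the file has to know the convention in advance.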

So, how is this implemented on the CRF? The ODM doesn't tell us. Are there two checkboxes on the CRF, one with "in" and one with "cm", and the investigator needs to check one of them? Or are there two versions of the CRF, one for Anglo-Saxon countries with "inches" preprinted and one for countries using metric units with "cm" preprinted?

If there is only one unit of measure assigned, the case is clear. For example, for a blood pressure:


with a single reference to a unit of measure "millimeter mercury column":


A computer system, however, does not know what this really means (semantically), e.g. that it represents a pressure. We can add that information by providing the UCUM notation, again using an "Alias":


And if we look in the UCUM "ucum-essence.xml" file, we get all information for free:

stating that "meter mercury column" is a unit for the property "pressure" and that it is equal to 133.322 kilopascal. For "millimeter mercury column", the system knows that in UCUM there is a prefix "m" with meaning "milli" and value "0.001" (also defined in the ucum-essence.xml).
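The lookup described above can be sketched as follows. The XML is an abbreviated excerpt in the style of ucum-essence.xml: the element and attribute names (and the omission of the namespace) are assumptions about the file layout, not a verbatim copy, so check them against the real file before relying on them.

```python
import xml.etree.ElementTree as ET

# Abbreviated, assumed layout of ucum-essence.xml (namespace omitted).
UCUM_EXCERPT = """
<root>
  <prefix Code="m" CODE="M">
    <name>milli</name>
    <value value="1e-3"/>
  </prefix>
  <unit Code="m[Hg]" CODE="M[HG]" isMetric="yes" class="clinical">
    <name>meter of mercury column</name>
    <property>pressure</property>
    <value Unit="kPa" value="133.3220"/>
  </unit>
</root>
"""

def lookup(xml_text, prefix_code, unit_code):
    """Return (property, magnitude, defining unit) for prefix + unit."""
    root = ET.fromstring(xml_text)
    prefix = next(p for p in root.findall("prefix")
                  if p.get("Code") == prefix_code)
    unit = next(u for u in root.findall("unit")
                if u.get("Code") == unit_code)
    factor = float(prefix.find("value").get("value"))
    value_el = unit.find("value")
    magnitude = factor * float(value_el.get("value"))
    return unit.findtext("property"), magnitude, value_el.get("Unit")

# "mm[Hg]" is the prefix "m" applied to the unit "m[Hg]".
prop, magnitude, in_unit = lookup(UCUM_EXCERPT, "m", "m[Hg]")
print(prop, magnitude, in_unit)
```

So from the single string "mm[Hg]" a system can derive that it is dealing with a pressure of about 0.133322 kPa per unit.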

This also makes it possible to do unit conversions in an automated way, even using publicly available RESTful web services for UCUM unit conversions. That way, a system can easily find out that a blood pressure of 2.5 [psi] (pounds per square inch) corresponds to 129.29 mm[Hg].
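The arithmetic behind that conversion can be checked locally, without any web service. This is a minimal sketch: the constant names are mine, the pound, inch and standard gravity values are the exact definitions of those units, and the mm[Hg] value is the 133.322 kPa per m[Hg] quoted above.

```python
# Exact definitions of the units that make up [psi].
POUND_KG = 0.45359237        # avoirdupois pound in kilograms
STANDARD_GRAVITY = 9.80665   # standard acceleration of free fall, m/s^2
INCH_M = 0.0254              # international inch in meters

# m[Hg] = 133.322 kPa, so mm[Hg] = 133.322 Pa.
MMHG_PA = 133.322

# [psi] is one pound-force per square inch, expressed here in pascal.
PSI_PA = POUND_KG * STANDARD_GRAVITY / INCH_M ** 2

def psi_to_mmhg(psi):
    """Convert a pressure in [psi] to mm[Hg] by going through pascal."""
    return psi * PSI_PA / MMHG_PA

result = psi_to_mmhg(2.5)
print(round(result, 2))  # 129.29
```

Both units are reduced to the same base unit (pascal) and then divided, which is the general approach a UCUM-aware converter can take for any pair of units with the same property.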

So, as an intermediate conclusion, we can state that it is already possible to give semantic meaning to measurements (by providing their LOINC code) and the units in which they are expressed (by providing the UCUM notation), by using the "Alias" mechanism.

This would also enable systems to automatically extract information from EHRs (e.g. using HL7 CDA or FHIR) as in these systems, body height is coded using the LOINC code "8302-2" and the value MUST be given using UCUM notation. For example (as FHIR):


with the LOINC code in the "code" element (middle part of the snapshot), the value in the "value" element (near the bottom) and the (UCUM) unit in the "code" element under "valueQuantity" (lower part of the snapshot).
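A minimal sketch of such an extraction, written here as JSON. The Observation below is hand-written for illustration and the value 172 is made up; the field names follow the FHIR Observation resource, and the two system URIs are the ones FHIR uses for LOINC and UCUM.

```python
import json

LOINC_SYSTEM = "http://loinc.org"
UCUM_SYSTEM = "http://unitsofmeasure.org"

# Hand-written FHIR-style Observation; the measured value is invented.
OBSERVATION = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "8302-2",
       "display": "Body height"}
    ]
  },
  "valueQuantity": {
    "value": 172,
    "unit": "cm",
    "system": "http://unitsofmeasure.org",
    "code": "cm"
  }
}
"""

def extract(observation_json):
    """Return (LOINC code, numeric value, UCUM unit) from an Observation."""
    obs = json.loads(observation_json)
    loinc = next(c["code"] for c in obs["code"]["coding"]
                 if c.get("system") == LOINC_SYSTEM)
    qty = obs["valueQuantity"]
    if qty.get("system") != UCUM_SYSTEM:
        raise ValueError("unit is not expressed in UCUM")
    return loinc, qty["value"], qty["code"]

loinc_code, value, ucum_unit = extract(OBSERVATION)
print(loinc_code, value, ucum_unit)
```

A system that knows the LOINC code and UCUM unit of an ODM item can match it against such an observation without any human mapping step.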

Is this sufficient?

I do not think so.

"Alias" can be used for anything, and the content of "Context" is not standardized. Also, we should encourage the use of UCUM, as the current codelist for units developed by CDISC is a disaster anyway. Even for pre-clinical studies, to be submitted as SEND, the use of UCUM units would be a great step forward. So we are thinking about "promoting" the UCUM notation to an attribute on MeasurementUnitDef itself, something like (but don't pin me down on that!):


However, that doesn't solve everything...

When talking about measurements and units in clinical research, and especially for laboratory tests, I think we can distinguish the following categories:

  • The measurement has no unit. For example: "pH"
  • We know the exact unit of measure in advance. For example "millimeter mercury column" for a "blood pressure". This is covered by the current use of "MeasurementUnit" in ODM. The unit can then be preprinted on the form, and/or stored in the database as the one we know will always be the case
  • There is a choice of units. For example: choice between "cm" and "inches". This is also covered, except for how it is "rolled out", e.g. by different CRF versions based on culture or country
  • We don't know what units we will get back. This is often the case for lab tests. Unfortunately, most protocols do not provide sufficient details about what exactly should be done; they simply state, for example, "do a glucose in urine test". We can then expect a multitude of units (or their absence) back: one lab will report in mg/dL, one in mmol/L, others will provide ordinal information (1+, 2+, ... - no units), making comparison hard (how will we standardize to --STRESU in SDTM?). In such a case, the unit information is usually a field on the CRF. For example:

The latter is OK in that the "question" about the unit is just another question, but we lose the information that a) it is a unit, and b) it is a unit for the albumin concentration. Of course, this could e.g. be solved by an "Alias", like:

    but this is not a very elegant solution, as the content of "Alias" is not standardized.
    In CDA and FHIR, this is easy, as these do not define what is to be measured, just what has been measured. In ODM, it is just a bit more difficult.
    Now, I do not know the solution for this, but it is something that we (the XML technologies team) will need to tackle.
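To illustrate why the "Alias" workaround for the last category is fragile, here is a minimal sketch. Everything in the fragment is invented for the illustration: the OIDs, the item names and in particular the "UnitFor" Context, which is not an ODM or CDISC convention.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Illustrative only: OIDs and the "UnitFor" Context are hypothetical.
ODM_ITEMS = """
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 OID="MDV.1" Name="Sketch">
  <ItemDef OID="IT.ALB" Name="Albumin" DataType="float"/>
  <ItemDef OID="IT.ALB_UNIT" Name="Albumin unit" DataType="text">
    <Alias Context="UnitFor" Name="IT.ALB"/>
  </ItemDef>
</MetaDataVersion>
"""

def unit_items(xml_text, context="UnitFor"):
    """Map each result item OID to the OID of the item holding its unit."""
    root = ET.fromstring(xml_text)
    links = {}
    for item in root.findall("odm:ItemDef", NS):
        for alias in item.findall("odm:Alias", NS):
            if alias.get("Context") == context:
                links[alias.get("Name")] = item.get("OID")
    return links

links = unit_items(ODM_ITEMS)
print(links)
```

The link between the two items is only found because the code and the file agree on the string "UnitFor"; a sponsor using any other Context value would be invisible to this parser.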