Saturday, December 21, 2019

Myths and Truths about Myths and Truths on UCUM

A few weeks ago I attended the CDISC webinar on "Myths and Truths: Insight into CDISC’s Controlled Terminology Magic". A good part of the webinar covered two of my favourite topics: LOINC and UCUM. I think the LOINC topic was very well explained, including the need for several new variables in SDTM for LB and other findings domains, which will come with the SDTM v2.0 model and SDTMIG 3.4 and take care of test uniqueness.
The second part was on UCUM, the "Unified Code for Units of Measure". It explained why UCUM was not chosen by CDISC right from the start, and why CDISC developed its own controlled terminology for units of measure. 

In my personal opinion, there are many myths about UCUM in the clinical research community, due to a lack of understanding of and experience with it. That is easy for me to say, I admit, as I am a UCUM specialist who has developed several applications for working with it.

One of the statements made in the webinar, in my opinion a myth, is that UCUM uses "uncommon" square and curly brackets for certain things. Take the square brackets first, as in "mm[Hg]" and "[in_i]" ("millimetre mercury column" and "inch"). These may seem "special" to people who use unit notations from the "paper world", but the square brackets are there to mark that the unit is not an SI unit ([in_i]) or that part of the unit is not a unit at all ([Hg]) but designates how the measurement is done or what is used for it (originally a mercury column). These square brackets make it possible to automate unit conversions, e.g. between a "Belgian blood pressure" (cm[Hg]) and the usual unit for blood pressure (mm[Hg]). For inches, they mean that the unit is not a worldwide standard one and cannot be reduced directly to one or a combination of the seven base units: m, g, s, K (kelvin), A (ampere), mol, and cd (candela). It first needs to be converted to an SI unit, in this case "cm", using the "ucum-essence.xml" file or the RESTful web services based on it. I will come back to the role of the "ucum-essence.xml" file later.
The second "uncommon bracket case" is the use of curly brackets (named "curly braces" in UCUM). The example given during the webinar was {Capsule}. "Capsule" is however NOT a standardized unit, as neither the size of a capsule nor its content is standardized. In CDISC-CT, however, "CAPSULE" (C48480) is a unit of measure, which in my opinion is completely wrong. A "capsule" is a dosing form, not a unit. Furthermore, it was unfortunately not explained what these curly braces mean: they are used for annotations. The exact rules for annotations are very well described in the UCUM specification. Annotations in UCUM (the things in curly braces) are however NEVER mandatory. The spec states: "Annotations do not contribute to the semantics of the unit but are meaningless by definition." It also states: "Curly braces are here because people want annotations and deeply believe that they need annotations". An example could be "mg/dL{Albumin}". Moreover, UCUM makes it clear that removing the annotation does not change the unit itself, as the annotation is not part of the unit. If one nevertheless strongly believes that "Capsule" is a unit of measure (which is wrong), this indeed leads to the myth that UCUM uses "uncommon" brackets.
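Because annotations carry no meaning for the unit, software can simply drop them before doing anything else with the unit. The following is a minimal Python sketch of that idea, not code from any UCUM library; the function name strip_annotations is made up for this illustration, and it relies on the UCUM rule that an annotation standing on its own is equivalent to the unity "1".

```python
import re

# Matches one UCUM annotation: a pair of curly braces and everything inside.
_ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(unit: str) -> str:
    """Return the unit with all {annotations} removed (hypothetical helper).

    A unit that consists only of an annotation, such as "{Capsule}",
    has no unit symbol left; UCUM treats that case as the unity "1".
    """
    bare = _ANNOTATION.sub("", unit)
    return bare if bare else "1"


print(strip_annotations("mg/dL{Albumin}"))
print(strip_annotations("{Capsule}"))
```

Seen this way, "mg/dL{Albumin}" is just "mg/dL", and "{Capsule}" carries no unit information at all.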

What was unfortunately also not mentioned is the use of the "ucum-essence.xml" file and all the software and RESTful web services that are based on it. A unique property of the UCUM notation is that it allows conversion between ANY two units for the same property, through the ucum-essence.xml file. So, if you would like to convert between nautical miles and inches, you can do so in UCUM (not possible with CDISC units …). The reason is that EVERY UCUM unit can be reduced to one or a combination of the seven base units by cascading down. So, if you do this both for "nautical mile" and for "inch", you easily get the conversion factor. The NLM also provides a RESTful web service for this reduction to base units.
The NLM RESTful web service of course also allows a direct conversion, but in the background this is just two conversions to the base units, followed by dividing the two "to-base-units" conversion factors.
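The "reduce both to base units, then divide" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TO_METRE table and the convert_length function are hypothetical stand-ins for what ucum-essence.xml and the software built on it provide, and the two factors are hand-copied from the definitions of the international inch (2.54 cm) and the international nautical mile (1852 m).

```python
# Hand-copied, illustrative reductions to the base unit metre.
# The real definitions live in ucum-essence.xml; this dict is a stand-in.
TO_METRE = {
    "m": 1.0,
    "cm": 0.01,
    "[in_i]": 0.0254,   # international inch = 2.54 cm
    "[nmi_i]": 1852.0,  # international nautical mile = 1852 m
}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert via the base unit: multiply down to metres, divide back up."""
    return value * TO_METRE[from_unit] / TO_METRE[to_unit]


inches = convert_length(1, "[nmi_i]", "[in_i]")
print(f"1 [nmi_i] = {inches:.2f} [in_i]")
```

The conversion factor is simply 1852 / 0.0254, so one nautical mile is roughly 72,913 inches, without any table that pairs these two units directly.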
Unfortunately, this magnificent advantage of using UCUM was not mentioned during the webinar.

Another statement made during the webinar is that UCUM does not make it possible to see that two units have exactly the same meaning. This is also a myth. The example given was "mg/mL" and "g/L". In the CDISC-CT "UNIT" codelist, only "g/L" is allowed; "mg/mL" is not. So in SDTM, when a lab result is received in "mg/mL", it needs to be converted to "g/L", even in --ORRES (original result), so that the "original unit" is not "original" anymore. Such mandatory conversions can easily lead to errors. Rather surprisingly (I just found out …), for PK units, only "mg/mL" is allowed and "g/L" is not!
Back, however, to the myth that UCUM does not make it possible to see that "mg/mL" and "g/L" are the same thing. As UCUM is a NOTATION, similar to a mathematical expression, and not a LIST, it is easy to see that both are the same unit. Just striking out the "m" (milli) prefix in both numerator and denominator (allowed in UCUM) makes it clear that 1 mg/mL = 1 g/L.
The reason is that in UCUM, the forward slash has a very specific meaning: it means "division". This is not always the case in CDISC-CT. UCUM units are essentially mathematical expressions. This was however not mentioned in the webinar.
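Treating units as mathematical expressions is easy to mimic in code. The Python sketch below is a deliberately tiny illustration, not a UCUM parser: the PREFIXES and ATOMS tables and the functions parse_term and canonical are hypothetical names, the vocabulary is limited to a handful of prefixes and atoms, and only "/" is handled (no ".", exponents, brackets or leading slash). It reduces each unit to an exact factor plus the exponents of the base units, so that two units are the same when both parts match.

```python
from fractions import Fraction

# Assumed mini-vocabulary; the full tables are in ucum-essence.xml.
PREFIXES = {
    "k": Fraction(1000),
    "d": Fraction(1, 10),
    "c": Fraction(1, 100),
    "m": Fraction(1, 1000),
    "u": Fraction(1, 10**6),
}
# Each atom maps to (factor, exponents of the base units).
ATOMS = {
    "g": (Fraction(1), {"g": 1}),
    "m": (Fraction(1), {"m": 1}),
    "L": (Fraction(1, 1000), {"m": 3}),  # litre = cubic decimetre
}


def parse_term(term):
    """Split a term such as "mg" into prefix and atom and return its reduction."""
    if term in ATOMS:  # no prefix, e.g. "g", "L" or "m" (metre)
        return ATOMS[term]
    factor, dims = ATOMS[term[1:]]
    return PREFIXES[term[0]] * factor, dims


def canonical(unit):
    """Reduce a unit like "mg/mL" to (factor, base-unit exponents)."""
    factor, dims = Fraction(1), {}
    for i, term in enumerate(unit.split("/")):
        sign = 1 if i == 0 else -1  # everything after a slash divides
        f, d = parse_term(term)
        factor *= f ** sign
        for base, exp in d.items():
            dims[base] = dims.get(base, 0) + sign * exp
    return factor, dims


print(canonical("mg/mL") == canonical("g/L"))
print(canonical("mg/dL")[0] / canonical("g/L")[0])
```

The first comparison tells us whether the two notations denote the same unit, and the second ratio is the conversion factor between two different units of the same property. A list of terms cannot offer either.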
Also, if I do an (automated) unit conversion between "mg/mL" and "g/L", e.g. using the RESTful web service provided by the NLM, the result (1 mg/mL = 1 g/L) makes it immediately clear that "mg/mL" and "g/L" are the same thing.
Such RESTful web services even make it possible to fully automate the conversion from "original units" to "standardized units" (--ORRES/--ORRESU to --STRES/--STRESU) in SDTM, as I will live-demo at the next CDISC Interchange in Berlin. This is impossible when using CDISC units.
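To give an idea of what such an automated standardization step looks like, here is a minimal Python sketch for the LB domain. It is an illustration only and not the code used in the demo: the function standardize and the FACTORS table are hypothetical, and the hand-written factors merely stand in for the answers that a UCUM conversion service or library would return.

```python
# Illustrative factors only; in practice they would come from a UCUM
# conversion service or library, not from a hand-maintained table.
FACTORS = {
    ("mg/mL", "g/L"): 1.0,
    ("mg/dL", "g/L"): 0.01,
    ("g/dL", "g/L"): 10.0,
}


def standardize(record: dict, target_unit: str) -> dict:
    """Return a copy of an LB record with LBSTRESN/LBSTRESU filled in."""
    source_unit = record["LBORRESU"]
    if source_unit == target_unit:
        factor = 1.0
    else:
        factor = FACTORS[(source_unit, target_unit)]
    result = dict(record)  # the original result and unit stay untouched
    result["LBSTRESN"] = float(record["LBORRES"]) * factor
    result["LBSTRESU"] = target_unit
    return result


row = {"LBTESTCD": "ALB", "LBORRES": "4.5", "LBORRESU": "g/dL"}
print(standardize(row, "g/L"))
```

The important design point is that LBORRES and LBORRESU are kept exactly as received, so the "original" result really stays original, while the standardized values are derived from it.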
So, the statement that UCUM does not make it possible to see that two units are the same thing is simply not true. It is just the other way around: CDISC-CT has only one term for different units that are the same, because CDISC-CT has no support or possibility for unit conversions! For example, only "g/L" is allowed as a unit in the "UNIT" codelist because, with CDISC-CT, systems are not capable of detecting through unit conversion that "g/L" is the same as "mg/mL". The lack of unit conversion possibilities is one of the great weaknesses of CDISC-CT units. With UCUM, unit conversion comes "free of charge".
The good news however is that for laboratory units, there is a good overlap between UCUM notation and CDISC units. "g/L" for example is valid in CDISC as well as in UCUM. "mg/mL" is however only valid in the "PKUNIT" CDISC-CT, but not in the "UNIT" CDISC-CT. There is also a RESTful web service by the NLM that allows you (or better: your application) to check whether your "unit" is also a valid UCUM unit.

So, even after the webinar, I keep stating that CDISC should allow the use of UCUM notation in submission standards. The RESTful web services for unit conversions using UCUM can also easily be used by the FDA review tools. With CDISC units (unless identical to the UCUM notation), there is no chance at all to automate unit conversions, neither for the sponsor nor for the FDA. With CDISC units, both have to rely on "conversion tables", which is not only cumbersome, but also error-prone.
Also, the recent mention of UCUM in the "FDA Standards Catalog" might imply that the FDA may mandate the use of UCUM notation for certain CDISC submissions in the future, as it already does for SPL (Structured Product Labelling) and encourages for drug establishment registration and drug listing. So we at CDISC had better be prepared for such an event and allow UCUM now, so that any FDA mandate of UCUM does not come as a sudden surprise causing some panic, as was the case with LOINC for laboratory data in SDTM.

Next time I will explain how using UCUM in combination with LOINC even makes it possible to fully automate the conversion from US conventional units to SI units (or vice versa), something that is completely impossible when using CDISC units. This method even allowed us to generate complete SDTM-LB and DM datasets from different EHR systems within minutes, including standardization (LBSTRESN/LBSTRESU) from US conventional to SI units. A first blog about this, titled "From FHIR to SDTM-LB (and DM) in just two minutes", was already published a few weeks ago.