The second part was on UCUM, the "Unified Code for Units of Measure". It explained why UCUM was not chosen by CDISC right from the start, and why CDISC developed its own controlled terminology for units of measure.
In my personal opinion, there are many myths about UCUM in the clinical research community, due to a lack of understanding of and experience with it. That is easy for me to say, I admit, as I am a UCUM specialist who has developed several applications for working with it.
One of the statements made in the webinar, in my opinion a myth, is that UCUM uses "uncommon" square and curly brackets for certain things. Take the square brackets first, as in "mm[Hg]" and "[in_i]" ("millimetre mercury column" and "inch"). These may seem "special" to people used to unit notations from the "paper world", but the square brackets are there to mark that the unit is not an SI unit ([in_i]) or that part of the unit is not a unit at all ([Hg]) but designates how the measurement is done or what is used for it (originally a mercury column). These square brackets make it possible to automate unit conversions, e.g. between a "Belgian blood pressure" (cm[Hg]) and the usual unit for blood pressure (mm[Hg]). For inches, the brackets mean that the unit is not a worldwide standard one and cannot be reduced directly to one or a combination of UCUM's seven base units: m, s, g, rad (radian), K (kelvin), C (coulomb), and cd (candela). It first needs to be converted to an SI unit, in this case to "cm", using the "ucum-essence.xml" file or RESTful web services based on it. I will come back to the role of the "ucum-essence.xml" file later.
The second "uncommon bracket case" is the use of curly brackets (named "curly braces" in UCUM). The example given during the webinar was {Capsule}. "Capsule" is however NOT a standardized unit, as neither the size of a capsule nor its content is standardized. In CDISC-CT, however, "CAPSULE" (C48480) is a unit of measure, which in my opinion is completely wrong: a "capsule" is a dosage form, not a unit. Furthermore, it was unfortunately not explained what these curly braces mean. Curly braces are used for annotations, and the exact rules for annotations are very well described in the UCUM specification. Annotations in UCUM (the things in curly braces) are NEVER mandatory. The spec states: "Annotations do not contribute to the semantics of the unit but are meaningless by definition." It also states: "Curly braces are here because people want annotations and deeply believe that they need annotations". An example could be "mg/dL{Albumin}". Moreover, UCUM makes it clear that removing the annotation does not change the unit itself, as the annotation is not part of the unit. If one however strongly believes that "Capsule" is a unit of measure (which is wrong), this indeed leads to the myth that UCUM uses "uncommon" brackets.
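To make the point about annotations concrete, here is a minimal Python sketch that simply discards everything in curly braces. The helper name `strip_annotations` is my own invention for illustration and is not part of any UCUM library. It only covers the simple cases discussed here (an annotation attached to a unit, or an annotation standing alone, which I treat as the unity "1"); it is not a full UCUM parser.

```python
import re

# One annotation: a pair of curly braces with no nested braces inside.
ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(unit: str) -> str:
    """Return the unit expression with all {annotations} removed.

    Hypothetical helper for illustration only. An expression that consists
    of nothing but an annotation is mapped to the unity "1".
    """
    bare = ANNOTATION.sub("", unit)
    return bare if bare else "1"


print(strip_annotations("mg/dL{Albumin}"))  # mg/dL
print(strip_annotations("{Capsule}"))       # 1
```

The unit that remains after stripping is the one that takes part in any conversion; the annotation is only there for the human reader.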
What was unfortunately also not mentioned is the use of the "ucum-essence.xml" file and all the software and RESTful web services that are based on it. A unique property of UCUM notation is that it allows conversion between ANY two units for the same property, through the ucum-essence.xml file. So, if you would like to do a unit conversion between nautical miles and inches, you can do so in UCUM (not possible with CDISC units …). The reason is that EVERY UCUM unit can be reduced to one or a combination of the seven base units by cascading down. So, if you do this both for "nautical mile" and for "inch", you easily get the conversion factor. The NLM provides a RESTful web service for this as well.
The NLM RESTful web service of course also allows a direct conversion, but in the background this is just two conversions to the base units, followed by dividing the two "to-base-units" conversion factors.
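The "two conversions to base units, then divide" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the small `TO_METRE` table is a hand-written stand-in for what real software would read from ucum-essence.xml, and the names `TO_METRE` and `convert` are hypothetical. The two definitions used (1 [nmi_i] = 1852 m and 1 [in_i] = 2.54 cm) are the standard ones.

```python
from fractions import Fraction

# Conversion factors to the base unit metre.
# Hand-written stand-in for the definitions in ucum-essence.xml.
TO_METRE = {
    "m": Fraction(1),
    "cm": Fraction(1, 100),
    "[in_i]": Fraction(254, 10000),  # 1 inch = 2.54 cm = 0.0254 m
    "[nmi_i]": Fraction(1852),       # 1 nautical mile = 1852 m
}


def convert(value, source, target):
    """Convert by going through the base unit:
    multiply by the source factor, divide by the target factor."""
    return Fraction(value) * TO_METRE[source] / TO_METRE[target]


factor = convert(1, "[nmi_i]", "[in_i]")
print(factor)         # exact ratio: 9260000/127
print(float(factor))  # roughly 72913.4 inches per nautical mile
```

Exact fractions are used instead of floats so that the division of the two "to-base-units" factors does not introduce rounding noise.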
Unfortunately,
this magnificent advantage of using UCUM was not mentioned during the webinar.
Another statement made during the webinar is that UCUM does not make it possible to see that two units have exactly the same meaning. This is also a myth. The example that was given is "mg/mL" and "g/L". In the CDISC-CT "UNIT" codelist, only "g/L" is allowed; "mg/mL" is not. So in SDTM, when a lab result is received in "mg/mL", it needs to be converted to "g/L", even in --ORRES (original result), so that the "original unit" is not "original" anymore. Such mandatory conversions can easily lead to errors. Rather surprisingly (I just found out …), for PK units only "mg/mL" is allowed and "g/L" is not!
Back to the myth, however, that UCUM does not make it possible to see that "mg/mL" and "g/L" are the same thing. As UCUM is a NOTATION, similar to a mathematical expression, and not a LIST, it is easy to see that both are the same unit. Just striking out the "m" (milli) prefix in both numerator and denominator (allowed in UCUM) makes it clear that 1 mg/mL = 1 g/L.
The reason is that in UCUM, the forward slash has a very specific meaning: it means "division". This is not always the case in CDISC-CT. UCUM units are essentially mathematical expressions. This was however not mentioned in the webinar.
Also, if I do an (automated) unit conversion between "mg/mL" and "g/L", e.g. using the RESTful web service provided by the NLM, it immediately becomes clear that "mg/mL" and "g/L" are the same thing.
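The "striking out the prefix" argument can also be written down as a tiny Python sketch. It treats a unit as "prefix + atom" on both sides of the division sign and reduces it to a factor times an unprefixed unit. The prefix and atom tables are a deliberately small, hand-picked subset (the real ones live in ucum-essence.xml), and the function names `atom_factor` and `reduce_ratio` are hypothetical.

```python
from fractions import Fraction

# Small illustrative subset of metric prefixes and unit atoms,
# not the complete UCUM tables.
PREFIXES = {
    "k": Fraction(1000),
    "d": Fraction(1, 10),
    "c": Fraction(1, 100),
    "m": Fraction(1, 1000),
    "u": Fraction(1, 10**6),
}
ATOMS = {"g", "L", "mol"}


def atom_factor(symbol):
    """Split an optional prefix from a unit atom: returns (factor, atom)."""
    if symbol in ATOMS:
        return Fraction(1), symbol
    if len(symbol) > 1 and symbol[0] in PREFIXES and symbol[1:] in ATOMS:
        return PREFIXES[symbol[0]], symbol[1:]
    raise ValueError(f"unsupported unit symbol: {symbol}")


def reduce_ratio(unit):
    """Reduce 'numerator/denominator' to (factor, 'atom/atom').
    The slash is treated as a real division, as in UCUM."""
    numerator, denominator = unit.split("/")
    f_num, a_num = atom_factor(numerator)
    f_den, a_den = atom_factor(denominator)
    return f_num / f_den, f"{a_num}/{a_den}"


# Both reduce to a factor of 1 on g/L, so they are the same unit.
print(reduce_ratio("mg/mL") == reduce_ratio("g/L"))  # True
```

Because both expressions reduce to the same factor on the same unprefixed unit, software can detect their equivalence without any lookup list of "synonyms".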
Such RESTful web services even make it possible to fully automate the conversion from "original units" to "standardized units" (--ORRES/--ORRESU to --STRES/--STRESU) in SDTM, as I will live-demo at the next CDISC Interchange in Berlin, something that is impossible when using CDISC units.
So, the statement that UCUM does not make it possible to see that two units are the same thing is simply not true. It is just the other way around: CDISC-CT has only one term for different units that are the same, because CDISC-CT does not have any support or possibility for unit conversions! For example, only "g/L" is allowed as a unit in the "UNIT" codelist, because with CDISC-CT, systems cannot detect through unit conversion that "g/L" is the same as "mg/mL". The lack of unit conversion possibilities is one of the great weaknesses of CDISC-CT units. With UCUM, unit conversion comes "free of charge".
The good news however is that for laboratory units, there is a good overlap between UCUM notation and CDISC units. "g/L" for example is valid in CDISC as well as in UCUM. "mg/mL" is however only valid in the "PKUNIT" CDISC-CT codelist, not in the "UNIT" codelist. There is also a RESTful web service by the NLM that allows you (or better: your application) to check whether your "unit" is also a valid UCUM unit.
So, even after the webinar, I keep stating that CDISC should allow the use of UCUM notation in submission standards. The RESTful web services for unit conversions using UCUM can also easily be used by the FDA review tools. With CDISC units (unless identical to the UCUM notation), there is no chance at all to automate unit conversions, neither by the sponsor nor by the FDA. With CDISC units, both have to rely on "conversion tables", which is not only cumbersome but also error-prone.
Also, the recent mention of UCUM in the "FDA Standards Catalog" might imply that the FDA may mandate the use of UCUM notation for certain CDISC submissions in the future, as it already does for SPL (Structured Product Labeling) and encourages for drug establishment registration and drug listing. So we at CDISC had better be prepared for such an event and allow UCUM already now, so that any FDA mandate of UCUM does not come as a sudden surprise causing some panic, as was the case with LOINC for laboratory data in SDTM.
Next time I will explain how using UCUM in combination with LOINC even makes it possible to fully automate the conversion from US conventional units to SI units (or vice versa), something that is completely impossible when using CDISC units. This method even allowed us to generate complete SDTM-LB and DM datasets from different EHR systems within minutes, including standardization (LBSTRESN/LBSTRESU) from US conventional to SI units. A first blog about this, titled "From FHIR to SDTM-LB (and DM) in just two minutes", was already published a few weeks ago.