Saturday, April 30, 2022

SI or not to SI - that's the question

Over the last two days, I attended the virtual CDISC European Interchange, where I also gave a presentation titled "CDISC on FHIR - Progress and Results of the Vulcan Project". During the "CDISC Updates" session, the following question came up in the chat (I quote): "Will CDISC gives more instructions about SI units applicable for each lab tests?". This probably translates best to: "Which SI unit should we use for each specific lab parameter when submitting data to health authorities?".

This once again shows that many CDISC standards implementers struggle to provide test result values in SI units (as required, e.g., by the PMDA).

So, the attendee expected that CDISC would provide a list with the applicable SI unit for each lab test ...

The question and the expectation also show how little understanding there is of CDISC controlled terminology and its limitations.

For SDTM and SEND, CDISC mostly uses "post-coordinated" controlled terminology. This means that a set of codes, rather than a single code, is assigned for each test. For example, for "quantitative measurement of albumin in blood", the following coded values are assigned: LBTESTCD="ALB", LBTEST="Albumin", LBSPEC="BLOOD".
Until recently, there was no variable for "quantitative"; this had to be deduced from the submitted value and units.

Another example is "measurement of glucose in urine by test strip". Here the assigned variable values, i.e. the "mapping", are: LBTESTCD="GLUC", LBTEST="Glucose", LBSPEC="URINE" and LBMETHOD="TEST STRIP".
Whether the measurement is quantitative or ordinal could, until recently, only be deduced from the results themselves.

So, to uniquely define the test, one needs a combination of variables rather than a single variable. The CDISC controlled terminology system, however, has never been able to provide such combinations.
This also means that the question "Will CDISC gives more instructions about SI units applicable for each Lab tests?" can only have one answer: "CDISC can't".

For example, if someone asks "what is the SI unit for glucose?", the question just doesn't make sense, as the context is missing: what is the specimen (blood, urine, ...)? Is it a quantitative test (concentration) or an ordinal one (e.g. "presence of")? And so on.
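To make the ambiguity concrete, here is a minimal sketch that treats each SDTM test as the combination of variables described above and groups those combinations by LBTESTCD. The ALB and GLUC/URINE rows follow the examples in the text; the GLUC/BLOOD row and the `RESULT_TYPE` key (quantitative vs ordinal) are hypothetical additions for illustration. As soon as one LBTESTCD maps to more than one context, a list of "the SI unit per LBTESTCD" cannot be well defined.

```python
from collections import defaultdict

# Illustrative post-coordinated SDTM test definitions. The ALB and GLUC/URINE
# rows follow the examples in the text; the GLUC/BLOOD row and the RESULT_TYPE
# key (quantitative vs ordinal) are hypothetical additions for this sketch.
TESTS = [
    {"LBTESTCD": "ALB", "LBTEST": "Albumin", "LBSPEC": "BLOOD",
     "LBMETHOD": None, "RESULT_TYPE": "quantitative"},
    {"LBTESTCD": "GLUC", "LBTEST": "Glucose", "LBSPEC": "BLOOD",
     "LBMETHOD": None, "RESULT_TYPE": "quantitative"},
    {"LBTESTCD": "GLUC", "LBTEST": "Glucose", "LBSPEC": "URINE",
     "LBMETHOD": "TEST STRIP", "RESULT_TYPE": "ordinal"},
]


def contexts_by_testcd(rows):
    """Collect the distinct (specimen, method, result type) contexts per LBTESTCD."""
    grouped = defaultdict(set)
    for row in rows:
        context = (row["LBSPEC"], row["LBMETHOD"], row["RESULT_TYPE"])
        grouped[row["LBTESTCD"]].add(context)
    return grouped


grouped = contexts_by_testcd(TESTS)
for testcd, contexts in sorted(grouped.items()):
    # More than one context means "the SI unit for <testcd>" is not well defined.
    label = "ambiguous" if len(contexts) > 1 else "single context"
    print(testcd, len(contexts), label)
```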

We also need to ask ourselves: what is meant by "SI unit"? Essentially, "g/L", "g/dL", "g/dl", "mol/l", "mol/dL" etc. are all "Système International" units! Every unit that can be derived from the seven "SI base units" is essentially an "SI unit"...
It is only in our clinical research world that we have started abusing the term "SI unit" for something completely different.
And what are "US conventional units"? The word "conventional" suggests that there is a convention about them, but where was that convention made, and by whom? I haven't been able to find out. Suggestions are welcome.
The discussion about "US conventional" versus "SI" also suggests that a conventional unit cannot be an SI unit. This is not true at all. For example, "mg/dL", a typical "conventional unit", is also a valid "SI unit".

When I search the internet for lists of "US conventional" and "SI" units, I find, for example: https://labpedia.net/si-unit-part-1-conventional-units-conversion-factors-international-unit-and-conventional-unit-for-lab-test/.
If you then look at the "SI units", you see that in most cases a "molar concentration unit" (e.g. micromole/L) is meant, but not always! Also, which kind of "SI unit" is used depends on the analyte, which essentially doesn't make sense.

When one looks carefully, one sees that in most cases where the "SI unit" is still a "mass concentration", the analyte is one whose molecular weight cannot be exactly determined (e.g. "lipid total") or for which there is no agreement on it (e.g. "albumin"). But in most cases, a "molar" unit is meant. So, abuse of the term "SI" indeed ...

Is this "the" official list? Is there an "official" list at all? Do the FDA and PMDA have an "official" list? Or does every FDA/PMDA reviewer have his/her own "list" of "SI units" (maybe only in their head)? I would not be surprised...

OK, suppose that in most cases we want to convert from "US conventional" units (meaning "mass" units) to "molar" units.
How do we know what the appropriate "molar" unit is for our test? From the CDISC LBTESTCD/LBTEST we cannot know, as the context is missing. For example, "concentration of glucose in stool" may well have a different "SI/molar" unit than "concentration of glucose in blood". So, how can we at least find a suitable "SI/molar" unit for our test? With CDISC controlled terminology, we can't.

This is where LOINC comes into play. LOINC is a coding system for tests in medicine, not only for lab tests. The latest version (2.72) contains over 98,000 test codes, a large part of which (as can be expected) are lab tests.

LOINC, however, is "pre-coordinated", meaning that there is a single code for each unique test. So, the code for "quantitative measurement, mass concentration, of albumin in serum/plasma" is 1751-7, its full human-readable name being "Albumin [Mass/volume] in Serum or Plasma". If you go to the web page defining this code, you will also find an entry "Example UCUM Units", which can be regarded as "the most commonly used unit for this test".

Interestingly, LOINC distinguishes quantitative tests by whether they are measured as "mass concentration" or "molar concentration". Continuing with our example, there is also a code "Albumin [Moles/volume] in Serum or Plasma", LOINC code 54347-0, with preferred unit "umol/L" (UCUM notation).

So, essentially, the codes 1751-7 and 54347-0 form a pair: one for the mass concentration, the other for the molar concentration. Such pairs exist for most LOINC tests that represent a concentration.

So, coming back to the question of the CDISC Interchange attendee: if I receive a result from the lab with LOINC code 1751-7 and I need to find the corresponding molar unit (e.g. to populate LBSTRESN/LBSTRESU), all I need to do is to find the paired code for 1751-7 which is 54347-0, and take the "molar concentration unit" from that.
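The lookup itself is simple once the pairs are known. The sketch below uses a hypothetical, hand-written table holding only the albumin pair from the text. In practice, the pairs would come from the LOINC database or from a web service rather than being maintained by hand. The function name `molar_unit_for` is illustrative.

```python
# Hypothetical local table of LOINC "mass concentration" -> "molar concentration"
# pairs. Only the albumin entry comes from the text; a real application would
# obtain these pairs from LOINC itself or from a web service.
MASS_TO_MOLAR = {
    "1751-7": {
        "code": "54347-0",
        "name": "Albumin [Moles/volume] in Serum or Plasma",
        "example_unit": "umol/L",
    },
}


def molar_unit_for(mass_loinc: str) -> tuple[str, str]:
    """Return (paired molar LOINC code, its UCUM unit) for a mass-concentration code."""
    try:
        pair = MASS_TO_MOLAR[mass_loinc]
    except KeyError:
        raise KeyError(f"no molar-concentration pair known for {mass_loinc}") from None
    return pair["code"], pair["example_unit"]


code, unit = molar_unit_for("1751-7")
print(code, unit)  # 54347-0 umol/L
```

The returned unit is what would go into LBSTRESU. The converted value for LBSTRESN still has to be computed separately.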

Of course, I don't expect every SDTM mapper to do that him- or herself: some smart people have already automated it in software, and have even made a free RESTful web service for it that is available to everyone and to any application.

You can find the documentation of that RESTful web service (and of many other services for working with LOINC codes) at: http://xml4pharmaserver.com/WebServices/LOINC_webservices.html

To find the "SI/molar" unit for a test for which you received the data with a "conventional/mass" unit, navigate to "Get the corresponding 'Substance Concentration' LOINC code with details starting from a 'Mass concentration' LOINC code".
Of course, there is also a service to do the other way around: from "molar" to "mass".

Although the RESTful web service is meant to be used by software applications, you can also call it from the browser, e.g. for testing. For example, when you have received LOINC code 1751-7 from the lab (with "mass concentration") and want to find out the corresponding "SI/molar" unit, use:

http://www.xml4pharmaserver.com:8080/LOINCService/rest/LOINCMassToSubstance/1751-7

The server then responds with the corresponding LOINC code for the "molar concentration" (54347-0) and the corresponding SI unit "umol/L".

Thus, when you have the LOINC code for the test (and you should get it from the lab), getting the corresponding "SI/molar" unit is just a matter of calling the RESTful web service from your mapping software.
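As a minimal sketch of such a call, the code below uses only the Python standard library and the endpoint from the example URL above. The function names are hypothetical, and the endpoint may change or be retired. The response body is returned unparsed because its exact format is not described here, so extracting the molar code and unit is left to the caller.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in the text; it may change or be retired.
BASE_URL = ("http://www.xml4pharmaserver.com:8080/LOINCService/rest/"
            "LOINCMassToSubstance/")


def mass_to_substance_url(loinc_code: str) -> str:
    """Build the request URL for a 'mass concentration' LOINC code."""
    return BASE_URL + urllib.parse.quote(loinc_code)


def fetch_molar_counterpart(loinc_code: str, timeout: float = 10.0) -> str:
    """Call the service and return the raw response body as text.

    The response is returned unparsed: its exact format is not assumed here,
    so extracting the molar LOINC code and unit is left to the caller.
    """
    url = mass_to_substance_url(loinc_code)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With network access, `print(fetch_molar_counterpart("1751-7"))` should show the same response as the browser request above.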

But that is only the beginning!

The next question is: how do I convert from "conventional/mass" units to "SI/molar" units, or the other way around? Many people will start writing complex scripts with conversion factors that depend on the compound being measured and on the type of test (e.g. the specimen used). When using the LOINC code, however, this can be fully automated. With CDISC controlled terminology: no chance ...
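The arithmetic behind such a conversion is simple once the molecular weight is known: molar concentration = mass concentration / molecular weight, plus unit scaling. The sketch below shows only that calculation. The unit tables cover just a few units, and the molecular weights are supplied by hand: glucose at about 180.16 g/mol, and albumin at an *assumed* 66,500 g/mol, given the lack of agreement on albumin's molecular weight mentioned above. Looking the molecular weight up from the LOINC code is exactly the part a service can automate.

```python
# Factors that take a value in the given mass unit to g/L ...
TO_GRAMS_PER_LITRE = {"g/L": 1.0, "g/dL": 10.0, "mg/L": 1e-3, "mg/dL": 1e-2}
# ... and factors that take a value in mol/L to the given molar unit.
FROM_MOL_PER_LITRE = {"mol/L": 1.0, "mmol/L": 1e3, "umol/L": 1e6}


def mass_to_molar(value: float, from_unit: str, to_unit: str,
                  molecular_weight: float) -> float:
    """Convert a mass concentration to a molar concentration.

    molecular_weight is in g/mol; molar concentration = mass / molecular_weight.
    """
    grams_per_litre = value * TO_GRAMS_PER_LITRE[from_unit]
    mol_per_litre = grams_per_litre / molecular_weight
    return mol_per_litre * FROM_MOL_PER_LITRE[to_unit]


# Glucose (about 180.16 g/mol): 100 mg/dL corresponds to about 5.55 mmol/L.
print(round(mass_to_molar(100, "mg/dL", "mmol/L", 180.16), 2))
# Albumin with an ASSUMED molecular weight of 66,500 g/mol.
print(round(mass_to_molar(4.2, "g/dL", "umol/L", 66_500), 1))
```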

The National Library of Medicine (NLM) has developed a RESTful web service for this. You can find the specification at: https://ucum.nlm.nih.gov/ucum-service.html

There is even a method for conversion from "conventional/mass" units to "SI/molar" units. It requires either the molecular weight or, more conveniently, the LOINC code. In the latter case, the molecular weight needed for the conversion is retrieved from the LOINC code.

For our earlier albumin example, you (or better, your application) can, e.g., convert 4.2 g/dL to umol/L using:

https://ucum.nlm.nih.gov/ucum-service/v1/ucumtransform/4.2/from/g/dL/to/umol/L/LOINC/1751-7

Try it yourself!
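For use from an application, a minimal sketch of building and calling that URL in Python follows. The path layout is copied from the example URL above. The function names are hypothetical, and the response is returned as raw text rather than parsed, because no response format is assumed here. Units are inserted unescaped, as in the example. Units containing characters such as "[" or spaces might need percent-encoding, which this sketch does not attempt.

```python
import urllib.request

UCUM_TRANSFORM = "https://ucum.nlm.nih.gov/ucum-service/v1/ucumtransform"


def ucum_transform_url(value: float, from_unit: str, to_unit: str,
                       loinc_code: str) -> str:
    # Path layout copied from the example URL in the text:
    # /{value}/from/{from_unit}/to/{to_unit}/LOINC/{loinc_code}
    # Units are inserted unescaped, as in that example ("g/dL", "umol/L").
    return (f"{UCUM_TRANSFORM}/{value}/from/{from_unit}"
            f"/to/{to_unit}/LOINC/{loinc_code}")


def ucum_transform(value: float, from_unit: str, to_unit: str,
                   loinc_code: str, timeout: float = 10.0) -> str:
    """Call the NLM service and return the raw, unparsed response text."""
    url = ucum_transform_url(value, from_unit, to_unit, loinc_code)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


print(ucum_transform_url(4.2, "g/dL", "umol/L", "1751-7"))
```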

Using these RESTful web services allows a lot to be automated when generating SDTM records. For example, one of our customers who uses the "SDTM-ETL" mapping software (which implements these services) managed to perform thousands of conventional-to-SI unit conversions (for LBSTRESN/LBSTRESU) in just seconds, with just one mapping statement in their mapping script.
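To illustrate why a single statement can suffice, here is a hypothetical sketch, not the SDTM-ETL implementation. It applies one converter to every LB record that carries a LOINC code (LBLOINC) with a known target unit. It fills LBSTRESN/LBSTRESU from LBORRES/LBORRESU and leaves other records untouched. The converter is pluggable and could wrap a web service call. The stub used here handles only albumin, g/dL to umol/L, with an *assumed* molecular weight of 66,500 g/mol.

```python
from typing import Callable

# A converter takes (value, from_unit, to_unit, loinc_code) and returns the
# converted value; it could wrap a web service call or a local calculation.
Converter = Callable[[float, str, str, str], float]


def derive_standard_results(records: list[dict], target_units: dict[str, str],
                            convert: Converter) -> list[dict]:
    """Fill LBSTRESN/LBSTRESU for records whose LBLOINC has a target unit.

    Records without a LOINC code, without a target unit, or without an
    original result are left unchanged.
    """
    for rec in records:
        target = target_units.get(rec.get("LBLOINC"))
        if target is None or rec.get("LBORRES") in (None, ""):
            continue
        value = float(rec["LBORRES"])
        rec["LBSTRESN"] = convert(value, rec["LBORRESU"], target, rec["LBLOINC"])
        rec["LBSTRESU"] = target
    return records


def albumin_stub(value: float, from_unit: str, to_unit: str,
                 loinc_code: str) -> float:
    # Hypothetical stand-in converter: g/dL -> umol/L for albumin only,
    # using an ASSUMED molecular weight of 66,500 g/mol.
    if (from_unit, to_unit) != ("g/dL", "umol/L"):
        raise ValueError(f"unsupported conversion {from_unit} -> {to_unit}")
    return value * 10.0 / 66_500 * 1e6


records = [
    {"LBLOINC": "1751-7", "LBORRES": "4.2", "LBORRESU": "g/dL"},
    {"LBTESTCD": "GLUC", "LBORRES": "NEGATIVE"},  # no LOINC code: skipped
]
derive_standard_results(records, {"1751-7": "umol/L"}, albumin_stub)
for rec in records:
    print(rec.get("LBLOINC"), rec.get("LBSTRESN"), rec.get("LBSTRESU"))
```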

As if this were not enough, the LOINC code also allows LBTESTCD, LBTEST, LBSPEC, LBMETHOD, etc. to be populated automatically. There is a RESTful web service for that too; you can find its description at:
http://xml4pharmaserver.com/WebServices/LOINC2CDISC_webservices.html

Conclusion: we should stop speaking about "US conventional" and "SI" units, as for "conventional", it is totally unclear what the "convention" is. Instead, we should rather speak (where applicable) about "mass concentration units". As for "SI" units, in clinical research we have completely abused the term, as in most cases (but not always!) we mean "molar concentration units".
Furthermore, we have shown here that when one has the LOINC code for the test (which we should receive from the lab anyway), one can easily, and even automatically, find the most suitable "molar" unit using the RESTful API mentioned above. There is no chance to achieve this with CDISC controlled terminology.
And last but not least, UCUM notation for units allows the conversion from "mass concentration" to "molar concentration" to be fully automated using the RESTful API provided by the National Library of Medicine. The good news for CDISC is that, for concentrations, there is a good overlap between CDISC and UCUM notation. But CDISC should move to UCUM anyway.

Comments and questions regarding "SI" units and "conventional to SI unit conversion" (or the other way around) are always welcome!