The mapping of (local) lab test codes to the lab test controlled terminology in CDISC-SDTM can be challenging. In the words of Paul Vervuren: "One has to find candidates in the extensive controlled terminology list. Then there can be multiple lab tests that map to a single SDTM controlled term. This means additional variables must be used in order to produce a unique test definition (e.g. LBCAT, LBSPEC, LBMETHOD and/or LBELTM). Finally, it can occur that a controlled term is not available and a code needs to be defined in agreement with the rules for Lab tests".
In this blog entry we will explain such a mapping starting from a lab test defined by a LOINC test code. LOINC is a system (not just a list of terms) used worldwide for test codes in healthcare, not only for lab tests, but also for vital signs and many other tests. LOINC coding is used (or even mandated) in:
- HL7-v2 messages in hospital information systems (HIS)
- Interoperable electronic health records (HL7-v3, CDA/CCD, FHIR)
- messages that use the CDISC-Lab standard
The SDTM lab test codes, however, are used nowhere in the world except in SDTM submissions. CDISC does not (yet) allow LOINC codes to be used in LBTESTCD, even though the LOINC code uniquely describes the lab test, which the CDISC controlled terminology for LBTESTCD does not, not even in combination with LBCAT, LBSPEC, LBMETHOD, ...
So unfortunately, we need to map from a universal system (LOINC) to a local system (only used in submissions). How can this be done?
Let us take an example.
The LOINC code 73710-6 uniquely defines the test "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) IgE Ab [Units/volume] in Serum by Multidisk". It is usually measured in k[IU]/L (thousand international units per liter). "k[IU]/L" is UCUM notation, which is again a system for units, and not just a list like the CDISC [UNIT] controlled terminology.
Now, how do we map this test to SDTM?
First we must understand that the LOINC code is just a number for the "LOINC Name", which consists of 5-6 parts ("dimensions"). The "LOINC Name" of code 73710-6 is "(Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) Ab.IgE:ACnc:Pt:Ser:Qn:Multidisk" with each part being separated by a colon (":"). So the parts are:
- Component: (Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) Ab.IgE
- Property measured: ACnc ("arbitrary concentration")
- Time aspect: Pt ("point in time")
- System/Specimen: Ser (serum)
- Scale: Qn (quantitative)
- Method: Multidisk
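The decomposition above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the name is a colon-separated string with five or six parts as described. The `LoincName` type and the `parse_loinc_name` function are hypothetical names chosen for this example and are not part of any LOINC library.

```python
from typing import NamedTuple, Optional


class LoincName(NamedTuple):
    """The parts of a LOINC name (illustrative type, not an official API)."""
    component: str
    property_measured: str
    time_aspect: str
    system: str
    scale: str
    method: Optional[str]  # the sixth part (method) is not always present


def parse_loinc_name(name: str) -> LoincName:
    """Split a colon-separated LOINC name into its 5 or 6 parts."""
    parts = name.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(
            f"expected 5 or 6 colon-separated parts, got {len(parts)}"
        )
    method = parts[5] if len(parts) == 6 else None
    return LoincName(parts[0], parts[1], parts[2], parts[3], parts[4], method)


parsed = parse_loinc_name(
    "(Ambrosia elatior+Ambrosia psilostachya+Ambrosia trifida) Ab.IgE"
    ":ACnc:Pt:Ser:Qn:Multidisk"
)
print(parsed.system, parsed.method)
```

Having the parts as separate fields is what makes the later steps (specimen, method) programmable at all.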
All this information can easily be retrieved using a local copy of the LOINC database, one of the several RESTful webservices for machine-to-machine communication, a simple Google search, or the LOINC search website.
For the mapping, let us first start with LBTESTCD (lab test code). Can we find something in the latest (2017-09-29) CDISC codelist LBTESTCD (NCI code C65047)?
When searching for "Ambrosia", we find 4 terms for "Ambrosia psilostachya pollen antigen XXX antibody" where "XXX" is either "IgA", "IgE", "IgG" or "IgG4" (C130092 to C130092). That's it: considerably fewer than the number of hits (49) when searching for "Ambrosia" in LOINC. None of them fits, as our test is about a mixture of pollen. In the SDTM LBTESTCD codelist, if we look for "mix 209" or even for "209", we find ... nothing.
There are now 2 possibilities:
- extend the LBTESTCD codelist with our own invented term (8 characters maximum, not starting with a number)
- request a new term from the CDISC-CT development team
Option 2 means that we will need to wait 6 months or more for the new term to be approved (with the risk that our request is turned down). Do we want to delay our submission for 6 months?
As there is no "hit" for LBTESTCD, there cannot be one for "LBTEST" (laboratory test name) either, so this codelist needs to be extended too. The most logical choice seems to be the "LOINC long (common) name", which is "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) IgE Ab [Units/volume] in Serum by Multidisk". However, we can't use it, as it is more than 40 characters long and LBTEST is limited to 40 characters, a relic of the SAS Transport 5 limitations. So we need to shorten it, maybe to "Weed Allergen Mix 209 IgE Ab [Units/volume] in Serum by Multidisk", which is still more than 40 characters. Limiting it to the absolutely necessary, we can use "Weed Allergen Mix 209 IgE Ab". Note that the wording "in Serum" must be removed anyway, as it belongs to LBSPEC (specimen) and the test name must be the same regardless of the specimen type (at least in the CDISC-CT philosophy).
Finally, less than 40 characters!
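The length check applied to the three candidate names can be automated with a trivial helper. The following is a minimal sketch; the constant `MAX_LBTEST_LENGTH` and the function `fits_lbtest` are hypothetical names for this example, and the only rule encoded is the 40-character limit mentioned above.

```python
MAX_LBTEST_LENGTH = 40  # LBTEST limit stated in the text


def fits_lbtest(name: str) -> bool:
    """Return True when the name respects the 40-character LBTEST limit."""
    return len(name) <= MAX_LBTEST_LENGTH


candidates = [
    "Weed Allergen Mix 209 (Common ragweed+Western ragweed+Giant ragweed) "
    "IgE Ab [Units/volume] in Serum by Multidisk",
    "Weed Allergen Mix 209 IgE Ab [Units/volume] in Serum by Multidisk",
    "Weed Allergen Mix 209 IgE Ab",
]

for candidate in candidates:
    status = "OK" if fits_lbtest(candidate) else "too long"
    print(f"{len(candidate):3d}  {status:8s}  {candidate}")
```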
The next SDTM variable describing the test that needs to be populated is LBCAT. It is an "expected" SDTM variable, but there is no CDISC controlled terminology. So we need to define something ourselves. We can e.g. choose "ALLERGY ANTIGEN ANTIBODY". Essentially, it is almost useless for the regulatory authorities, as each sponsor will use different naming for the categories. If we look into the LOINC database, we can also use the value of "Class", which delivers "ALLERGY".
The next one is "LBSPEC" (specimen). It is "permissible", but we usually need it to at least try to uniquely describe the test. There is CDISC controlled terminology for this (codelist SPECTYPE, NCI code C78734) where we find the code "SERUM" (NCI code C119550).
This is very nice, but we also need to take into account that we will need to program this in our mapping scripts, as our computer does not know that "Ser" and "Serum" are the same thing.
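In a mapping script, this typically ends up as a lookup table. The sketch below is an assumption about how one might do it, not a prescribed method: only the "Ser" to "SERUM" pair comes from the example above, the other entries are illustrative and would need to be checked against both LOINC and the SPECTYPE codelist. The names `LOINC_SYSTEM_TO_LBSPEC` and `map_specimen` are hypothetical.

```python
from typing import Optional

# Lookup from the LOINC "System" part to a CDISC SPECTYPE submission value.
# Only "Ser" -> "SERUM" is taken from the text; the other pairs are
# illustrative assumptions that must be verified before use.
LOINC_SYSTEM_TO_LBSPEC = {
    "Ser": "SERUM",
    "Plas": "PLASMA",  # assumption
    "Urine": "URINE",  # assumption
}


def map_specimen(loinc_system: str) -> Optional[str]:
    """Return the LBSPEC value, or None when a manual decision is needed."""
    return LOINC_SYSTEM_TO_LBSPEC.get(loinc_system)


print(map_specimen("Ser"))
```

Returning `None` for unknown specimens, rather than guessing, keeps unmapped values visible so that they can be reviewed by hand.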
The next SDTM variable is "LBMETHOD". It too is subject to CDISC controlled terminology (codelist METHOD, NCI code C85492). This codelist contains not only lab methods, but methods of any type for different SDTM domains. It does however not contain the term "MULTIDISK", which is clearly the one we need. So we again need to either extend the codelist or submit a "new term request" and wait 6 months at least.
We still need to populate "LBORRESU" (Original Result Units). Unfortunately, it is under controlled terminology (i.m.o. a major SDTM design error) by the codelist "UNIT" (NCI code C71620). In this codelist, we don't find our unit "k[IU]/L" as it is UCUM notation and CDISC-CT still refuses to work with UCUM notation. So let us try "kIU/L" which would be the equivalent CDISC notation. No success either. We do however find "kIU/L" as a synonym for "IU/mL", i.e. 1 kIU/L = 1 IU/mL. Fortunately, searching synonym-test code pairs can be automated through a RESTful web service.
So we need to populate LBORRESU with "IU/mL", as the SDTM-IG states that "When sponsors have units that are not in this column, they should first check to see if their unit is a synonym of an existing unit and submit their lab values using that unit" (section 6.3 - Assumptions). Unfortunately, this also means that we lose traceability to the original unit, which is "k[IU]/L".
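The unit lookup described here can be sketched as two small tables. This is a hedged illustration only: the tables contain just the pairs mentioned in the text ("k[IU]/L" rewritten as "kIU/L", and "kIU/L" as a synonym of "IU/mL"), and the names `UCUM_TO_CDISC_STYLE`, `UNIT_SYNONYMS`, `UNIT_SUBMISSION_VALUES` and `resolve_lborresu` are hypothetical. A real implementation would load the complete UNIT codelist instead of hard-coding it.

```python
from typing import Optional

# UCUM notation -> CDISC-style notation (only the pair from the text).
UCUM_TO_CDISC_STYLE = {"k[IU]/L": "kIU/L"}

# Synonym -> submission value in the UNIT codelist (only the pair from the text).
UNIT_SYNONYMS = {"kIU/L": "IU/mL"}

# Submission values known to this sketch.
UNIT_SUBMISSION_VALUES = {"IU/mL"}


def resolve_lborresu(ucum_unit: str) -> Optional[str]:
    """Return the UNIT submission value for a UCUM unit, or None if unknown."""
    candidate = UCUM_TO_CDISC_STYLE.get(ucum_unit, ucum_unit)
    if candidate in UNIT_SUBMISSION_VALUES:
        return candidate
    return UNIT_SYNONYMS.get(candidate)


print(resolve_lborresu("k[IU]/L"))
```

Note that the function returns only the submission value: this is exactly the point where the original "k[IU]/L" notation disappears from the dataset.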
So the mapping between our test with LOINC code 73710-6 and SDTM is not only tedious, but it also leads to a non-unique description of our test, and we also lose traceability. We can of course also populate "LBLOINC" with our LOINC code, but this does not free us from populating LBTESTCD, LBTEST, etc.
But is all this necessary? Let us do a test, and create an SDTM-LB dataset. Being a rebel, I put "L73710_6" as the value of LBTESTCD (adding an "L" in front and replacing the dash with an underscore due to the SAS-XPT rules) and add that to the (extended) codelist in my define.xml. For the other things, I do as described above.
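The conversion from LOINC code to LBTESTCD value used here is mechanical and can be sketched as follows. The function name `loinc_to_lbtestcd` and the pattern `TESTCD_PATTERN` are hypothetical. The pattern encodes the two constraints mentioned earlier (8 characters maximum, not starting with a number); restricting the characters to uppercase letters, digits and underscores is my reading of the SAS-XPT rules and should be treated as an assumption.

```python
import re

# At most 8 characters, not starting with a digit. The allowed character
# set (A-Z, 0-9, underscore) is an assumption based on the SAS-XPT rules.
TESTCD_PATTERN = re.compile(r"[A-Z_][A-Z0-9_]{0,7}")


def loinc_to_lbtestcd(loinc_code: str) -> str:
    """Prefix the LOINC code with "L" and replace the dash by an underscore."""
    candidate = "L" + loinc_code.replace("-", "_")
    if not TESTCD_PATTERN.fullmatch(candidate):
        raise ValueError(f"{candidate!r} is not a valid LBTESTCD value")
    return candidate


print(loinc_to_lbtestcd("73710-6"))  # L73710_6
```

The check matters because the "L" prefix consumes one of the 8 available characters: any LOINC code longer than 7 characters (including the dash) would not fit and is rejected by this sketch.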
When keeping the mouse over the LBLOINC column, we are shown much more information, provided by a free RESTful web service.
When right-clicking the LBLOINC cell, a RESTful web service (delivered by the US National Library of Medicine) is triggered, popping up a window in our favorite browser and delivering even more information.
One could also submit the LOINC code to the UMLS RESTful web services to find relationships of this test with other tests, diagnoses for diseases (e.g. ICD-10) and much much more, thus building "networks of information and knowledge".
Can you do this with CDISC-CT for LBTESTCD? No way!
We are currently working on an application to generate and display such "networks of information and knowledge" and will later add it to the "Smart Dataset-XML Viewer".
Essentially, this means that when the LOINC code is provided, LBTESTCD, LBTEST, LBSPEC and a number of other SDTM-LB variables are completely unnecessary, as the LOINC code already contains this information, and in a better structured and more consistent way.
However, the SDTM-IG still forces us to perform the tedious mappings to these (in this case unnecessary) variables. What a waste of time!
So it is really time to rethink the SDTM-LB domain for the case that the LOINC code is available (which will be the case for 95% of the data within the next few years). A first proposal has already been published, which can serve as a starting point for discussing a better (or new) SDTM-LB domain.
Conclusions
Mapping from a LOINC code to CDISC controlled terminology can be very challenging. This not only applies to mapping starting from LOINC coding, but also to local lab codes. These problems are not due to the LOINC code system itself, but due to the SDTM controlled terminology being unable to uniquely describe lab tests, and the "reinvention of the wheel" of lab test codes by CDISC.
"In 5 years from now, everything will be e-Source in clinical research" is a statement I often hear. It also means that all our lab tests will be transmitted using LOINC coding. Instead of trying to map these to CDISC-SDTM, which is very tedious, we should rather rethink SDTM, and especially the LB domain and the controlled terminology for it. A first proposal for an "LB domain for use with electronic health record data" was already published. It can be used as a starting point for a discussion about a better SDTM domain and considerably better SDTM controlled terminology.
P.S. Special thanks to Thierry Lambert for pointing me to a few errors that have now been corrected.