CDISC end-to-end: September 2020

In clinical research submission standards, most of our controlled terminology (CDISC-CT) is "post-coordinated", i.e. it is used for categorization of data AFTER it was captured. This means that we look at what was captured ("original result"), and then categorize it using a term from the CT in CDISC-SDTM (or SEND).
For example, when I have a data point about "Albumin in urine", I categorize it in SDTM in the LB domain (another sometimes difficult decision I have to make) as LBTESTCD="ALB", LBTEST="ALBUMIN", LBSPEC="URINE".

Of course, we do have exceptions in our CDISC-CT. We can very well use the terms of the "Adverse Event Severity" controlled terminology (AESEV in SDTM), "MILD", "MODERATE" and "SEVERE", in advance, before data capture starts, as three checkboxes or radio buttons on the (e)CRF.
I haven't, however, seen a CRF yet where the data capture instructions are based on LBTESTCD/LBTEST/LBSPEC.

The whole medical world, however, is using pre-coordinated CT. When a medical doctor orders a lab test, he/she checks a checkbox on a form or Hospital Information System (HIS) screen that has a LOINC code behind it. When a microbiology test is ordered, it is done using a LOINC code, or maybe a CPT code. Although he/she does not see this immediately, the results (when not a quantity) are coded using SNOMED-CT.

Pre-coordination has a lot of advantages. It ensures that exact instructions are provided about what has to be done. When I provide the LOINC code 1754-1 for a test to be performed, the lab and everyone else knows exactly what to do, i.e. it immediately knows that it has to measure the albumin mass concentration in single-sample urine (i.e. not collected over a period of time) and has to report that as a mass concentration like "g/dL". If I would like the lab to report it in SI units, I would make the request with LOINC code 77158-4. If I would like to have the concentration measured in urine collected over a time period of 24 hours, I would use LOINC code 21059-1. If I only want to know whether there is albumin present in the urine, I would use LOINC code 1753-3.
So, essentially, when using pre-coordination, I am planning well in advance, and I know exactly what I am doing. There is no ambiguity.
No wonder that pre-coordinated codes like LOINC are used in so many countries (over 180 in the case of LOINC), and are mandated by law for use in healthcare in many of them.
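
To make the difference tangible, here is a minimal Python sketch. It assumes only what is stated in the text above: the four LOINC codes with their described meaning, and the LBTESTCD/LBTEST/LBSPEC values from the albumin example. The class name, field names and the `to_post_coordinated` helper are made up for illustration, and the descriptions are paraphrased from this post, not taken from the LOINC database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabOrder:
    """What a pre-coordinated code tells the lab (paraphrased from the text)."""
    loinc: str
    analyte: str
    specimen: str
    collection: str    # "single sample" or "24 hours"
    result_kind: str   # what has to be reported


# Illustrative descriptions only, not an extract of the LOINC database.
ORDERS = [
    LabOrder("1754-1", "Albumin", "Urine", "single sample", "mass concentration"),
    LabOrder("77158-4", "Albumin", "Urine", "single sample", "concentration in SI units"),
    LabOrder("21059-1", "Albumin", "Urine", "24 hours", "concentration"),
    LabOrder("1753-3", "Albumin", "Urine", "single sample", "presence"),
]

# Hypothetical lookup from analyte name to the SDTM test code used in the text.
TESTCD_BY_ANALYTE = {"Albumin": "ALB"}


def to_post_coordinated(order: LabOrder) -> tuple[str, str, str]:
    """Reduce an order to (LBTESTCD, LBTEST, LBSPEC) only."""
    return (
        TESTCD_BY_ANALYTE[order.analyte],
        order.analyte.upper(),
        order.specimen.upper(),
    )


# Four different instructions to the lab ...
distinct_orders = {order.loinc for order in ORDERS}
# ... end up with one and the same test code / test name / specimen triple.
distinct_triples = {to_post_coordinated(order) for order in ORDERS}

print(len(distinct_orders), "LOINC codes ->", len(distinct_triples), "triple(s)")
```

With only these three variables, the distinction between quantitative and presence-only, and between single-sample and 24-hour urine, has to be carried by additional variables that someone must populate afterwards.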

Unfortunately, in clinical research, we mostly do not plan very precisely in advance. In clinical research protocols (the "study plan"), we still see wording like "measure albumin in urine". I haven't seen many protocols where LOINC codes are provided, exactly defining what has to be tested. It is then usually up to the sponsor to define what "measure albumin in urine" exactly means for the site instructions, but also in these, LOINC coding is almost never used: mostly the site instructions are plain text, which can easily be misinterpreted. So, don't be surprised when one site measures albumin in urine quantitatively, and another just the presence, some of them in single-sample urine, others in urine that was collected over a 24-hour period. One can already guess that such results are hard to compare.

BUT … we then still have post-coordinated CDISC-CT to repair the lack of precise planning, and to categorize all our results using CT for the --TESTCD, --TEST, --SPEC, --LOC and --METHOD variables. This categorization step is performed by "mappers" at CROs, specialized companies and sponsors, and can easily take several weeks or, in the worst cases, even months.
So, essentially, when using post-coordination, I am not planning well in advance, I do not know what I will get, but … I will (try to) repair afterwards.

So, why don't we use more pre-coordinated CT in clinical research?
Personally, I think this is for 50% or more due to historical reasons: protocols traditionally do not use coding, but only plain text to describe what needs to be done. Furthermore, CDISC decided in the early days not to use LOINC for describing tests, in my opinion based on wrong assumptions (unfortunately, that article is not on the CDISC website anymore): one of them is that (local) labs do not know/use LOINC coding, and would need to learn it. Well, they do not know CDISC-CT either, do they? Also, any lab that does not use LOINC nowadays will soon be out of business, as it is mandated through so many regulations, worldwide. The same is true for SNOMED-CT: the argument that SNOMED is not free in some parts of the world is not precise. What is correct is that use of SNOMED for research purposes is free everywhere in the world: "… is free … for Qualifying Research Projects in any country". At the moment, SNOMED-CT is only used in SDTM for some parameters in the TS (Trial Summary) domain, as mandated by the FDA. It would e.g. be enormously useful for exactly describing the bacteria, viruses and other microorganisms detected in tests.

Another reason why pre-coordinated CT is not much used within CDISC is that it is not mandated by the FDA. LOINC only got serious attention within CDISC after the FDA mandated the use of LOINC codes in the SDTM-LB (Laboratory) domain. LOINC is however much more than lab tests… As the SDTM "Findings" domains, including LB, all rely on post-coordination for the tests, a LOINC-to-LB mapping effort started, which took a lot of resources and took years (at the moment of writing, the final version still hasn't been published), and which only covers the 2000+ most popular LOINC lab test codes. As the LOINC code contains an exact description of the test, why was this mapping necessary? Just because LBTESTCD/LBTEST are "required" variables in SDTM?

The relation between FDA and CDISC is not always an easy one. The FDA requirement for LOINC coding in SDTM-LB came as a "bolt from the blue", also for CDISC. Given several other initiatives at the FDA, like in the area of "Real World Data" (RWD) and "Real World Evidence" (RWE), and the FDA also talking to other SDOs, I would not be surprised if the FDA adds new requirements related to pre-coordinated CT in the near future. Obvious candidates are:

LOINC codes for (COVID-19) microbiology tests, vital signs and ECG tests
Use of UCUM notation for units for quantitative test results
Use of SNOMED-CT codes for identification of bacteria and viruses
Use of SNOMED-CT (or ICD-10) codes for diagnoses in MH (Medical History) domains

When this happens (which I expect), will we then again start giant mapping efforts to our post-coordinated CT?

What can we do better?

Suppose I need to find COVID-19 patients in an HIS or Electronic Health Record (EHR) system as volunteers for a study of a new treatment. Of course, I can start doing a text search, but isn't it smarter to use the SNOMED-CT diagnosis code (840539006) or the ICD-10 code (U07.1)? I could also use the SNOMED-CT code for the virus itself, which is 840533007.
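
As a small illustration of why a code search beats a text search, here is a Python sketch. The patient records, field names and free-text entries are invented for this example and do not come from any real EHR system; only the SNOMED-CT and ICD-10 codes for COVID-19 are real.

```python
# Diagnosis codes for COVID-19 (code system, code).
COVID19_DIAGNOSIS_CODES = {
    ("SNOMED-CT", "840539006"),
    ("ICD-10", "U07.1"),
}

# Invented diagnosis records, as they might come out of an EHR query.
records = [
    {"patient_id": "P001", "system": "SNOMED-CT", "code": "840539006", "text": "COVID-19"},
    {"patient_id": "P002", "system": "ICD-10", "code": "U07.1", "text": "Covid19 confirmed"},
    {"patient_id": "P003", "system": "ICD-10", "code": "J18.9", "text": "Pneumonia, COVID-19 ruled out"},
    {"patient_id": "P004", "system": "SNOMED-CT", "code": "840539006", "text": "SARS-CoV-2 infection"},
]


def find_by_code(records):
    """Select patients whose diagnosis carries one of the COVID-19 codes."""
    return sorted(
        {r["patient_id"] for r in records if (r["system"], r["code"]) in COVID19_DIAGNOSIS_CODES}
    )


def find_by_text(records, needle="covid-19"):
    """Naive free-text search, for comparison."""
    return sorted({r["patient_id"] for r in records if needle in r["text"].lower()})


by_code = find_by_code(records)
by_text = find_by_text(records)
print("by code:", by_code)
print("by text:", by_text)
```

In this made-up data set, the text search misses the patients whose diagnosis was written down differently, and it picks up the patient for whom COVID-19 was ruled out. The code search has neither problem.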

Surprisingly, none of these is even mentioned in the new CDISC "Interim User Guide for COVID-19". Also, none of the newly published LOINC codes for Coronavirus testing can be found in this publication. Using the pre-coordinated codes, it is much easier to find and select eligible candidates for a study.

At the upcoming CDISC 2020 US Interchange I will demonstrate how the use of pre-coordinated CT, together with newly developed mappings between LOINC codes for COVID-19 and SDTM-MB CT, makes it possible to generate submission-ready SDTM datasets starting from an EHR system in just minutes, a process that usually takes weeks or months when going through the traditional post-study "mapping" process using post-coordinated CDISC-CT. Wouldn't the generation of submission datasets from EHRs in just minutes be the dream of every sponsor (and regulator)?

In order to be much more productive in clinical research, and obtain a much higher data quality (the classic "mapping" process is error-prone), there is no other way than starting to use pre-coordinated CT like LOINC, SNOMED-CT, ICD-10, …

First of all, we need to promote the use of pre-coordinated CT (LOINC, SNOMED-CT, ICD-10 …) in clinical research protocols, or at least ensure that these codes flow into the site instructions or even into the eCRFs (this is easily possible using CDISC-ODM). Prototype tools for "annotating" protocols with such codes in a semi-automated way have been developed in the past, but were not developed further due to lack of interest from industry. As long as we do not provide codes for the exact definition of the tests to be performed in a study right from the start, we will get a multitude of different tests and results that are mostly incomparable, and that painfully need to be mapped to post-coordinated CDISC-CT.

When going over all the published "Therapeutic Area User Guides", I found almost no LOINC or SNOMED-CT codes. Of course, CDISC does not (and does not want to) impose what exactly needs to be measured, but in the case of therapeutic areas, the authors of these guides should at least make suggestions for what is best practice, and at least suggest exact LOINC codes for these. Stating "a PCR test to detect presence of Corona virus" is just not good enough …

If, and that is what I expect, especially due to increased use of EHRs as the primary source, we will use much more pre-coordinated CT for querying (there is no EHR system that uses CDISC-CT), this would also mean that we would need to develop additional LOINC-SDTM mappings, which is an enormous task. Unofficial mappings for COVID-19 microbiology tests and for vital signs, as well as for ECG tests, are already available or are currently being developed. This is however the work of a few "geeks", or done at sponsor companies by a few "visionaries", and not formally supported by CDISC.

LOINC however has >93,000 codes, SNOMED-CT has >100,000 unique concepts, and (unlike CDISC-CT), has explicit relations between the codes. Mapping all these to post-coordinated CDISC-CT would be an almost impossible task, which I think cannot be automated. For example, for each LOINC code, it first needs to be decided to which SDTM domain the finding for the test belongs. This would require consensus … and thus time …

But is it really necessary to develop such mappings?

The LOINC-SDTM-LB mapping was an enormous and incredible effort, the process took years. I have a lot of respect for the team that accomplished this. The result is however limited to the 2000+ most used LOINC lab tests. So, what about the 90,000+ others?

Some people have suggested allowing the LOINC code to be put in --TESTCD. This is however technically not possible, as LOINC codes start with a numeric character, which is not allowed in SDTM (SAS-XPT legacy). It wouldn't be correct either, as e.g. in LB, LBTESTCD does not represent a test, it only represents the analyte (remark: in each "Findings" domain --TESTCD means something different…). Others have proposed an "alternative" LB domain, adapted for the use of LOINC, but this was never seriously taken into consideration.
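
The technical restriction can be sketched in a few lines of Python. The check below assumes the usual SDTM convention for --TESTCD values: at most 8 characters, not starting with a number, and only letters, numbers and underscores. The restriction to uppercase letters and the function name are choices made for this illustration.

```python
import re

# Assumed --TESTCD convention: 1 to 8 characters, first character not a digit,
# only (uppercase) letters, digits and underscores.
TESTCD_PATTERN = re.compile(r"[A-Z_][A-Z0-9_]{0,7}")


def is_valid_testcd(value: str) -> bool:
    """Return True when the value would be acceptable as a --TESTCD value."""
    return TESTCD_PATTERN.fullmatch(value) is not None


for candidate in ("ALB", "1754-1", "21059-1"):
    print(candidate, "->", is_valid_testcd(candidate))
```

A LOINC code fails this check twice: it starts with a digit, and it contains a hyphen.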

As developing such mappings requires enormous efforts, and can never be complete (new LOINC and SNOMED-CT codes are published all the time), wouldn't it be better, when the test is uniquely described by the LOINC code, to just populate --LOINC (in any of the "Findings" domains), and just leave --TESTCD, --TEST, --SPEC, --METHOD, etc. blank? This is however not allowed, as --TESTCD and --TEST are "required" variables in all domains. However, the LOINC code exactly describes the test, whereas the combination of --TESTCD/--TEST/--SPEC/--METHOD does not. So, shouldn't we consider the LOINC code as the "primary" identifier, and --TESTCD etc. as the "secondary" one, the latter just for "ease of review" by regulatory reviewers, as these do not know the meaning of each LOINC code? Even that is not necessary, as we nowadays have RESTful web services, which can easily be used in any modern software, for looking up the meaning of a LOINC code fully automatically.
The "Smart Submission Dataset Viewer", for example, looks up the meaning of a LOINC code in this way, and can even automatically open an NLM web page with more information about the test.

So, by just using the LOINC code, review tools can connect to any resources related to the LOINC code through a RESTful web service. Of course, these could also be FDA-internal RESTful web services, for example a service that displays a distribution of all values for that test available to the FDA, e.g. for that specific disease. Or it could be used to connect to Real-World-Data resources.
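
How little code such a lookup needs can be sketched in Python. Everything service-specific here is an assumption: the base URL is a placeholder, not a real LOINC service, and the JSON field `long_common_name` is a made-up response format. A real review tool would use the address and response structure of whatever service it connects to. The fetch function can be replaced, so the sketch can be tried out without a network connection.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder address: NOT a real LOINC web service.
BASE_URL = "https://example.org/loinc"


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def describe_loinc(code, fetch=http_get_json):
    """Look up the display name of a LOINC code through a RESTful service.

    The response field 'long_common_name' is an assumed format.
    """
    payload = fetch(f"{BASE_URL}/{quote(code)}")
    return payload.get("long_common_name", "")


# Try-out with a stubbed service, so that no network access is needed.
def fake_fetch(url):
    stub = {f"{BASE_URL}/1754-1": {"long_common_name": "Albumin mass concentration in urine (stub)"}}
    return stub.get(url, {})


print(describe_loinc("1754-1", fetch=fake_fetch))
```

The same pattern would work for an FDA-internal service or a Real-World-Data resource: only the address and the fields that are read from the response change.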

In such cases, there is no need anymore for --TESTCD, --TEST, --SPEC, etc. So why waste resources for developing mappings to post-coordinated CDISC-CT when we can do everything using pre-coordinated CT and its derivatives?

So, here comes my suggestion, and I guess already that I will be strongly criticized for it, but I stand by my point: In SDTM Findings domains, when the LOINC code is provided, --TESTCD, --TEST, --SPEC, --METHOD and other "identifiers" (like --CAT) do not need to be populated.
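
Expressed as a validation rule, the suggestion could look like the following Python sketch. It is only an illustration of my proposal, not an existing CDISC or FDA rule. The function name is made up, and a record is simply represented as a dictionary of variable names and values.

```python
# Variables that are "required" today in the Findings domains.
REQUIRED_IDENTIFIERS = ("TESTCD", "TEST")


def missing_identifiers(record, domain="LB"):
    """Return the identifier variables that still need to be populated.

    Proposed rule: when --LOINC is populated, nothing else is needed.
    Otherwise, --TESTCD and --TEST remain required.
    """
    if record.get(f"{domain}LOINC"):
        return []
    return [
        f"{domain}{name}"
        for name in REQUIRED_IDENTIFIERS
        if not record.get(f"{domain}{name}")
    ]


print(missing_identifiers({"LBLOINC": "1754-1", "LBORRES": "3.2"}))
print(missing_identifiers({"LBTESTCD": "ALB", "LBORRES": "3.2"}))
```

The first record passes with only the LOINC code, the second one is still reported as incomplete because LBTEST is missing and no LOINC code was given.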

Time for the "rotten tomatoes". Please feel free to throw and comment!

Monday, September 7, 2020

Post-coordinated versus Pre-coordinated Controlled Terminology