Saturday, November 16, 2019

From FHIR to SDTM-LB (and DM) in just two minutes

Over the last few days, I worked on two projects of my own (besides all the paid projects…): one is an application that searches for EHR records (as HL7-FHIR records) in a public repository that implements the FHIR API, in such a way that the program works independently of which EHR system is queried, and then generates CDISC SDTM-LB and DM datasets from them. I tested the program on the Synthea system, the HAPI-FHIR system, and the Vonk system, and I am currently looking for more public systems on which I can test the software.

The methodology relies heavily on RESTful web services (RWS). It uses three of them: one to retrieve FHIR data for a given set of laboratory test codes (using LOINC coding), one to get the mapping between the LOINC code for a lab test and the CDISC-SDTM-LB variable values, and a third to convert lab units from US conventional units to SI units, using the UCUM notation.
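To illustrate the first of these services: the FHIR API allows searching Observation resources by code, where each value is given as "system|code" and a comma between values means "OR". The following is only a minimal sketch of how such a query URL could be built. The base URL and the function name are assumptions for illustration, not the code of our program.

```python
from urllib.parse import urlencode

LOINC_SYSTEM = "http://loinc.org"


def build_observation_query(base_url, loinc_codes, count=50):
    """Build a FHIR search URL for Observations matching any of the LOINC codes."""
    # FHIR token search: each value is "system|code"; a comma means OR.
    tokens = ",".join(f"{LOINC_SYSTEM}|{code}" for code in loinc_codes)
    params = {"code": tokens, "_count": count}
    # Keep ':', '/' and ',' readable; '|' is percent-encoded as %7C.
    query = urlencode(params, safe=":/,")
    return f"{base_url.rstrip('/')}/Observation?{query}"


# Hypothetical FHIR endpoint, for illustration only.
url = build_observation_query("https://example.org/fhir", ["2345-7", "718-7"])
print(url)
```

Because only the base URL changes from one server to the next, the same query can be sent to any EHR system that implements the FHIR API.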

The web service for executing the mapping between a LOINC lab test code and the corresponding SDTM variable values is one I developed myself, running on a server somewhere in Vienna. It is not public yet, as the CDISC "LOINC to CDISC mapping" is still under public review, and thus not final yet. If you want to use it though, just send me an e-mail, and I will send you more information about the end point and the methods. As soon as the mapping is officially released, I will make the RWS API public, but will probably withdraw it again once the methods are also available through the CDISC Library API. Please also note that this mapping only exists for a very small part of the LOINC database. It is available for the 1400+ most popular (most used) LOINC laboratory codes, whereas LOINC contains over 90,000 codes, and not only for laboratory tests.

The third RWS, for converting US conventional to SI units and vice versa, is also not public yet, as the code has been donated to the US National Library of Medicine (NLM), which will implement it as an extension to the existing UCUM conversion RWS (which was also developed by us and donated to the NLM). Here too, if you want to try it out before the NLM makes it public, just send me an e-mail, and I will send you more information. The method for doing these conversions is based on both UCUM and LOINC. Please note that even simple unit conversions like "inches" to "centimeters" are impossible using the CDISC [UNIT] codelist. So, everything speaks for allowing UCUM notation in CDISC submissions, as that would finally enable regulatory authorities such as the FDA to compare (lab) values between different studies and submissions, which they can't right now because of the CDISC units.
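Why both UCUM and LOINC are needed can be seen from a simple case: converting a mass concentration (mg/dL) to a substance concentration (mmol/L) requires the molar mass of the analyte, and the analyte is identified by the LOINC code. The sketch below only illustrates that principle. The lookup table and the function name are hypothetical and do not represent the API of the actual web service.

```python
# Hypothetical lookup: molar mass of the analyte in g/mol, keyed by LOINC code.
MOLAR_MASS_BY_LOINC = {
    "2345-7": 180.16,  # Glucose [Mass/volume] in Serum or Plasma
}


def mg_per_dl_to_mmol_per_l(value, loinc_code):
    """Convert a value in mg/dL (US conventional) to mmol/L (SI)."""
    molar_mass = MOLAR_MASS_BY_LOINC[loinc_code]
    # mg/dL -> mg/L is a factor of 10; dividing mg/L by g/mol gives mmol/L.
    return value * 10.0 / molar_mass


result = mg_per_dl_to_mmol_per_l(100.0, "2345-7")
print(f"{result:.2f} mmol/L")  # prints "5.55 mmol/L"
```

The unit strings themselves ("mg/dL", "mmol/L") are valid UCUM expressions, which is what makes them machine-processable in the first place.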


Back to our "EHR to CDISC-SDTM" program. It first asks which EHR system needs to be queried, and then asks for a set of laboratory test codes (as LOINC codes):




It then queries the EHR system, transforms the query results into SDTM records for the given set of LOINC codes, and then also asks whether all the results need to be "standardized" to either SI units or to US conventional units, for the SDTM variables LBSTRESC, LBSTRESN and LBSTRESU,


 


and executes the conversions where needed. 
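The transformation step can be sketched roughly as follows. This is a simplified illustration under several assumptions: the way USUBJID is derived and the function name are hypothetical, only numeric results (valueQuantity) are handled, and variables such as LBTESTCD and LBTEST, which come from the LOINC-to-CDISC mapping service, are left out.

```python
LOINC_SYSTEM = "http://loinc.org"


def observation_to_lb(obs, studyid, seq):
    """Map one FHIR Observation (parsed JSON as a dict) to a partial SDTM-LB record."""
    # Pick the LOINC coding among possibly several codings for the test.
    loinc = next(
        (c["code"] for c in obs["code"]["coding"] if c.get("system") == LOINC_SYSTEM),
        "",
    )
    qty = obs.get("valueQuantity", {})
    # A subject reference looks like "Patient/123"; keep the logical id only.
    patient_id = obs.get("subject", {}).get("reference", "").split("/")[-1]
    return {
        "STUDYID": studyid,
        "DOMAIN": "LB",
        "USUBJID": f"{studyid}-{patient_id}",  # assumption: study id + patient id
        "LBSEQ": seq,
        "LBLOINC": loinc,
        "LBORRES": str(qty["value"]) if "value" in qty else "",
        # Prefer the coded (UCUM) unit over the display unit.
        "LBORRESU": qty.get("code") or qty.get("unit", ""),
        "LBDTC": obs.get("effectiveDateTime", ""),
    }


example_obs = {
    "resourceType": "Observation",
    "id": "obs-1",
    "code": {"coding": [{"system": LOINC_SYSTEM, "code": "2345-7"}]},
    "subject": {"reference": "Patient/123"},
    "effectiveDateTime": "2019-11-01T08:30:00Z",
    "valueQuantity": {
        "value": 98,
        "unit": "mg/dL",
        "system": "http://unitsofmeasure.org",
        "code": "mg/dL",
    },
}

lb = observation_to_lb(example_obs, "DEMO01", 1)
print(lb)
```

The standardized variables (LBSTRESC, LBSTRESN, LBSTRESU) would then be filled in a second pass, after the unit conversion.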

From a generated list of unique patients, it then queries the EHR system again, and generates demographics (DM) SDTM records from the responses.
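A minimal sketch of this second step is given below. The function names and the USUBJID convention are assumptions for illustration, and mapping every FHIR gender other than "male" and "female" to "U" is a simplification, not the rule our software necessarily applies.

```python
def unique_patient_ids(observations):
    """Collect the distinct patient ids referenced by a list of FHIR Observations."""
    seen = {}  # a dict keeps first-seen order and removes duplicates
    for obs in observations:
        ref = obs.get("subject", {}).get("reference", "")
        if ref:
            seen.setdefault(ref.split("/")[-1], None)
    return list(seen)


# Simplified mapping from FHIR administrative gender to the SDTM SEX variable.
SEX_MAP = {"male": "M", "female": "F"}


def patient_to_dm(patient, studyid):
    """Map one FHIR Patient (parsed JSON as a dict) to a partial SDTM-DM record."""
    return {
        "STUDYID": studyid,
        "DOMAIN": "DM",
        "USUBJID": f"{studyid}-{patient['id']}",  # assumption: study id + patient id
        "SUBJID": patient["id"],
        "BRTHDTC": patient.get("birthDate", ""),
        "SEX": SEX_MAP.get(patient.get("gender"), "U"),
    }


observations = [
    {"subject": {"reference": "Patient/123"}},
    {"subject": {"reference": "Patient/456"}},
    {"subject": {"reference": "Patient/123"}},
]
ids = unique_patient_ids(observations)

example_patient = {
    "resourceType": "Patient",
    "id": "123",
    "gender": "female",
    "birthDate": "1970-05-17",
}
dm = patient_to_dm(example_patient, "DEMO01")
print(ids, dm)
```

Each id in the list would be used for one further read on the FHIR server (a Patient resource), whose response is passed to the mapping function.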

A short movie of the whole process can be found here.

A snapshot of the dataset in a viewer can be found further below.
We decided NOT to use the SAS Transport 5 format (XPT format) for storing the SDTM LB and DM datasets for a simple reason: we want to keep each FHIR laboratory record WITHIN the SDTM record, which is not possible using the XPT format. Therefore, we selected the much better CDISC Dataset-XML format, as this allows additional information (such as the FHIR record) to be attached to each SDTM record using a simple extension. For example:
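As a rough sketch of the idea (not the exact markup our software produces), a Dataset-XML record carrying its FHIR source could be built as shown here. The extension namespace, the element name "FHIRResource" and the choice to embed the resource as JSON text are hypothetical, as are the OIDs.

```python
import json
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DATA_NS = "http://www.cdisc.org/ns/Dataset-XML/v1.0"
EXT_NS = "http://example.org/ns/fhir-extension"  # hypothetical extension namespace

ET.register_namespace("", ODM_NS)
ET.register_namespace("data", DATA_NS)
ET.register_namespace("ext", EXT_NS)


def build_item_group(record, fhir_resource, seq, prefix="IT.LB."):
    """Build one ItemGroupData element with the FHIR source attached as an extension."""
    group = ET.Element(
        f"{{{ODM_NS}}}ItemGroupData",
        {"ItemGroupOID": "IG.LB", f"{{{DATA_NS}}}ItemGroupDataSeq": str(seq)},
    )
    # One ItemData element per populated SDTM variable.
    for name, value in record.items():
        ET.SubElement(
            group,
            f"{{{ODM_NS}}}ItemData",
            {"ItemOID": prefix + name, "Value": str(value)},
        )
    # Hypothetical extension element holding the original FHIR resource.
    ext = ET.SubElement(group, f"{{{EXT_NS}}}FHIRResource")
    ext.text = json.dumps(fhir_resource)
    return group


record = {"STUDYID": "DEMO01", "LBLOINC": "2345-7", "LBORRES": "98", "LBORRESU": "mg/dL"}
fhir = {"resourceType": "Observation", "id": "obs-1"}
xml_text = ET.tostring(build_item_group(record, fhir, 1), encoding="unicode")
print(xml_text)
```

A viewer that does not know the extension namespace can simply ignore that element, while a viewer that does know it can show the FHIR details next to the SDTM values.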




Also, to visualize such FHIR records WITHIN SDTM records, we already extended the open source "Smart Submission Dataset Viewer" software a long time ago. So, if one hovers the mouse over an LB data point that was generated from an EHR system that supports the FHIR API, the most important information from the FHIR record is displayed:
 


 

As we already showed in the past, the "Smart Submission Dataset Viewer" uses more RESTful web services. For example, when one hovers the mouse over a LOINC code, another RWS query is triggered, providing all necessary information about that LOINC code.


Our software is just "pilot-demo" software. Currently, it generates between 100 and 200 SDTM records per second, including the unit conversions for generating the values for LBSTRESC, LBSTRESN and LBSTRESU and the generation of the DM dataset.
Of course, it does not solve the issue of determining which patient is in which study within the EHR system, nor that of authentication and authorization for each individual patient, i.e. whether the system is allowed to retrieve these records at all.
For these too, however, some mechanisms and FHIR resources are available, for example the "ResearchStudy" and "ResearchSubject" resources. That is, however, material for another blog post …
EHR systems do not know CDISC controlled terminology (and they shouldn't). All they know is LOINC codes for tests and, often, SNOMED-CT codes for non-numeric answers. So, if we would like to extend this methodology to more than just lab records (for which there is only a very partial LOINC-to-CDISC mapping), this would require a giant effort to map many more LOINC codes to CDISC variable values. The "low hanging fruit" is surely the set of LOINC codes for vital signs, in order to generate and populate SDTM-VS records fully automatically. Another good candidate is "Questionnaires", as LOINC codes are also available (and are being used in EHRs) for questionnaires.


The much easier way, however, would be an SDTM rule stating that when the LOINC code is populated, there is no obligation to provide xxTESTCD, xxTEST, xxSPEC, xxPOS, xxMETHOD, etc. After all, the LOINC code alone is THE unique identifier for a test in a clinical setting, not the combination of any SDTM variables. The tools (such as the "Smart Submission Dataset Viewer", but anyone can make such tools) can then take care of providing more information about each LOINC code, e.g. using one or more of the existing RESTful web services.