The methodology is largely based on RESTful web services (RWS). It uses three of them: one to retrieve FHIR data from the EHR system for a given set of laboratory test codes (LOINC codes), one to obtain the mapping between the LOINC code of a lab test and the corresponding CDISC-SDTM-LB variable values, and a third one to convert lab values from US conventional units to SI units, using UCUM notation.
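The post does not show how the first RWS is called. In FHIR, laboratory results are usually Observation resources, and the standard FHIR search API can return all Observations carrying any of a set of LOINC codes through a token search (`system|code`, with comma-separated values combined as OR). The following is a minimal sketch of how such a query URL could be built. The function name and the base URL `https://ehr.example.org/fhir` are hypothetical, and a real EHR will also require authentication.

```python
from urllib.parse import urlencode

LOINC_SYSTEM = "http://loinc.org"

def observation_search_url(fhir_base, loinc_codes, patient_id=None):
    """Build a FHIR search URL for Observations matching ANY of the LOINC codes.

    Hypothetical helper: uses the FHIR token search syntax 'system|code';
    comma-separated values in one parameter are ORed by the server.
    """
    params = {"code": ",".join(f"{LOINC_SYSTEM}|{code}" for code in loinc_codes)}
    if patient_id is not None:
        # Optionally restrict the search to a single patient.
        params["patient"] = patient_id
    return f"{fhir_base.rstrip('/')}/Observation?{urlencode(params)}"
```

For example, `observation_search_url("https://ehr.example.org/fhir", ["718-7", "2345-7"])` asks for hemoglobin and glucose results in a single request; the special characters in the query are percent-encoded by `urlencode`.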
The web service for the mapping between a LOINC lab test code and the corresponding SDTM variable values is one I developed myself, running on a server somewhere in Vienna. It is not public yet, as the CDISC "LOINC to CDISC mapping" is still under public review and thus not final. If you want to use it anyway, just send me an e-mail, and I will send you more information about the endpoint and the methods. As soon as the mapping is officially released, I will make the RWS API public, but will probably withdraw it again once the methods are also available through the CDISC Library API. Please also note that this mapping only covers a very small part of the LOINC database: it is available for the 1400+ most popular (most used) LOINC laboratory codes, whereas LOINC contains over 90,000 codes, and not only for laboratory tests.
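Since the mapping RWS itself is not public, the sketch below only illustrates the kind of data it returns and how a client has to deal with its limited coverage. The table is a hypothetical, hand-made excerpt (its values are NOT taken from the CDISC "LOINC to CDISC mapping"), and the function names are made up for illustration.

```python
from typing import Optional

# Hypothetical excerpt for illustration only; NOT the official CDISC mapping.
LOINC_TO_LB = {
    "718-7": {"LBTESTCD": "HGB", "LBTEST": "Hemoglobin", "LBSPEC": "BLOOD"},
    "2345-7": {"LBTESTCD": "GLUC", "LBTEST": "Glucose", "LBSPEC": "SERUM OR PLASMA"},
}

def lb_variables_for(loinc_code: str) -> Optional[dict]:
    """Return the SDTM-LB variable values for a LOINC code, or None if unmapped."""
    mapping = LOINC_TO_LB.get(loinc_code)
    return dict(mapping) if mapping is not None else None

def split_by_coverage(loinc_codes):
    """Separate requested LOINC codes into mapped and unmapped ones.

    With only ~1400 of 90,000+ LOINC codes mapped, a client must expect
    many requested codes to fall into the 'unmapped' list.
    """
    mapped = [code for code in loinc_codes if code in LOINC_TO_LB]
    unmapped = [code for code in loinc_codes if code not in LOINC_TO_LB]
    return mapped, unmapped
```

Returning `None` rather than raising makes it explicit that "no mapping available" is a normal outcome, not an error.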
The third RWS, for converting US conventional units to SI units and vice versa, is also not public yet, as the code has been donated to the US National Library of Medicine (NLM), which will implement it as an extension to the existing UCUM conversion RWS (which was also developed by us and donated to the NLM). Here too, if you want to try it out before the NLM makes it public, just send me an e-mail, and I will send you more information. The method for doing these conversions is based on both UCUM and LOINC. Please note that even simple unit conversions like "inches" to "centimeters" are impossible using the CDISC [UNIT] codelist. So, everything speaks for allowing UCUM notation in CDISC submissions, as that would finally enable regulatory authorities such as the FDA to compare (lab) values between different studies and submissions, which they currently cannot do because of the CDISC units.
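Why does the conversion need LOINC as well as UCUM? Converting a purely dimensional unit (inch to centimeter) only needs UCUM, but converting a mass concentration (mg/dL, US conventional) to a molar concentration (mmol/L, SI) also needs the molar mass of the analyte, which is identified by the LOINC code. The following is a minimal sketch of that reasoning, not the donated NLM code; the molar-mass table and function names are hypothetical.

```python
# Hypothetical per-analyte data keyed by LOINC code. UCUM alone cannot know
# the molar mass of an analyte; the LOINC code tells us which analyte it is.
MOLAR_MASS_G_PER_MOL = {
    "2345-7": 180.16,  # glucose, C6H12O6
}

def mg_per_dl_to_mmol_per_l(value, loinc_code):
    """Convert UCUM 'mg/dL' to UCUM 'mmol/L' for the analyte given by loinc_code.

    1 mg/dL = 0.01 g/L; dividing by the molar mass (g/mol) gives mol/L,
    and multiplying by 1000 gives mmol/L, i.e. mmol/L = mg/dL * 10 / M.
    """
    molar_mass = MOLAR_MASS_G_PER_MOL[loinc_code]
    return value * 10.0 / molar_mass

# Purely dimensional conversions need no analyte information at all.
LENGTH_IN_METRES = {"[in_i]": 0.0254, "cm": 0.01, "m": 1.0}

def convert_length(value, from_unit, to_unit):
    """Convert between UCUM length units by going through metres."""
    return value * LENGTH_IN_METRES[from_unit] / LENGTH_IN_METRES[to_unit]
```

For example, a glucose value of 100 mg/dL comes out at about 5.55 mmol/L, and 10 `[in_i]` (international inches) is 25.4 cm. With the free-text terms of the CDISC [UNIT] codelist, neither calculation can be derived automatically.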
Back to our "EHR to CDISC-SDTM" program. It first asks which EHR system needs to be queried, and then asks for a set of laboratory test codes (as LOINC codes).
It then queries the EHR system, transforms the query results into SDTM records for the given set of LOINC codes, and asks whether all results need to be "standardized" to either SI units or US conventional units (for the SDTM variables LBSTRESC, LBSTRESN and LBSTRESU). Where needed, it then executes the conversions.
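The transformation step can be pictured roughly as follows: pick the LOINC coding and the quantity from a FHIR Observation, fill the "original result" LB variables, add the mapped test variables, and then derive the standardized variables. This is a sketch under stated assumptions, not the program's actual code: `lb_mapping` and `convert` stand in for the mapping and unit-conversion RWS, USUBJID is assumed to be derived elsewhere, and using the UCUM `code` in preference to the display `unit` is a design choice of this sketch.

```python
LOINC_SYSTEM = "http://loinc.org"

def observation_to_lb(observation, lb_mapping, usubjid):
    """Turn one FHIR Observation (parsed JSON) into a partial SDTM-LB record.

    lb_mapping: LOINC code -> dict of LB variables (stand-in for the mapping RWS).
    usubjid: assumed to be derived elsewhere from the FHIR Patient reference.
    """
    loinc = next(c["code"] for c in observation["code"]["coding"]
                 if c.get("system") == LOINC_SYSTEM)
    quantity = observation["valueQuantity"]
    record = {
        "USUBJID": usubjid,
        "LBLOINC": loinc,
        "LBORRES": str(quantity["value"]),
        # Prefer the UCUM code over the free-text unit when both are present.
        "LBORRESU": quantity.get("code", quantity.get("unit", "")),
        "LBDTC": observation.get("effectiveDateTime", ""),
    }
    record.update(lb_mapping[loinc])
    return record

def standardize(record, target_unit, convert, decimals=2):
    """Return a copy of record with LBSTRESN, LBSTRESC and LBSTRESU filled in.

    convert(value, from_unit, to_unit) stands in for the unit-conversion RWS.
    """
    value = round(convert(float(record["LBORRES"]), record["LBORRESU"], target_unit), decimals)
    result = dict(record)
    result.update(LBSTRESN=value, LBSTRESC=f"{value:.{decimals}f}", LBSTRESU=target_unit)
    return result
```

`standardize` returns a new dict, so the original-result record stays untouched whether the user chooses SI or US conventional units.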
From a generated list of unique patients, it then queries the EHR system again and generates demographics (DM) SDTM records from the responses.
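The DM step can be sketched as follows: collect the unique patient references from the lab Observations (keeping their first-seen order), fetch each FHIR Patient resource, and map it to a DM record. Only a few variables are shown. The helper names are hypothetical, and mapping FHIR's "other" gender to "U" is an assumption of this sketch, since there is no exact CDISC equivalent.

```python
# FHIR Patient.gender -> CDISC SEX. Mapping "other" to "U" is an assumption.
SEX_FROM_FHIR = {"male": "M", "female": "F", "unknown": "U"}

def unique_patient_refs(observations):
    """Return the unique subject references (e.g. 'Patient/1') in first-seen order."""
    seen = {}
    for obs in observations:
        seen.setdefault(obs["subject"]["reference"], None)
    return list(seen)

def patient_to_dm(patient, usubjid, studyid):
    """Map a FHIR Patient (parsed JSON) to a partial SDTM-DM record."""
    return {
        "STUDYID": studyid,
        "DOMAIN": "DM",
        "USUBJID": usubjid,
        "SEX": SEX_FROM_FHIR.get(patient.get("gender"), "U"),
        "BRTHDTC": patient.get("birthDate", ""),
    }
```

A dict is used for de-duplication because it preserves insertion order, which keeps the generated DM dataset in the same order as the patients first appear in the LB data.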
A short movie of the whole process can be found here.
A snapshot of the dataset in a viewer can be found further below.
We decided NOT to use the SAS Transport 5 format (XPT format) for storing the SDTM LB and DM datasets, for a simple reason: we want to keep each FHIR laboratory record WITHIN its SDTM record, which is not possible with XPT. Therefore, we selected the much better CDISC Dataset-XML format, which allows additional information (such as the FHIR record) to be attached to each SDTM record using a simple extension.
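The exact structure of that extension is not shown here, so the following is only a minimal sketch of the idea: one Dataset-XML `ItemGroupData` record for LB, with the source FHIR resource attached in an extension element. The extension namespace `http://www.example.org/ns/fhir-extension`, the element name `FHIRRecord`, and storing the FHIR resource as JSON text are all hypothetical choices (the real implementation may, for example, embed FHIR XML instead). The OIDs follow a common naming convention but are also assumptions. In a full Dataset-XML file, this element would sit inside `ODM/ClinicalData`.

```python
import json
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DATA_NS = "http://www.cdisc.org/ns/Dataset-XML/v1.0"
EXT_NS = "http://www.example.org/ns/fhir-extension"  # hypothetical extension namespace

ET.register_namespace("", ODM_NS)
ET.register_namespace("data", DATA_NS)
ET.register_namespace("ext", EXT_NS)

def lb_item_group_data(seq, lb_record, fhir_resource):
    """Build one LB ItemGroupData element that also carries its source FHIR record."""
    igd = ET.Element(f"{{{ODM_NS}}}ItemGroupData", {
        "ItemGroupOID": "IG.LB",  # assumed OID convention
        f"{{{DATA_NS}}}ItemGroupDataSeq": str(seq),
    })
    for variable, value in lb_record.items():
        ET.SubElement(igd, f"{{{ODM_NS}}}ItemData",
                      {"ItemOID": f"IT.LB.{variable}", "Value": str(value)})
    # Hypothetical extension element: the original FHIR resource as JSON text.
    fhir = ET.SubElement(igd, f"{{{EXT_NS}}}FHIRRecord")
    fhir.text = json.dumps(fhir_resource)
    return igd
```

`ET.tostring(lb_item_group_data(...), encoding="unicode")` serializes the record. Because the extension lives in its own namespace, a tool that only knows standard Dataset-XML can simply ignore it.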
Also, to visualize such FHIR records WITHIN SDTM records, we extended the open source "Smart Submission Dataset Viewer" software a long time ago. So, when one hovers the mouse over an LB data point that was generated from an EHR system supporting the FHIR API, the most important information of the FHIR record is displayed.
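Which fields the viewer actually shows is not listed here. As an illustration only, a mouse-over summary of a FHIR Observation could be built from the test name and LOINC code, the value and unit, the date, and the reference range. The function name and the choice of fields are assumptions of this sketch.

```python
def fhir_tooltip(observation):
    """Summarize key fields of a FHIR Observation (parsed JSON) for a mouse-over popup.

    Hypothetical helper; assumes the first coding is the LOINC one.
    """
    coding = observation["code"]["coding"][0]
    quantity = observation.get("valueQuantity", {})
    lines = [
        f"{coding.get('display', coding['code'])} (LOINC {coding['code']})",
        f"Value: {quantity.get('value', 'n/a')} {quantity.get('unit', '')}".rstrip(),
        f"Effective: {observation.get('effectiveDateTime', 'n/a')}",
    ]
    for ref in observation.get("referenceRange", []):
        low = ref.get("low", {})
        high = ref.get("high", {})
        unit = high.get("unit", low.get("unit", ""))
        lines.append(f"Reference range: {low.get('value', '')}-{high.get('value', '')} {unit}".rstrip())
    return "\n".join(lines)
```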
As we already showed in the past, the "Smart Submission Dataset Viewer" also uses other RESTful web services. For example, when one hovers the mouse over a LOINC code, another RWS query is triggered, providing all necessary information about that LOINC code.
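The RWS used by the viewer for LOINC information is not named here. As a swapped-in, standards-based illustration, the sketch below uses the FHIR `CodeSystem/$lookup` operation, which a FHIR terminology server hosting LOINC can answer with a `Parameters` resource that includes the code's display name. The base URL `https://tx.example.org/fhir` and the function names are hypothetical.

```python
from urllib.parse import urlencode

def loinc_lookup_url(terminology_base, loinc_code):
    """Build a FHIR CodeSystem/$lookup request URL for a LOINC code (hypothetical helper)."""
    query = urlencode({"system": "http://loinc.org", "code": loinc_code})
    return f"{terminology_base.rstrip('/')}/CodeSystem/$lookup?{query}"

def display_from_lookup(parameters):
    """Extract the 'display' value from a $lookup Parameters response, or None."""
    for param in parameters.get("parameter", []):
        if param.get("name") == "display":
            return param.get("valueString")
    return None
```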
Our software is just "pilot-demo" software. Currently, it generates between 100 and 200 SDTM records per second, including the unit conversions for generating the values of LBSTRESC, LBSTRESN and LBSTRESU, and the generation of the DM dataset.
Of course, it does not solve the issues of determining which patient is in which study within the EHR system, nor of authentication and authorization for each individual patient, i.e. whether the system is allowed to retrieve these records at all. But for these too, some mechanisms and FHIR resources are available, for example the "ResearchStudy" and "ResearchSubject" resources. That is, however, material for another blog …
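As a first pointer only, the sketch below shows how the ResearchSubject resource could be used to find the patients enrolled in a given ResearchStudy through a standard FHIR search. It assumes FHIR R4, where the patient reference is in `ResearchSubject.individual` (later FHIR versions rename this element). The base URL, the study id and the function names are hypothetical. It does not address authentication or authorization.

```python
from urllib.parse import urlencode

def research_subject_search_url(fhir_base, study_id):
    """FHIR R4 search for all ResearchSubjects linked to one ResearchStudy (hypothetical helper)."""
    query = urlencode({"study": f"ResearchStudy/{study_id}"})
    return f"{fhir_base.rstrip('/')}/ResearchSubject?{query}"

def patients_in_bundle(bundle):
    """Collect Patient references from a searchset Bundle of ResearchSubjects.

    Assumes FHIR R4, where the enrolled patient is in 'individual'.
    """
    return [
        entry["resource"]["individual"]["reference"]
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "ResearchSubject"
    ]
```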
EHR systems do not know CDISC controlled terminology (and they shouldn't). What they do know are LOINC codes for tests and, often, SNOMED-CT codes for non-numeric answers. So, if we would like to extend this methodology beyond lab records (for which only a very partial LOINC-to-CDISC mapping exists), this would require the giant effort of mapping many more LOINC codes to CDISC variable values. The "low hanging fruit" is surely the set of LOINC codes for vital signs, which would allow SDTM-VS records to be generated and populated fully automatically. Another good candidate is "Questionnaires", as LOINC codes are also available (and are being used in EHRs) for questionnaires.
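To give an idea of why vital signs are the "low hanging fruit": a handful of well-known LOINC codes already covers the most common VSTESTCD values. The table below is a hand-made, illustrative excerpt, not an official CDISC mapping. A real mapping would also need to handle the many more specific LOINC codes (e.g. for body position or method) and variables such as VSPOS and VSLOC. The function name is hypothetical.

```python
# Illustrative LOINC -> SDTM-VS excerpt (hand-made, NOT an official CDISC mapping).
LOINC_TO_VS = {
    "8480-6": ("SYSBP", "Systolic Blood Pressure"),
    "8462-4": ("DIABP", "Diastolic Blood Pressure"),
    "8867-4": ("HR", "Heart Rate"),
    "9279-1": ("RESP", "Respiratory Rate"),
    "8310-5": ("TEMP", "Temperature"),
    "29463-7": ("WEIGHT", "Weight"),
    "8302-2": ("HEIGHT", "Height"),
}

def vs_record(loinc_code, value, ucum_unit, usubjid):
    """Build a partial SDTM-VS record from one vital-sign result, or None if unmapped."""
    if loinc_code not in LOINC_TO_VS:
        return None
    testcd, test = LOINC_TO_VS[loinc_code]
    return {
        "DOMAIN": "VS",
        "USUBJID": usubjid,
        "VSTESTCD": testcd,
        "VSTEST": test,
        "VSORRES": str(value),
        "VSORRESU": ucum_unit,
    }
```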