Friday, May 20, 2016

Electronic Health Record Data in SDTM

The FDA's newest publication, "Use of Electronic Health Record Data in Clinical Investigations", prompted me to pick up the topic of EHRs in SDTM again. The FDA publication describes the framework within which the use of EHRs in clinical research is allowed and encouraged. Although it does not contain any really new information, it should take away the fears of sponsors and investigators about using EHR data in FDA-regulated clinical trials.

One of the things that will surely happen in the future is that the FDA reviewer will want to see the EHR data point that was used as the source of a data point in the SDTM submission. The investigator will then ask the sponsor, who will then ask the site...: another delay in bringing this innovative new drug or therapy to the market. In the meantime, patients will die ...

So can't we have the EHR data point in the SDTM itself?

Of course! It is even very easy, but only if the FDA would finally decide to get rid of SAS-XPT, this ancient binary format with all its stupid limitations.

Already some years ago, the CDISC XML Technologies Team developed the Dataset-XML standard, as a simple replacement for SAS-XPT. The FDA did a pilot, but since then nothing has happened - "business as usual" seems to have returned.
Dataset-XML was developed to allow the FDA a smooth transition from XPT to XML. It doesn't change anything in SDTM; it just changes the way the data is transported from A to B. However, Dataset-XML has the potential to do things better, as it isn't bound to the two-dimensional table approach of XPT (which in turn forces SDTM to consist of two-dimensional tables).

So, let's try to do better!

Suppose that I have a VS dataset with a systolic blood pressure for subject "CDISC01.100008", and the data point was retrieved from the EHR of the patient. Forget about adding the EHR data point to the SDTM using ancient SAS-XPT! We need Dataset-XML.

This is how the SDTM records look:

Now, the EHR is based on the new HL7-FHIR standard, and the record is very similar to the one at  How do we get this data point in our SDTM?

Dataset-XML, as it is based on CDISC ODM, is extensible. This means that XML data from other sources can be embedded as long as the namespace of the embedded XML is different from the ODM namespace. As FHIR has an XML implementation, the FHIR data point can easily be embedded into the Dataset-XML SDTM record.
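The embedding mechanism can be sketched in a few lines of Python. This is not the downloadable example discussed below, just a minimal illustration under some assumptions: the ODM 1.3 namespace, the FHIR XML namespace "http://hl7.org/fhir", the ItemOID naming ("IT.VS...."), the function name `embed_fhir`, and a heavily trimmed FHIR Observation are all chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: ODM 1.3 and the FHIR XML namespace.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
FHIR_NS = "http://hl7.org/fhir"

ET.register_namespace("odm", ODM_NS)
ET.register_namespace("fhir", FHIR_NS)

# Heavily trimmed, hypothetical FHIR Observation (LOINC 8459-0 only).
FHIR_OBSERVATION = """
<Observation xmlns="http://hl7.org/fhir">
  <code><coding>
    <system value="http://loinc.org"/>
    <code value="8459-0"/>
  </coding></code>
</Observation>
"""


def embed_fhir(items, fhir_xml):
    """Build one ItemGroupData record and append a foreign-namespace element."""
    record = ET.Element(f"{{{ODM_NS}}}ItemGroupData", {"ItemGroupOID": "IG.VS"})
    for oid, value in items.items():
        ET.SubElement(record, f"{{{ODM_NS}}}ItemData",
                      {"ItemOID": oid, "Value": value})
    observation = ET.fromstring(fhir_xml.strip())
    # Extension elements must not live in the ODM namespace itself.
    if observation.tag.startswith(f"{{{ODM_NS}}}"):
        raise ValueError("embedded XML must use a non-ODM namespace")
    record.append(observation)
    return record


record = embed_fhir(
    {"IT.VS.USUBJID": "CDISC01.100008", "IT.VS.VSTESTCD": "SYSBP"},
    FHIR_OBSERVATION,
)
print(ET.tostring(record, encoding="unicode"))
```

The only check made here is the namespace one; a real Dataset-XML file would also carry the surrounding ODM and ClinicalData elements and would be validated against the ODM schema and its extension mechanism.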

In the following example (which you can download from here), I decided to add the FHIR-EHR data point to the SDTM record, and not to VSORRES (for which one could also argue), as I think that the data point belongs to the record, and not to the "original result" - we will discuss this further on.

The SDTM record then becomes:

Note that the "Observation" element "lives" in the HL7 namespace "".

continued by:

Important here is that LOINC coding is used for an exact description of the test (systolic, sitting - LOINC code 8459-0), and that SNOMED-CT is used for coding the body part. This is important - the SDTM and CT teams are still refusing to allow the LOINC code to be used as the unique identifier for the test in VS and LB. Instead, they reinvented the wheel and developed their own list of codes, leading to ambiguities. LOINC coding is mandated to be used in most national EHR systems, including the US Meaningful Use. The same applies to the use of UCUM units.

Now, if you inspect the record carefully, you will notice that a good deal of the information is present twice. The only information that is NOT in the EHR data point is STUDYID, USUBJID (although, ...), DOMAIN, VISITNUM, VISITDY (planned study day) and VSDY (actual day). STUDYID is an artefact of SAS-XPT, as ODM/Dataset-XML could allow grouping all records per subject (using ODM "SubjectData/@SubjectKey"). DOMAIN is also an artefact, as within the data set, DOMAIN must always be "VS" and is given by the define.xml anyway, with a reference to the correct file. VSDY is derived and can easily be calculated "on the fly" by the FDA tools. Even VSSEQ is artificial and could easily be replaced by a worldwide unique identifier (making it referenceable worldwide, as in ... FHIR). VISIT (name) is also derived in the case of a planned visit and can be looked up in TV (trial visits).
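To illustrate how little is needed to derive VSDY "on the fly", here is a minimal Python sketch. It assumes the usual SDTM study day convention (the reference start date, RFSTDTC in DM, is day 1 and there is no day 0); the function name and the example dates are hypothetical.

```python
from datetime import date


def study_day(observation_date: date, reference_start: date) -> int:
    """Derive an SDTM-style study day (--DY) from two dates.

    Assumed convention: the reference start date is day 1, the day
    before it is day -1, and day 0 does not exist.
    """
    delta = (observation_date - reference_start).days
    return delta + 1 if delta >= 0 else delta


# Hypothetical dates, for illustration only.
print(study_day(date(2014, 1, 15), date(2014, 1, 1)))
```

A review tool would take the observation date from the FHIR data point and the reference start date from the DM dataset, so the derived value never needs to be submitted.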

So, if we allow Dataset-XML to become multi-dimensional (grouping data by subject), the only SDTM variables that explicitly need to be present are VISITNUM and VISITDY. So essentially, our SDTM record could be reduced to:


Note the annotations I made, which give the mapping to SDTM variables.

If the reviewer would still like to see the record in the classic two-dimensional table form, that's a piece of cake: an extremely simple transformation (e.g. using XSLT) does the job.
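The transformation back to a flat table is indeed short. The sketch below uses Python instead of XSLT, purely to illustrate the idea. The subject-grouped layout (ItemGroupData directly under SubjectData), the ItemOID naming and the sample content are assumptions made for this sketch, not a published Dataset-XML structure.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical, simplified subject-grouped record with an embedded FHIR element.
SAMPLE = """
<ClinicalData xmlns="http://www.cdisc.org/ns/odm/v1.3"
              xmlns:fhir="http://hl7.org/fhir">
  <SubjectData SubjectKey="CDISC01.100008">
    <ItemGroupData ItemGroupOID="IG.VS">
      <ItemData ItemOID="IT.VS.VISITNUM" Value="1"/>
      <fhir:Observation/>
    </ItemGroupData>
  </SubjectData>
</ClinicalData>
"""


def flatten(xml_text):
    """Turn subject-grouped records into classic two-dimensional rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for subject in root.iter(f"{{{ODM_NS}}}SubjectData"):
        for group in subject.findall(f"{{{ODM_NS}}}ItemGroupData"):
            row = {
                "USUBJID": subject.get("SubjectKey"),
                # Domain taken from the last part of the (assumed) OID.
                "DOMAIN": group.get("ItemGroupOID").split(".")[-1],
            }
            for item in group.findall(f"{{{ODM_NS}}}ItemData"):
                row[item.get("ItemOID").split(".")[-1]] = item.get("Value")
            rows.append(row)
    return rows


for row in flatten(SAMPLE):
    print(row)
```

Values that live only in the embedded FHIR element (test code, result, unit) would be pulled into the row in the same way, by reading them from the FHIR namespace.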

Now, reviewers always complain about file sizes (however, reviewers should be forbidden to use "files"), and will surely do so when they see how much "size" the FHIR information takes. But who says that the FHIR information must be in the same file? Can't it just be referenced, or better, can't we state where the information can be found using a secured RESTful web service?
This is done all the time in FHIR! So we could further reduce our SDTM record to:

Note that the "http://..." is not simply an HTTP address: just entering it in a browser will not retrieve the subject's data point. The RESTful web service in our case will require authentication, usually using the OAuth2 authentication mechanism.
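What such an authenticated request could look like is sketched below in Python. The server URL, the resource id and the token are hypothetical, the "Accept" media type is an assumption that depends on the FHIR version of the server, and obtaining the OAuth2 access token itself is left out.

```python
import urllib.request


def build_fhir_request(resource_url: str, access_token: str) -> urllib.request.Request:
    """Prepare a GET request for one FHIR resource using an OAuth2 bearer token."""
    return urllib.request.Request(
        resource_url,
        headers={
            # The bearer token is obtained beforehand through an OAuth2 flow.
            "Authorization": f"Bearer {access_token}",
            # Assumed media type for FHIR XML; older servers may use another one.
            "Accept": "application/fhir+xml",
        },
        method="GET",
    )


# Hypothetical endpoint and token, for illustration only.
request = build_fhir_request(
    "https://ehr.example.org/fhir/Observation/example-id", "example-token"
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call; without a valid token the server is expected to refuse it, which is exactly the point made above.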

Comments are very welcome - as ever ...


  1. Jozef, a very good article. I agree that we can use xml rather than xpt files. xml will be more useful and much easier to integrate a lot of different data as you show it in example. A very clever idea of using REST API!!!
