Sunday, June 3, 2018

Source EHR records in SDTM

When SDTM datasets are submitted to the FDA, the source data get lost. The reason is that generating SDTM datasets is essentially an ETL (extract-transform-load) process. When the data were collected using CRFs (case report forms), this is compensated for by the FDA requirement that an annotated CRF (in PDF format) be delivered. In the define.xml, one then uses def:Origin to point to the page number(s) or the section (bookmark) of the originating field in the CRF. Such an annotated PDF CRF must also be delivered when the CRFs were electronic. This doesn't make sense: in such a case, an SDTM-annotated ODM-XML file would be a much better choice. It has the advantages of being truly electronic and of being easy to visualize. But the reviewers at the FDA still cannot handle XML well; it is still a PDF and XPT world.

[Stylesheet developed by David Iberson-Hurst, Assero]


But what if the source of the data is an electronic health record (EHR)? Of course one can transcribe the data into the CRF, thus again losing the original record. So, what if the reviewer wants to see the real original record? How can this be accomplished in SDTM using the mandatory SAS XPT format?


It can't.


When using CDISC Dataset-XML as the transport format, however, much more is possible. Dataset-XML was developed for transporting tabular data, but as it is based on CDISC ODM, it can also natively transport audit trails, signatures and annotations. Data points in ODM can also carry electronic health records, as has been demonstrated several times in the past [http://cdisc-end-to-end.blogspot.com/2012/09/electronic-health-records-within-odm.html]. The same is true for Dataset-XML, as technically there is no difference between an ODM data point and a Dataset-XML data point.
 


Some time ago, I extended Dataset-XML to also allow HL7-FHIR "resources", i.e. FHIR-based EHR data (5 minutes' work). Yesterday, I extended the popular open-source "Smart Dataset-XML Viewer" to pick up FHIR resources and visualize them (3 hours' work).

As an example, I took the VS (vital signs) dataset of the FDA pilot study of 2013. For a few VS records, I added the corresponding FHIR "Observation" source record. Here is how it looks in Dataset-XML:





The FHIR "Observation" resource is embedded in an ODM "ItemGroupData" element, which corresponds to a single SDTM record. The "Observation" resource itself looks like this (not everything is shown):




If you would like a copy of this dataset, just drop me an e-mail, and I will be glad to provide it.
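For readers without the dataset at hand, the embedding described above can be roughly sketched in Python. This is only an illustration under stated assumptions: the item OIDs, the subject identifier, the measured value and the helper names are all invented, the "Observation" is heavily abbreviated and not a complete FHIR resource, and the exact extension used in the real file may differ.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DATA_NS = "http://www.cdisc.org/ns/Dataset-XML/v1.0"
FHIR_NS = "http://hl7.org/fhir"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Register prefixes so the serialized XML is readable.
for prefix, uri in (("", ODM_NS), ("data", DATA_NS),
                    ("fhir", FHIR_NS), ("xhtml", XHTML_NS)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return f"{{{ns}}}{tag}"


def build_observation(summary, value, unit):
    """Build an abbreviated, illustrative FHIR Observation (hypothetical content)."""
    obs = ET.Element(q(FHIR_NS, "Observation"))
    # Human-readable part: an XHTML narrative.
    text = ET.SubElement(obs, q(FHIR_NS, "text"))
    ET.SubElement(text, q(FHIR_NS, "status"), {"value": "generated"})
    div = ET.SubElement(text, q(XHTML_NS, "div"))
    div.text = summary
    # Machine-readable part: status and the measured quantity.
    ET.SubElement(obs, q(FHIR_NS, "status"), {"value": "final"})
    quantity = ET.SubElement(obs, q(FHIR_NS, "valueQuantity"))
    ET.SubElement(quantity, q(FHIR_NS, "value"), {"value": str(value)})
    ET.SubElement(quantity, q(FHIR_NS, "unit"), {"value": unit})
    return obs


def build_vs_record(seq, items, observation):
    """Build one ItemGroupData (one SDTM record) carrying its source Observation."""
    record = ET.Element(
        q(ODM_NS, "ItemGroupData"),
        {"ItemGroupOID": "IG.VS", q(DATA_NS, "ItemGroupDataSeq"): str(seq)},
    )
    for oid, value in items.items():
        ET.SubElement(record, q(ODM_NS, "ItemData"), {"ItemOID": oid, "Value": value})
    # Assumption: the FHIR resource is simply appended as a child element.
    record.append(observation)
    return record


# All identifiers and values below are invented for illustration.
record = build_vs_record(
    1,
    {"IT.VS.USUBJID": "SUBJ-001", "IT.VS.VSTESTCD": "DIABP", "IT.VS.VSORRES": "64"},
    build_observation("Diastolic blood pressure: 64 mmHg", 64, "mmHg"),
)
print(ET.tostring(record, encoding="unicode"))
```

The point of the sketch is only that the source record travels inside the same element as the SDTM values derived from it, so the link between the two cannot get lost.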
Now, what does this look like in the open-source "Smart Dataset-XML Viewer"?
First, note that each FHIR resource contains a human-readable part (using HTML) and a machine-readable part. For the visualization in the viewer, we chose to display only the human-readable part of the FHIR resource, as that is what it is for. The machine-readable part is still in the VS file and could be used by machines.
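The split between the two parts can be illustrated with a small Python sketch. It is not the viewer's actual code (the viewer is a separate application); the sample resource, its values and the function names are invented for illustration. It relies only on the FHIR convention that the narrative sits in a `text` element containing an XHTML `div`.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Invented, heavily abbreviated sample resource (not a complete Observation).
SAMPLE = """<Observation xmlns="http://hl7.org/fhir">
  <text>
    <status value="generated"/>
    <div xmlns="http://www.w3.org/1999/xhtml"><p>Diastolic blood pressure: <b>64 mmHg</b></p></div>
  </text>
  <status value="final"/>
  <valueQuantity>
    <value value="64"/>
    <unit value="mmHg"/>
  </valueQuantity>
</Observation>"""


def narrative_text(resource):
    """Return the human-readable narrative as plain text, or None if absent."""
    div = resource.find(f"{{{FHIR_NS}}}text/{{{XHTML_NS}}}div")
    if div is None:
        return None
    # Flatten the XHTML markup and normalize whitespace.
    return " ".join("".join(div.itertext()).split())


def machine_value(resource):
    """Return (value, unit) from the machine-readable part, or None if absent."""
    quantity = resource.find(f"{{{FHIR_NS}}}valueQuantity")
    if quantity is None:
        return None
    value = quantity.find(f"{{{FHIR_NS}}}value")
    unit = quantity.find(f"{{{FHIR_NS}}}unit")
    if value is None:
        return None
    return float(value.get("value")), unit.get("value") if unit is not None else None


observation = ET.fromstring(SAMPLE)
print(narrative_text(observation))  # what a reviewer would read
print(machine_value(observation))   # what software would process
```

A display tool only needs the first function; the second shows that the structured values remain available in the same file for automated checks.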

Here is the result in the viewer:


I programmed it in such a way that when the user holds the mouse over "USUBJID", the FHIR-EHR data point is displayed in a tooltip. Of course, other types of visualization, such as in a separate window or in the browser, could easily be implemented.
Also note that in this case the age, sex and actual arm of the subject are displayed as well, which is another, older (optional) feature of the "Smart Dataset-XML Viewer".


The outdated XPT format does not make it possible to add such additional information. The FDA is the only organization in the whole world using (and unfortunately also mandating) this format. Using Dataset-XML, adding such additional information is a "piece of cake". Implementing and deploying a visualization for a specific kind of additional information in the generally available, open-source "Smart Dataset-XML Viewer" is a matter of hours.

Yet another argument for the FDA to move away from XPT.