Why?
First of all, the FDA and PMDA still require us to submit in the outdated, highly inefficient XPT format. Additional, so-called "non-standard" variables need to be submitted in "SUPPxx" datasets that are even more inefficient. If these variables were kept in the "parent" dataset and simply marked as "non-standard", the size of the whole submission could be 50% smaller.
Amazon, Google and Twitter handle volumes of information that are millions of times the size of our regulatory submissions. They do not complain about file sizes. Why?
The answer is simple: because they do not use files.
In these organizations, all information is stored in databases, and RESTful web services are used to exchange information between machines.
The define.xml contains the metadata of a submission. It is relatively small, usually between 0.5 and 1 MB in size. It is the "sponsor's truth about the submission", containing almost all the information the reviewer needs to be able to work with the data.
The define.xml also contains the location of the data - as file references. It is these data files that can become very large. Unlike Amazon, Google and Twitter, we (clinical research) still use "files" for exchanging information. The rest of the world is using RESTful web services.
Imagine that the sponsor keeps the study information in a system (usually a database), and provides an "SDTM view" of the data. Yes, SDTM is a "view" on the real data! It is just an ETL view that makes it easy for the reviewers to understand and work with the data. In conflict with good database practice, it contains a lot of redundancy and derived data, even where this is completely unnecessary. But it is all "for the sake of easy review".
So, suppose that a sponsor can provide an "SDTM view" on the data "on the fly", and that this "view" is available through a RESTful web service.
The define.xml entry for the "dataset" DM could then look like this (a sketch in define.xml 2.0 style; the OIDs and attributes are illustrative):
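    <ItemGroupDef OID="IG.DM" Name="DM" SASDatasetName="DM" Domain="DM"
                  Repeating="No" IsReferenceData="No" Purpose="Tabulation"
                  def:Structure="One record per subject"
                  def:Class="SPECIAL PURPOSE"
                  def:ArchiveLocationID="LF.DM">
        <!-- ItemRefs for the DM variables go here -->
        <def:leaf ID="LF.DM"
                  xlink:href="https://mypharmacompany/submissions/REST/Standard/SDTM/Study/cdisc01/Dataset/DM">
            <def:title>Demographics</def:title>
        </def:leaf>
    </ItemGroupDef>

The only difference with a classic define.xml is the leaf's xlink:href: it points to a RESTful web service instead of to a "dm.xpt" file.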
This API can then easily be extended, e.g. with filtering criteria, like:
[base]/Standard/SDTM/Study/cdisc01/Dataset/DM?USUBJID=CDISC01.10008
where "[base]" is the base of the restful web service (in our simple example "https://mypharmacompany/submissions/REST" and the question mark "?" is a "where" statement. So the REST string then means: provide the DM data of the subject with USUBJID "CDISC01.10008".
This can of course easily be expanded to any of the variables and datasets. For example, to obtain all the AE records for severe adverse events that led to hospitalization, the string would be:
[base]/Standard/SDTM/Study/cdisc01/Dataset/AE?AESEV=SEVERE&AESHOSP=Y
Simple, isn't it? No files anymore, just services and queries ...
How this is implemented on the server is completely unimportant. The API is essentially the "service contract", guaranteeing that the service provides exactly what is asked for.
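Just as an illustration (not a prescription), here is a minimal sketch of what one possible server-side implementation could look like, in Python with Flask and SQLite. Everything here is an assumption for the sketch: the database file, the one-table-per-domain layout, and the list of allowed domains.

    import sqlite3
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    SDTM_DB = "sdtm_cdisc01.db"          # hypothetical: one database per study
    ALLOWED_DOMAINS = {"DM", "AE", "VS", "LB"}  # guard against arbitrary table names

    @app.route("/Standard/SDTM/Study/<study>/Dataset/<domain>")
    def dataset(study, domain):
        # This toy sketch serves a single study; a real service would
        # route on the "study" part of the URL as well.
        if domain not in ALLOWED_DOMAINS:
            return jsonify({"error": "unknown domain"}), 404
        # Each query parameter (e.g. ?AESEV=SEVERE&AESHOSP=Y) becomes a
        # "where" condition - exactly the contract promised by the API.
        # A real implementation would validate the variable names against
        # the define.xml metadata before putting them into the SQL.
        sql = f"SELECT * FROM {domain}"
        params = list(request.args.values())
        if request.args:
            sql += " WHERE " + " AND ".join(f"{name} = ?" for name in request.args)
        with sqlite3.connect(SDTM_DB) as con:
            con.row_factory = sqlite3.Row
            rows = [dict(row) for row in con.execute(sql, params)]
        return jsonify(rows)

Whether the sponsor builds this with Flask, Java, or an off-the-shelf API gateway is invisible to the reviewer; only the contract counts.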
Now, what would be the consequences of basing electronic submissions on RESTful web services?
- Only a define.xml file needs to be submitted (0.5-1 MB). Even that could be taken care of by a RESTful web service.
- No files are needed at the regulatory authorities' side anymore. They can however still store the information if they want, as the usual formats for returning information from RESTful web services are XML, JSON and Turtle (RDF), which can all be used to populate databases and data warehouses (see the sketch after this list).
- At the sponsor's side, only an SDTM/SEND/ADaM database is necessary (which is already there anyway); only the API implementation needs to be taken care of.
- No 8-, 40- and 200-character limitations of the XPT file format anymore, as files are not used at all. No problems with non-ASCII characters either, as the RESTful web services use XML or JSON.
- No need for SUPPxx "files" - there are no files. SUPPxx datasets should not be necessary at all; non-standard variables (NSVs) can simply be marked as such in the define.xml.
- The SDTM data can contain references to the real source data points, such as one or more FHIR resources of the electronic health record that was the source of the data.
- Validation of the data (for compliance with the standard) can also be done using RESTful web services. As the validation software is then server-based, bug fixes can be deployed within hours, instead of taking years, as with the FDA's current validation software.
- Review can already start before the last data point is captured! SDTM and SEND datasets in particular are usually assembled long before the last data point is captured, and as the SDTM database is already there anyway, regulatory authorities could start part of the review before the study ends, and then, after database closure, simply repeat the analyses they already did on the partial data. This could save the lives of thousands of patients each year who are waiting for a new medication or treatment.
- Easy filtering and querying: as filtering can already be done by the RESTful web service itself, life becomes much easier for the reviewer, who does not need to learn the filtering features of different software packages.
- Much easier and faster collaboration between the reviewer and the submitting pharma company. If the reviewer needs the information to be updated or organized differently, there is no need anymore to regenerate and exchange files. The sponsor just implements the necessary change in the database, and it is immediately available.
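To make this concrete, here is a small sketch of how a reviewer's tooling could consume such a service, assuming the (hypothetical) endpoint from the examples above and a JSON array of records as the response format:

    import sqlite3
    import pandas as pd
    import requests

    BASE = "https://mypharmacompany/submissions/REST"  # "[base]" from the examples above

    # Let the service do the filtering: all severe AEs that led to hospitalization.
    resp = requests.get(f"{BASE}/Standard/SDTM/Study/cdisc01/Dataset/AE",
                        params={"AESEV": "SEVERE", "AESHOSP": "Y"},
                        headers={"Accept": "application/json"})
    resp.raise_for_status()

    # Assumed response shape: a JSON array with one object per AE record.
    ae = pd.DataFrame(resp.json())

    # Populate a local review database - no XPT file anywhere in the process.
    with sqlite3.connect("review.db") as con:
        ae.to_sql("AE", con, if_exists="replace", index=False)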
Electronic submissions to the regulatory authorities based on only one small file (the define.xml). Doesn't this sound cool? Such an approach could save the lives of thousands of patients each year!