Monday, March 28, 2016

FDA SDTM validation rules, XQuery and the "Smart Dataset-XML Viewer"


During the last days, I could again make considerable progress in writing FDA and CDISC SDTM and ADaM validation rules in the vendor-neutral XQuery language (a W3C standard).

With this project, we aim to:
  • come to a real vendor neutral, as well human-readable as machine-executable set of validation rules (no black-box implementations anymore)
  • have rules that are easily readable by persons in the CDISC community, and commented on
  • develop rules that do not lead to false positives
  • come to a reference implementation of the validation rules, meaning that, after acceptance by CDISC, other implementations (e.g. from commercial vendors) always need to come to the same result for the same test case
  • make these rules available by CDISC SHARE for applications and humans, by using RESTful web services and the SHARE API
I was now also able to implement these rules in the "Smart Dataset-XML Viewer":


The set of rules itself is provided as an XML file, for which we have already a RESTful web service for rapid updates, meaning that if someone finds a bug or an issue with a rule implementation, it can updated within hours, and the software can automatically retrieve the corrected rule implementation of the rule (no more waiting for the next software release or software bug fix).

In the "Smart Dataset-XML Viewer", the validation is optional, and when the user clicks the button "Validation Rules Selections", all the available rules are listed, and can be selected/deselected, meaning that the user (and not the software) decides for which rules the submission data sets are validated:


Some of these rules use web services themselves, for example to detect whether an SDTM variable is "required", "expected" or "permissible", something that cannot be obtained from the define.xml.
A great advantage is that any rule violations are immediately visible in the viewer itself, i.e. the user does not need to retrieve the information from an Excel file anymore and then look up the record manually in the data set.


At the same time, all violations are gathered into an XML structure, which can easily be (re)used in other applications (we do not consider Excel as a suitable information exchange format between software applications).


And even better, all this is real "open source" without any license or redistribution limitations, so that people can integrate the "Smart Dataset-XML Viewer", including its XQuery validation, into any other application, even commercial ones.


I am currently continueing working on this implementation, and on the validation rules in XQuery. I did most of the FDA-SDTM rules (well, at least those that are not wrong,ununderstandable or an expectation rather than a rule).
I also did about 40% of the ADaM 1.3validation checks, and will start on the CDISC SDTM conformance rules as soon as they are officially published by CDISC.
I can however use help with the ADaM validation rules, as I lack some suitable real-life test files. So if you do ADaM validation in your company and have some basic XQuery knowledge (or willing to acquire it), please let me know, so that we can make rapid progress on this.
Another nice thing about having the rules in XQuery is that companies can easily start developing their own sets of validation rules in this vendor-neutral language, be it for SDTM, SEND or ADaM, and just add them to a specific directory in the "Smart Dataset-XML Viewer", after they will immediately become available to the viewer.


I hope to make a first release on SourceForge (application + source code) in the next few weeks, so stay tuned!