Over the last few days, I have again
made considerable progress in writing the FDA and CDISC SDTM and ADaM validation rules in the vendor-neutral XQuery language (a W3C
standard).
With this project, we aim to:
- arrive at a truly vendor-neutral set of validation rules that is both human-readable and machine-executable (no more black-box implementations)
- have rules that people in the CDISC community can easily read and comment on
- develop rules that do not lead to false positives
- arrive at a reference implementation of the validation rules, meaning that, after acceptance by CDISC, other implementations (e.g. from commercial vendors) must always produce the same result for the same test case
- make these rules available through CDISC SHARE to both applications and humans, using RESTful web services and the SHARE API
I have now also been able to implement these
rules in the "Smart Dataset-XML Viewer":
The set of rules itself is provided as
an XML file, for which we already have a RESTful web service for rapid updates. This means that if someone finds a bug or an issue
with a rule implementation, it can be updated within hours, and the
software can automatically retrieve the corrected rule
implementation (no more waiting for the next software release or
software bug fix).
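To illustrate why shipping the rules as data (rather than compiled into the software) makes such quick fixes possible, here is a minimal Python sketch. The rules themselves remain XQuery; Python is used here only for the plumbing around them. The XML layout, the rule IDs `RULE-A` and `RULE-B`, and the function names are all hypothetical and chosen for this example, not taken from the viewer's actual rule file or web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule-set document. Element names, attribute names and rule
# IDs are assumptions made for this sketch only.
RULESET_XML = """<rules>
  <rule id="RULE-A" standard="SDTM">
    <description>First example rule</description>
    <xquery>(: XQuery source of the rule would go here :)</xquery>
  </rule>
  <rule id="RULE-B" standard="ADaM">
    <description>Second example rule</description>
    <xquery>(: XQuery source of the rule would go here :)</xquery>
  </rule>
</rules>"""


def load_rules(xml_text):
    """Parse a rule-set document into a dict keyed by rule id."""
    root = ET.fromstring(xml_text)
    rules = {}
    for rule in root.findall("rule"):
        rules[rule.get("id")] = {
            "standard": rule.get("standard"),
            "description": rule.findtext("description", default="").strip(),
            "xquery": rule.findtext("xquery", default="").strip(),
        }
    return rules


def apply_update(rules, updated_xml_text):
    """Merge a freshly retrieved rule set into the local one.

    A rule with an id that already exists replaces the old implementation;
    rules that are not part of the update are kept unchanged.
    """
    merged = dict(rules)
    merged.update(load_rules(updated_xml_text))
    return merged
```

In such a setup, a corrected rule only needs to be published once on the server side; every client that calls something like `apply_update` with the retrieved document picks up the fix without a new software release.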
In the "Smart Dataset-XML Viewer",
validation is optional. When the user clicks the
"Validation Rules Selections" button, all the available rules are
listed and can be selected or deselected, meaning that the user (and
not the software) decides against which rules the submission data sets
are validated:
Some of these rules use web services themselves, for example to detect whether an SDTM variable is "required", "expected" or "permissible", something that cannot be obtained from the define.xml.
A great advantage is that any rule violations are immediately visible in the viewer itself, i.e. the user does not need to retrieve the information from an Excel file anymore and then look up the record manually in the data set.
At the same time, all violations are
gathered into an XML structure, which can easily be (re)used in other
applications (we do not consider Excel as a suitable information
exchange format between software applications).
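To give an idea of how such an XML structure can be produced and consumed by other software, here is a minimal Python sketch. The element and attribute names (`violations`, `violation`, `rule`, `dataset`, `record`) and the function name are assumptions made for illustration; the structure that the viewer actually generates may well differ.

```python
import xml.etree.ElementTree as ET


def violations_to_xml(violations):
    """Serialize a list of violation dicts into one XML string.

    Each dict is expected to have the keys 'rule', 'dataset', 'record'
    and 'message' (hypothetical field names for this sketch).
    """
    root = ET.Element("violations")
    for v in violations:
        element = ET.SubElement(
            root,
            "violation",
            {
                "rule": v["rule"],
                "dataset": v["dataset"],
                "record": str(v["record"]),  # XML attributes must be strings
            },
        )
        element.text = v["message"]
    return ET.tostring(root, encoding="unicode")


def violations_for_dataset(xml_text, dataset):
    """Show reuse by another application: filter violations by dataset."""
    root = ET.fromstring(xml_text)
    return [
        (v.get("rule"), int(v.get("record")), v.text)
        for v in root.findall("violation")
        if v.get("dataset") == dataset
    ]
```

Because the result is plain XML, a second application can parse it with any standard XML library and, for example, jump directly to the offending record, which is much harder to do reliably with a spreadsheet.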
And even better, all this is real "open
source" without any license or redistribution limitations, so
that people can integrate the "Smart Dataset-XML Viewer",
including its XQuery validation, into any other application, even
commercial ones.
I am currently continuing to work on this implementation and on the validation rules in XQuery. I have done most of the FDA-SDTM rules (well, at least those that are not wrong, incomprehensible, or an expectation rather than a rule).
I have also done about 40% of the ADaM 1.3 validation checks, and will start on the CDISC SDTM conformance rules
as soon as they are officially published by CDISC.
I could, however, use some help with the ADaM validation rules, as I lack suitable real-life test files. So if you do ADaM validation in your company and have some basic XQuery knowledge (or are willing to acquire it), please let me know, so that we can make rapid progress on this.
Another nice thing about having the rules in XQuery is that companies can easily start developing their own sets of validation rules in this vendor-neutral language, be it for SDTM, SEND or ADaM, and simply add them to a specific directory in the "Smart Dataset-XML Viewer", after which they immediately become available to the viewer.
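The "drop a file into a directory" idea can be sketched in a few lines of Python. This is only an illustration of the general pattern: the file extensions `.xq` and `.xquery`, the use of the file name as rule identifier, and the function name `discover_rules` are assumptions of this sketch, not a description of how the viewer actually picks up custom rules.

```python
from pathlib import Path


def discover_rules(directory, suffixes=(".xq", ".xquery")):
    """Return a dict mapping rule name -> XQuery source text.

    Every file in 'directory' whose extension is in 'suffixes' is treated
    as one rule; the file name without extension serves as the rule name.
    Other files are ignored.
    """
    folder = Path(directory)
    rules = {}
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            rules[path.stem] = path.read_text(encoding="utf-8")
    return rules
```

With this kind of discovery step, adding a company-specific rule requires no change to the application itself: the next scan of the directory simply returns one more entry.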
I hope to make a first release on
SourceForge (application + source code) in the next few weeks, so
stay tuned!