Over the past few days, when I needed a break from other (paid) work, I wrote a small Java program that queries the CDISC Library through its RESTful API and generates CDISC-ODM forms for all domains and scenarios of the different CDASHIG versions (1.1.1, 2.0 and 2.1).
Additionally, the user can select whether the ODM should be completed with all necessary CodeLists for a given CDISC-CT version (the default being the latest available CDISC-CT version).
The algorithm fully exploits the HATEOAS features of the CDISC Library. It starts by iterating over the available domains of the given CDASHIG version, then checks within each domain (using the hyperlinks provided in the RESTful web service response) whether there are "scenarios" for that domain (often the case for "Findings" domains, not for other domains), and if so, again follows the links, queries each scenario and retrieves the information from its "fields". If there are no scenarios, the field information can be retrieved directly.
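The branching step of this traversal can be sketched as follows. This is a minimal illustration, not the actual program: the link names and hrefs are modeled as a plain map, and the example paths merely follow the general shape of CDISC Library URLs.

```java
import java.util.List;
import java.util.Map;

public class CdashTraversal {

    // Given the links section of a domain response (modeled here as a simple
    // map of link name -> list of hrefs), decide which hrefs to query next:
    // the scenario links if present, otherwise the domain's own fields.
    static List<String> nextQueries(Map<String, List<String>> links, String domainHref) {
        List<String> scenarios = links.get("scenarios");
        if (scenarios != null && !scenarios.isEmpty()) {
            return scenarios;                        // query each scenario, then read its "fields"
        }
        return List.of(domainHref + "/fields");      // no scenarios: read the fields directly
    }

    public static void main(String[] args) {
        // A "Findings"-like domain with scenarios (illustrative href):
        Map<String, List<String>> vs = Map.of("scenarios",
                List.of("/mdr/cdashig/2-1/scenarios/VS.Vital-Signs"));
        System.out.println(nextQueries(vs, "/mdr/cdashig/2-1/domains/VS"));

        // A domain without scenarios, such as AE:
        System.out.println(nextQueries(Map.of(), "/mdr/cdashig/2-1/domains/AE"));
    }
}
```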
When a field has a reference to a codelist, a new query is performed to retrieve the codelist information. The NCI code and the name of the codelist are then combined to generate a CodeList-OID (like "CL.C66742.NY"), which is added to the ODM "ItemDef". If the user also wants the complete codelist included, all items in the codelist are retrieved and (at the end) transformed into an ODM-XML structure.
As retrieving codelists from the CDISC Library with all their items is somewhat more time-consuming, a list of unique codelists is maintained, so that the same codelist never needs to be queried twice.
Just as an example, the aforementioned "NY" codelist is referenced 98 times in the CDASHIG 2.1.
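The OID generation and the "query each codelist only once" bookkeeping can be sketched like this. The `CodeListCache` class and its fetch placeholder are my own illustration, not the actual program:

```java
import java.util.HashMap;
import java.util.Map;

public class CodeListCache {

    // "CL." + NCI code + "." + codelist short name, e.g. "CL.C66742.NY"
    static String codeListOid(String nciCode, String name) {
        return "CL." + nciCode + "." + name;
    }

    private final Map<String, String> cache = new HashMap<>();

    // Query each unique codelist only once: "NY" is referenced 98 times
    // in CDASHIG 2.1, but should be fetched from the Library a single time.
    String resolve(String nciCode, String name) {
        return cache.computeIfAbsent(codeListOid(nciCode, name),
                oid -> fetchFromLibrary(oid));
    }

    // Placeholder for the actual RESTful query to the CDISC Library.
    private String fetchFromLibrary(String oid) {
        return oid; // real code would return the full codelist content
    }
}
```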
In the past, when a new version of a standard like CDASH or SDTMIG came out, it usually cost me a week of evenings (I was still a professor in medical informatics at the University of Applied Sciences in Graz at the time), doing copy-paste work from PDF files, which is not only boring, but also very error-prone. With a little programming, I can now generate electronic versions of such standards within minutes.
So, in the future, when e.g. a new SDTMIG is published and becomes available in the CDISC Library, I can update all my systems in just a few minutes, without any copy-paste.
What is still missing, and what should be the next steps?
For the CDASHIG, the CDISC Library is missing the "Assumptions for the xx Domain" sections, which are essentially lists of bullets in the PDF version. Would they be necessary or useful in an electronic version? The text in there isn't really machine-interpretable at all, is it? For example, what could a machine do with an "assumption" like: "As required or defined by the study protocol, clinically significant results may need to be reported on the Adverse Event CRF"? The CDISC Library does already contain the "mapping instructions" for each CDASH field, but only in a human-readable version, like "This does not map directly to an SDTMIG variable. For the SDTM submission dataset, concatenate all collected CDASH DATE and TIME components and populate the SDTMIG variable EGDTC in ISO 8601 format".
For a computerized system, this is not very helpful, but the following would probably do:

EGDTC = (ISO8601)concat(EGDAT,'T',EGTIM)

where "(ISO8601)" is a cast, as programmers know it.
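In Java, such a rule could be implemented along these lines. This is only a sketch of what the "cast" would mean in practice: concatenate the collected date and time, and validate that the result really is ISO 8601 (the method name and sample values are my own):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class Iso8601Mapping {

    // EGDTC = (ISO8601)concat(EGDAT, 'T', EGTIM): concatenate the collected
    // CDASH date and time components; parsing them via java.time enforces
    // the ISO 8601 validity that the "(ISO8601)" cast expresses.
    static String toEgdtc(String egdat, String egtim) {
        LocalDateTime dtc = LocalDateTime.of(
                LocalDate.parse(egdat),   // e.g. "2023-05-04"
                LocalTime.parse(egtim));  // e.g. "13:45"
        return dtc.toString();            // ISO 8601, e.g. "2023-05-04T13:45"
    }

    public static void main(String[] args) {
        System.out.println(toEgdtc("2023-05-04", "13:45"));
    }
}
```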
In order to enable such machine-readable code, however, we must agree on a "language", which is hard. As the broader healthcare world already seems to be doing, we may in the future all agree on using HL7 CQL (Clinical Quality Language).
Another major problem, however, is how we publish our draft standards for public review. CDISC has already moved from PDF to "wiki sites" for this, which are easier to manage for the development teams, especially in combination with the Jira issue ticketing system. But this does not allow for e.g. "impact analysis". For example, if a new draft version of SDTMIG or Controlled Terminology comes out, how can we find out what the impact on our systems will be? Maybe the new version even has errors and might damage our existing systems upon an update? At the moment, we cannot do any such impact analysis easily, as our drafts are published in a form that is only usable for human-eye consumption.
Even when a new draft version is published as an Excel file, impact analysis remains very difficult, as it would require several days of programming and transformation work, and that for each individual reviewer.
A good example of where something went pretty wrong in the draft publication mechanism is the recent "LOINC to CDISC mapping" for public review. Don't get me wrong, the CT team did a great job there (it wasn't easy, as the concepts of LOINC codes and CDISC-SDTM-CT are rather different). The draft was, however, published as an Excel file, with different tabs for different (arbitrary) laboratory categories.
Pretty disastrous was that LBTESTCD (the test code) is missing everywhere. Or was the idea that people don't need codes anyway, and only want to see text (in LBTEST)? So, "human eye consumption" only, instead of "machine-readability"?
Anyway, the publication form of the draft makes it pretty hard for me to test it in my SDTM-ETL mapping software, where it really will be of great use!
Just suppose that every draft of a new CDISC standard version, be it a CDISC model, an IG, or controlled terminology, were published before, or synchronized with, the "human eye" presentation (PDF, wiki, Excel), to a "special corner" of the CDISC Library. Reviewers could then do impact analysis in a fully automated way by querying the CDISC Library, as they already do anyway for the "final" versions of the standards.
A nice example is S-Cubed's "A3 Community MDR", which allows comparing different CDISC-CT versions very quickly. Note that this MDR is not based on the CDISC Library yet. A recent testimonial from Erin Muhlbradt, our CDISC-CT guru: "I just wanted to say that I used the Community A3 MDR today to figure out when the MOTEST/TESTCD codelists were deprecated. It took me all of 7 seconds and about 3 of those seconds were spent figuring out which buttons to push. This is massively better than the 10s of minutes it would have taken me otherwise."
Now suppose we could do "diffs" using the A3 Community MDR on a draft CT version, BEFORE it is published as "final", allowing us to automate QC instead of relying on "visual inspection". The reviewers (and of course the CT team itself) would then be able to find issues much more easily, and correct these before final publication. Let's not forget: once published as "final", there is no way back to correct issues! That can then only be done in the next version.
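At its core, such a "diff" between two CT versions is just a set comparison over the codelist terms. A minimal sketch (the term values below are illustrative, not taken from any real CT release):

```java
import java.util.HashSet;
import java.util.Set;

public class CodeListDiff {

    // Terms present in the draft but not in the current version.
    static Set<String> added(Set<String> current, Set<String> draft) {
        Set<String> result = new HashSet<>(draft);
        result.removeAll(current);
        return result;
    }

    // Terms present in the current version but no longer in the draft.
    static Set<String> removed(Set<String> current, Set<String> draft) {
        Set<String> result = new HashSet<>(current);
        result.removeAll(draft);
        return result;
    }

    public static void main(String[] args) {
        Set<String> current = Set.of("Y", "N", "U");
        Set<String> draft   = Set.of("Y", "N", "U", "NA");
        System.out.println("added:   " + added(current, draft));
        System.out.println("removed: " + removed(current, draft));
    }
}
```

Running the same comparison over every codelist of a draft release would turn "visual inspection" of thousands of terms into a report that is generated in seconds.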
Having draft versions of CDISC standards in a "special corner" of the CDISC Library would also have the advantage that vendors with systems based on CDISC standards (there are many of them) could much more easily start adapting their systems when a new version of a standard is upcoming. With the current PDFs, wiki pages and Excel files, this is barely possible. Vendors could then submit their findings, report ambiguities, and make much better suggestions for improvements than when only using the "human eye". This would lead to much higher quality CDISC standards.
Once published as "final", a vendor could then possibly even deploy the new version with a single click, or just have the application connect to the CDISC Library directly, and thus make the new version available to customers within seconds. In many systems nowadays, users must wait for months …