Friday, September 30, 2016

Generating Define-XML: new software

Although my posting "Creating define.xml - best and worst practices" is the most read blog entry of my blog site, people seem not to learn (or maybe do not want to learn). Almost daily, I read complaints of people who use "black box software" for generating define.xml that is free of charge, starting from Excel worksheets, and not getting the result they would like to obtain, at least not when viewing the define.xml visualized by the stylesheet. Some even do not realize that what they see in the browser is not the define.xml, but only a visualization of it.

There does not seem to be much user-friendly software on the market for generating or working with Define-XML. As already explained in the above-mentioned blog, the best way to generate a define.xml is "upfront" and a few software software packages for mapping to SDTM and at the same time generating a good define.xml exist. One of them is my own SDTM-ETL software.

For people that cannot use this approach (e.g. for legacy studies), there was not much out there yet. They usually used the "Excel" approach, often leading to very bad results.

We recently released a new software named the "ODM Study and Define,xml Designer 2016", which can be used for both setting up study designs in ODM format and for generating define.xml files. For the latter, 4 use cases are supported:

  • creating a define,xml from scratch
  • creating a define.xml starting from an SDTM template (SDTM 1.2, 1.3 or 1.4 - SDTM-IG 3.1.2, 3.1.3 or 3.2)
  • creating a define.xml starting from a set of SAS-XPT files
  • starting from an incomplete define.xml file, e.g. generated by other tools
In all cases, the user can choose between define.xml v.1.0 and v.2.0. Also the upcoming v.2.1 will be supported as soon as it published by CDISC. The user can also choose between all CDISC controlled terminology versions that were released since 2013.

Unlike the "black box tools", the software comes with a very nice graphical user interface, has very many wizards, and performs validation using validation rules developed by CDISC. For example for generating the define.xml v.2.0 "Where Clauses", there is a wizard:

making it extremely easy to develop "where-clauses". 

At each moment during the process, the user can inspect the generated define.xml, either as XML, as a tree structure, or visualized in the user's favorite browser, and using the default CDISC stylesheet or using an own stylesheet. 

The validation features go beyond anything else that is currently available, and can be done on different levels. Moreover, unlike with other tools, no false positive errors are generated. This is due to the fact that the developer of the software (well, that's me, a CDISC volunteer for 15 years now) is one of the co-developers of the Define-XML standard, and a CDISC authorized Define-XML trainer (I give most of the CDISC Define-XML trainings in Europe), and thus knows every detail of the standard.

The software is not free-of-charge, but it is not expensive either. So, there is now no excuse anymore for generating bad define.xml files!

Information, including a user manual, can be found at: 

Thursday, September 15, 2016

FDA and SAS Transport 5 - survey results

As promised on LinkedIn, I analyzed the results of the survey where people were asked the question "In my opinion, the FDA should ..." with the following possible answers:
  • Continue requiring SAS-Transport-5 (XPT) as the transport format 
  • Move to XML (e.g. CDISC Dataset-XML) as soon as possible
  • Move to RDF as soon as possible
  • Other 
People were also asked who they are working for: Pharma Sponsor,  CRO, Service Provider, Software Company, or Other.
We had 57 answers (which is considerably less than I had hoped for). Here are the first results:

with a relative good distribution between all groups (some ticked more than 1 box), with a slight overrepresentation of pharma sponsors (which isn't a surprise as they do the FDA submissions).

And here come the results about the question on the exchange format:

Over 50% voted for moving to an XML-based format like CDISC-Dataset-XML, about 25% for moving to RDF. A minority of less than 20% voted for continuation of the current FDA policy to require SAS Transport 5.

I tried to make a detailed analysis looking for relations between the answer about the preferred format and the company type, but didn't find any. The only slight trends I could see (but statistically not significant at all) is that RDF is a bit overrepresented in the "Sponsor" group, and that "SAS-Transport-5" is slightly overrepresented in the "CRO" group. Only 3 (out of the 20) "sponsor voters" voted for "Continue requiring SAS-Transport-5".

The survey also allowed to provide comments. Here are the most interesting ones:
  • If it's not broken, don't fix it.  Pharma is a big industry and slow to change/adapt 
  • We must move beyond the restrictive row/column structure
  • SDTM is useless and error prone. We need modern data models and semantics
  • Consider JSON also. Get rid of Supplemental domains
  • Going for RDF means that ADaM, SDTM and the rest could be all linked together ...
If anyone would like to analyze the results in more detail, just mail me and I can send the complete results as a spreadsheet or as CSV or similar.