Sunday, September 14, 2025

SDTM Mapping without Excel - A Comparison of Software Offerings

Certara, which acquired Pinnacle21 a few years ago, seems to have started a marketing offensive for Pinnacle21 "Enterprise" (P21E), now claiming "Faster Mapping from Source to Target SDTM" and "without spreadsheets" (spreadsheets being the classic approach for setting up and managing the mapping specifications).
As our own (competing) SDTM-ETL mapping software has already provided this for about 10 years, I was pretty curious, so I attended their 35-minute webinar.

After a few pure marketing slides, it was explained what the "Methods and benefits of cross-study reuse" are. OK, you don't need to convince me about that. This, too, is something our SDTM-ETL software has provided for a long time.


(the image is from a customer presentation in 2015)

It was then shown what the "without spreadsheet" mapping approach in P21E is. To my surprise, the approach seems to boil down to replacing the "external" spreadsheet with an "internal" spreadsheet, which then needs to be adapted for each new study from within the software. In my opinion, this creates a new (vendor) dependency.
It also uses "predefined specs" for each of the SDTM versions. It was unclear to me whether these can be edited or corrected when something is wrong. In our software we call these "templates"; they are delivered as define.xml files, meaning that if something is wrong, it can easily be corrected. In SDTM-ETL, using define.xml for the templates also makes it possible to generate a "near-submission-ready" define.xml, as all the metadata is kept in sync with an underlying define.xml instance during mapping development.
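
As a small illustration of why define.xml-based templates are convenient (this is a generic sketch, not SDTM-ETL code; the file name is hypothetical): any XML tool can read, inspect and, if necessary, correct such a template.

```python
# Minimal sketch: listing SDTM variable metadata from a define.xml template.
# The file name is hypothetical; the namespace is the ODM namespace used by Define-XML 2.x.
import xml.etree.ElementTree as ET

ns = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

tree = ET.parse("SDTMIG_template_define.xml")   # hypothetical template file
root = tree.getroot()

# Index all ItemDefs (variable definitions) by their OID
itemdefs = {i.get("OID"): i for i in root.iterfind(".//odm:ItemDef", ns)}

# For each dataset (ItemGroupDef), list its variables with data type and length
for igd in root.iterfind(".//odm:ItemGroupDef", ns):
    print("Dataset:", igd.get("Name"))
    for ref in igd.findall("odm:ItemRef", ns):
        item = itemdefs.get(ref.get("ItemOID"))
        if item is not None:
            print("  ", item.get("Name"), item.get("DataType"), item.get("Length"))
```
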
From the P21E demo, it also looks as if the maximal length of the SDTM variables needs to be adapted manually for each new study - I did not hear anything about automated adaptation from the data themselves, as we have in our software.
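
The principle of such an automated adaptation is simple (a generic sketch, not the actual SDTM-ETL implementation; the records below are of course just examples): the maximal lengths can be derived from the generated data themselves.

```python
# Minimal sketch of deriving variable lengths from the actual data rather than
# setting them by hand. In practice the records would come from the generated
# SDTM dataset (e.g. Dataset-JSON or SAS XPT); these two are just examples.
def max_lengths(records):
    """Return the maximum observed character length per (character) variable."""
    lengths = {}
    for record in records:
        for var, value in record.items():
            if isinstance(value, str):
                lengths[var] = max(lengths.get(var, 0), len(value))
    return lengths

lb_records = [
    {"USUBJID": "CDISC01-101", "LBTESTCD": "ALT", "LBORRESU": "U/L"},
    {"USUBJID": "CDISC01-102", "LBTESTCD": "SODIUM", "LBORRESU": "mmol/L"},
]
print(max_lengths(lb_records))
# {'USUBJID': 11, 'LBTESTCD': 6, 'LBORRESU': 6}
```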

Most disappointing, however, was that the mappings are still kept in tables, just as in the past, only now in tables within the software instead of in external Excel files. So, no user-friendly graphical user interface, no "drag-and-drop" from source data, no wizards, e.g. for codelist-to-codelist mapping, and no automated generation of the mapping script. None of that, at least from what was presented.

It also looks to me as if it does not have its own execution engine, but still relies completely on either SAS (expensive ...) or R for the execution. This also means that the software still requires very good knowledge of either SAS or R on top of very good knowledge of SDTM. For the latter, SDTM-ETL offers a lot of help through wizards, with e.g. all the "CDISC Notes" available by a simple click or "Ctrl-H".


Of course, the advantage of having all the "mapping specifications" for all studies within the software is that it is easy to make comparisons ("diffs") between studies, something that is more difficult when working with Excel worksheets.

What was also shown was validation of the results, but it looked to me as if that still used the P21 validation software, and not the modern CDISC CORE engine. For example, when just generating LB and SUPPLB, the validation result complains about missing files like DM and EX and … This is really not something I want to see when my mappings are still in development. When using CORE, one can ask to validate just the generated datasets, and even select which rules to apply and which ones to exclude. For example, while still developing the mappings, I would always exclude the "FDA Business Rules", as these only make sense for (near-)submission-ready packages, and I would include them again when I am nearing the end and getting everything ready.
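
Just to illustrate the principle of such rule selection during development (this is not the CORE API, only a conceptual sketch, and the rule identifiers are made up):

```python
# Conceptual sketch of rule selection during mapping development - NOT the CDISC
# CORE API, just an illustration of filtering a rule set before validation.
rules = [
    {"id": "CORE-000252", "source": "SDTMIG Conformance Rules"},   # ids are made up
    {"id": "CORE-000999", "source": "FDA Business Rules"},
]

def select_rules(all_rules, exclude_sources=()):
    """Keep only the rules whose source is not in the excluded categories."""
    return [r for r in all_rules if r["source"] not in exclude_sources]

# While still developing the LB/SUPPLB mappings: skip the FDA Business Rules
dev_rules = select_rules(rules, exclude_sources={"FDA Business Rules"})
print([r["id"] for r in dev_rules])   # ['CORE-000252']
```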

Regarding SUPPLB (as for any other SUPPxx dataset), it looks as if it requires its very own mapping specifications. In our SDTM-ETL software, "non-standard" variables (that need to be "banned" to SUPPxx) are treated as just normal variables within the domain, and are split off at the very end, which also guarantees consistency through --SEQ between the parent domain and the SUPPxx dataset. How this can be guaranteed when the SUPPxx datasets are handled separately right from the start (as I think is required by P21E), I don't know - it doesn't look easy to me ...
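
As an illustration of the "split off at the very end" principle (a generic sketch, not the actual SDTM-ETL code; LBCLSIG is just an example of a non-standard variable), each non-standard value becomes a SUPPLB record that points back to its parent record via IDVAR/IDVARVAL, using the LBSEQ that was already assigned:

```python
# Illustration of splitting off non-standard variables at the very end:
# the parent LB records keep their LBSEQ, and each non-standard value becomes a
# SUPPLB record pointing back via IDVAR/IDVARVAL. The non-standard variable name
# (LBCLSIG) and its label are just examples.
NON_STANDARD = {"LBCLSIG": "Clinical Significance"}

def split_suppqual(records, domain="LB"):
    parent, supp = [], []
    for rec in records:
        parent.append({k: v for k, v in rec.items() if k not in NON_STANDARD})
        for qnam, qlabel in NON_STANDARD.items():
            if rec.get(qnam) not in (None, ""):
                supp.append({
                    "STUDYID": rec["STUDYID"], "RDOMAIN": domain,
                    "USUBJID": rec["USUBJID"],
                    "IDVAR": f"{domain}SEQ", "IDVARVAL": str(rec[f"{domain}SEQ"]),
                    "QNAM": qnam, "QLABEL": qlabel, "QVAL": rec[qnam],
                })
    return parent, supp

lb = [{"STUDYID": "XYZ", "USUBJID": "XYZ-001", "LBSEQ": 1,
       "LBTESTCD": "ALT", "LBCLSIG": "Y"}]
lb_final, supplb = split_suppqual(lb)
```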

What I did like, however, was the availability of a RESTful web service to communicate with one's own SAS or R engine. But this is of course really needed, as P21E does not have its own execution engine.
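
For readers unfamiliar with such a setup, the idea is roughly as follows (the endpoint, payload and response fields below are purely hypothetical, invented to illustrate the architecture, and are not the actual P21E web service):

```python
# Hypothetical example of handing a generated mapping script to a remote R engine
# over REST. The URL, payload layout and response fields are invented here purely
# to illustrate the architecture.
import requests

payload = {
    "language": "R",
    "script": 'lb <- haven::read_sas("lb_raw.sas7bdat")  # ... mapping code ...',
    "output": "LB",
}
resp = requests.post("https://execution-engine.example.com/api/v1/jobs",
                     json=payload, timeout=60)
resp.raise_for_status()
print(resp.json().get("status"))   # e.g. "queued" or "completed"
```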

What I also liked is that the system seems to implement LOINC-CDISC mapping. It is not clear to me how this is done, or whether it uses the LOINC-CDISC mappings originally published by CDISC (2,400 mappings for 1,400 distinct LOINC codes). In SDTM-ETL, a RESTful web service is used for that, querying a database with over 18,000 mappings for almost 10,000 LOINC codes. I presume P21E does not use these extended mappings, but I may be wrong. LOINC-CDISC mappings for only 1,400 LOINC codes is a bit low for real-life usage.
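
The idea of such a service is simple (the URL and the response fields below are hypothetical and only meant as an illustration, not our actual web service): given a LOINC code, it returns the corresponding LB variable values.

```python
# Hypothetical sketch of querying a LOINC-to-CDISC-LB mapping web service:
# the URL and the response layout are illustrative only.
import requests

loinc_code = "2951-2"   # Sodium [Moles/volume] in Serum or Plasma
resp = requests.get(f"https://loinc2sdtm.example.com/api/map/{loinc_code}", timeout=30)
resp.raise_for_status()
mapping = resp.json()
# A typical result would carry the LB variables derived from the LOINC code, e.g.:
# {"LBTESTCD": "SODIUM", "LBTEST": "Sodium", "LBSPEC": "SERUM OR PLASMA",
#  "LBORRESU": "mmol/L"}
print(mapping)
```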

What I also liked is that there is at least a minimal form of traceability checking in P21E, similar to what we have in SDTM-ETL. It does not, however, guarantee that all subjects have been taken into account, something for which we do have features in SDTM-ETL.
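
A very simple example of what I mean by checking that all subjects have been taken into account (a generic sketch, not the actual SDTM-ETL feature; the subject identifiers are just examples):

```python
# Simple illustration of a subject-completeness check: compare the subjects in the
# source data with those that ended up in the generated SDTM dataset.
source_subjects = {"XYZ-001", "XYZ-002", "XYZ-003"}   # e.g. from the ODM/raw data
lb_subjects = {"XYZ-001", "XYZ-003"}                  # USUBJIDs found in the generated LB
missing = sorted(source_subjects - lb_subjects)
if missing:
    print("Subjects without LB records:", missing)    # ['XYZ-002']
```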

So, does what is presented by Certara really speed up SDTM generation (as Certara claims)? I think, yes, maybe a bit. Essentially, as far as I can see, the presented software is just a mapping-specifications management tool. It does not help in generating the mappings themselves, though, as SDTM-ETL does. There are no drag-and-drop features, no wizards, no built-in SDTM knowledge, no own execution engine, and no CORE validation.
So, I think it may be a small step forward, but at the same time it carries a great danger of vendor lock-in.

I also asked ChatGPT for a comparison between P21E and SDTM-ETL, and this is what it provided:

Of course, I do not take responsibility for what ChatGPT emitted ...

Your comments are of course, as always, very welcome!