open source software

Open source software has emerged as the driving force of technology innovation, from cloud and big data to social media and mobile. The Future of Open Source Survey is an annual assessment of open source industry trends that drives broad industry discussion around key issues for new and established software-related organizations and the open source community.

The results from the 2015 Future of Open Source Survey reflect the increasing adoption of open source and highlight the abundance of organizations participating in the open source community. Open source continues to speed innovation, disrupt industries, and improve productivity; however, a reported lack of formal company policies and processes around its consumption points to a need for OSS management and security practices to catch up with this growth in investment and use.

One of the strengths of open source is the ability to open up the code base and learn by reading and doing; the transparency of the code base allows everyone to get involved. However, the barrier to entry can be the complexity of the code itself; without a qualified guide, you can get ‘lost in the code jungle’ pretty quickly.

Welcome to our code

With that in mind, we are starting today to author blog posts about the OpenClinica code base, including topics like how the code is organized, what the code does, and so on. A lot more detail on this can be found on the OpenClinica Developer Wiki, but these posts, viewed as a whole, can be seen as a gentle introduction, before interested parties start to dive deeper.

When we began to design OpenClinica, we had very few requirements beyond the desire to create a fully featured database for clinical data, aligned with open standards and making use of the best technology available. Call it the ‘tyranny of the blank page’, if you will. Every start-up faces it. Where do you start? What’s the plan? How do you build it, and what do you build first?

Luckily for us, we had an open standard on which to base our schema and our code: the CDISC ODM.

What’s a CDISC ODM?

The Operational Data Model, or ODM for short, is a standard published by the Clinical Data Interchange Standards Consortium (CDISC), and is “designed to facilitate the archive and interchange of the metadata and data for clinical research”, as stated on their website. The standard is designed to a) hold metadata about a Study and all Events contained within that Study, and b) hold the Clinical Data collected for that Study. All of this information is held in XML, which is a very useful format for exchanging data between sites, labs and institutions.

In the above image, you can see an XML file on one side using CDISC ODM and on the other side, an OpenClinica database. Inside the database are tables that map directly to different objects described in the XML. You’ll notice that the tables associated with study metadata also have a column called ‘oc_oid’, which holds the Object Identifiers (OIDs) we use in all aspects of the OpenClinica application.

In the second image, you see that the latter half of the XML file (the part contained in the <ClinicalData> tags) also links to specific tables in the OpenClinica database. Since we link back to the Study metadata through those OIDs, we don’t use OIDs in those tables; instead, the conventional database mechanisms of primary keys and foreign keys are good enough.
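
To make the mapping a little more concrete, here is a minimal sketch in Python of how the two halves of an ODM file relate. The sample XML, its OIDs (S_DEMO, SE_VISIT1 and so on) and the function names are all made up for illustration; none of it is taken from the OpenClinica code base, and the sample is trimmed well below what the ODM schema requires. The metadata half declares OIDs, and the clinical data half only refers back to them:

```python
import xml.etree.ElementTree as ET

ODM_URI = "http://www.cdisc.org/ns/odm/v1.3"
NS = {"odm": ODM_URI}

# Hypothetical, heavily trimmed ODM document (not schema-valid as is).
SAMPLE_ODM = """\
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S_DEMO">
    <MetaDataVersion OID="v1.0.0" Name="Demo metadata">
      <StudyEventDef OID="SE_VISIT1" Name="Visit 1" Repeating="No" Type="Scheduled"/>
      <FormDef OID="F_VITALS" Name="Vitals" Repeating="No"/>
      <ItemGroupDef OID="IG_VITALS" Name="Vitals group" Repeating="No"/>
      <ItemDef OID="I_WEIGHT" Name="Weight" DataType="float"/>
    </MetaDataVersion>
  </Study>
  <ClinicalData StudyOID="S_DEMO" MetaDataVersionOID="v1.0.0">
    <SubjectData SubjectKey="SS_001">
      <StudyEventData StudyEventOID="SE_VISIT1">
        <FormData FormOID="F_VITALS">
          <ItemGroupData ItemGroupOID="IG_VITALS">
            <ItemData ItemOID="I_WEIGHT" Value="72.5"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""


def local_name(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return elem.tag.split("}", 1)[-1]


def metadata_oids(root):
    """Map every OID declared under <Study> to the element that declares it."""
    oids = {}
    for study in root.findall("odm:Study", NS):
        for elem in study.iter():  # includes the <Study> element itself
            oid = elem.get("OID")
            if oid is not None:
                oids[oid] = local_name(elem)
    return oids


def clinical_rows(root):
    """Flatten <ClinicalData> into (subject, event OID, item OID, value) rows."""
    rows = []
    for clinical in root.findall("odm:ClinicalData", NS):
        for subject in clinical.findall("odm:SubjectData", NS):
            for event in subject.findall("odm:StudyEventData", NS):
                for item in event.iter(f"{{{ODM_URI}}}ItemData"):
                    rows.append((subject.get("SubjectKey"),
                                 event.get("StudyEventOID"),
                                 item.get("ItemOID"),
                                 item.get("Value")))
    return rows


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_ODM)
    declared = metadata_oids(root)
    for subject_key, event_oid, item_oid, value in clinical_rows(root):
        # Every OID used in the data half should be declared in the metadata half.
        assert event_oid in declared and item_oid in declared
        print(subject_key, event_oid, item_oid, value)
```

In a database, the dictionary returned by metadata_oids would correspond loosely to the metadata tables carrying an OID column, and each row from clinical_rows to a record in the data tables.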

OK, so they map. But where’s the beef?

Of course, the ODM XML in the images is rather simple, and does not capture the full capability of the metadata that can be passed back and forth between different ODM data sources. For a longer example, you can take a look at the following XML, which defines the Rules governing a single Item:

As you start to piece together the XML in the above example, you’ll see that not only can you define the Question in multiple languages, but you can also specify which unit of measurement it uses and what kinds of values it can accept. The XML standard is extensible enough to add other pieces of information as well, including coded lists, data types, and so on. More information can be found at XML4Pharma’s page entitled, ‘Using CDISC-ODM in EDC.’
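
For readers who want to experiment, here is a small, hypothetical ItemDef written against the ODM 1.3 element names (Question, TranslatedText, MeasurementUnitRef, RangeCheck), together with a Python sketch that reads it. This is not the XML from the original example; the OIDs, the question text, the limits and the function names are invented for illustration, and only the simple comparison operators are handled:

```python
import operator
import xml.etree.ElementTree as ET

ODM_URI = "http://www.cdisc.org/ns/odm/v1.3"
NS = {"odm": ODM_URI}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical item definition; OIDs, wording and limits are made up.
ITEM_DEF = """\
<ItemDef xmlns="http://www.cdisc.org/ns/odm/v1.3"
         OID="I_WEIGHT" Name="Weight" DataType="float">
  <Question>
    <TranslatedText xml:lang="en">Body weight</TranslatedText>
    <TranslatedText xml:lang="fr">Poids corporel</TranslatedText>
  </Question>
  <MeasurementUnitRef MeasurementUnitOID="MU_KG"/>
  <RangeCheck Comparator="GE" SoftHard="Hard">
    <CheckValue>1</CheckValue>
  </RangeCheck>
  <RangeCheck Comparator="LE" SoftHard="Hard">
    <CheckValue>500</CheckValue>
  </RangeCheck>
</ItemDef>
"""

# ODM also defines IN and NOTIN comparators; they are left out of this sketch.
COMPARATORS = {
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
    "EQ": operator.eq, "NE": operator.ne,
}


def describe_item(item):
    """Collect the question per language, the data type and the unit OID."""
    questions = {
        text.get(XML_LANG): text.text
        for text in item.findall("odm:Question/odm:TranslatedText", NS)
    }
    unit_ref = item.find("odm:MeasurementUnitRef", NS)
    return {
        "oid": item.get("OID"),
        "data_type": item.get("DataType"),
        "questions": questions,
        "unit_oid": None if unit_ref is None else unit_ref.get("MeasurementUnitOID"),
    }


def failed_checks(item, value):
    """Return the (comparator, limit) pairs that the numeric value violates."""
    failures = []
    for check in item.findall("odm:RangeCheck", NS):
        comparator = check.get("Comparator")
        limit = float(check.findtext("odm:CheckValue", namespaces=NS))
        if not COMPARATORS[comparator](value, limit):
            failures.append((comparator, limit))
    return failures


if __name__ == "__main__":
    item = ET.fromstring(ITEM_DEF)
    print(describe_item(item))
    print(failed_checks(item, 72.5))   # within both limits
    print(failed_checks(item, 900.0))  # above the upper limit
```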

In future posts, we hope to describe more about the code base, and show how it all comes together as a full-featured application. If there are topics that are of specific interest, we hope you’ll comment below and let us know what you’d like to see here in the coming months.

Started in 2006, MirthConnect is an open source project sponsored by the Mirth Corporation of Irvine, CA. It is middleware designed to transform, route and deliver data. It supports HL7, X12, XML, DICOM, EDI, NCPDP and plain old delimited text. It can route via MLLP, TCP/IP, HTTP, files, databases, S/FTP, Email, JMS, Web Services, PDF/RTF Documents and custom Java/JavaScript. MirthConnect has been likened to a Swiss army knife and justifiably so.

Channels are the heart and soul of a MirthConnect installation. A channel is user defined and has a source and a destination. A source may be a flat file residing on a remote server or a web service call or a database query or even another channel—whatever you like, it doesn’t matter. A destination may be to write a PDF document, email somebody an attachment or enter data into a database. Again, whatever!

To illustrate, say you want to poll a database and generate a weekly report. No sweat! Using MirthConnect’s easy-to-use drag-and-drop template-based editor, define a channel with a database reader as a source, and a document writer as a destination, fill in details like user names, passwords and machine names, define which database fields you want to retrieve and how you want to display the data, and you’re done! MirthConnect’s daemon handles the rest based on your channel’s configuration.
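
If it helps to think of a channel in code, the idea can be sketched in a few lines of Python. To be clear, this is only a conceptual model and not MirthConnect’s API or configuration format: the Channel class, make_database_reader, the table and the column names are all invented here, an in-memory SQLite database stands in for the real one, and a plain list stands in for the document writer.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Channel:
    """Conceptual channel: pull rows from a source, push text to a destination."""
    name: str
    source: Callable[[], Iterable[dict]]
    destination: Callable[[str], None]
    template: str  # how each row should be displayed

    def run_once(self):
        """Process everything the source currently offers; return the row count."""
        count = 0
        for row in self.source():
            self.destination(self.template.format(**row))
            count += 1
        return count


def make_database_reader(conn, query):
    """Build a source that yields each row of the query as a dict."""
    def read():
        cursor = conn.execute(query)
        columns = [col[0] for col in cursor.description]
        for values in cursor:
            yield dict(zip(columns, values))
    return read


if __name__ == "__main__":
    # Hypothetical table standing in for the database being polled.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (accession TEXT, concentration REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)",
                     [("ACC-1", 12.5), ("ACC-2", 3.0)])

    report_lines = []  # stands in for the document writer
    channel = Channel(
        name="weekly report",
        source=make_database_reader(
            conn, "SELECT accession, concentration FROM results ORDER BY accession"),
        destination=report_lines.append,
        template="{accession}: {concentration} ng/mL",
    )
    channel.run_once()
    print("\n".join(report_lines))
```

In MirthConnect itself none of this is written by hand; the equivalent choices are made in the channel editor.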

Once defined, a channel can be exported as XML for later import into another MirthConnect installation. This is all done with the point and click of a mouse.

At Geneuity, we use MirthConnect to get data in and out of OpenClinica. Originally, we used custom Java code to do this. But once we found MirthConnect, we quickly realized we were reinventing the wheel. Why do that?

Here’s a concrete example. Consider the very simple CRF from a mock OpenClinica installation shown in Figure 1. It has three groups of items: accessioning, results and reportage. When a specimen arrives at Geneuity, the lab tech looks up the patient and event pairing in the subject matrix as specified by the requisition and types into the CRF the accession number, the receipt date and any shipping deviations. This is done by hand and is indicated as step 1 of Figure 2.

Then, as shown in step 2 of Figure 2, the tech tests the specimen at the testing platform. In step 3, the platform spits out the data, whereupon a collection of MirthConnect channels operating in tandem parses the results, transforms them into SOAP messages, and sends them to the EventDataInsertEndpoint web service feature of OpenClinica for upload into the CRF fields designated ‘Assay date’ and ‘Analyte concentration’.
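
The parse-and-transform part of step 3 can be sketched roughly as follows in Python. Everything specific here is an assumption: the pipe-delimited result format, the OIDs, the insertRequest element and its urn:example namespace are placeholders, because the actual request schema of EventDataInsertEndpoint is not shown in this post. The sketch only builds the message text; it sends nothing over the network and leaves out authentication entirely.

```python
import xml.etree.ElementTree as ET

SOAP_URI = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
ODM_URI = "http://www.cdisc.org/ns/odm/v1.3"
WS_URI = "urn:example:event-data-insert"  # placeholder, not OpenClinica's namespace

ET.register_namespace("soapenv", SOAP_URI)
ET.register_namespace("odm", ODM_URI)
ET.register_namespace("ws", WS_URI)


def q(uri, name):
    """Build the '{namespace}name' form that ElementTree expects."""
    return f"{{{uri}}}{name}"


def parse_result_line(line):
    """Parse 'accession|assay_date|concentration' (a made-up platform format)."""
    accession, assay_date, concentration = line.strip().split("|")
    float(concentration)  # raises ValueError if the value is not numeric
    return {"accession": accession,
            "assay_date": assay_date,
            "concentration": concentration}


def build_envelope(result, subject_key):
    """Wrap one result in ODM ClinicalData inside a SOAP envelope (all OIDs hypothetical)."""
    envelope = ET.Element(q(SOAP_URI, "Envelope"))
    ET.SubElement(envelope, q(SOAP_URI, "Header"))
    body = ET.SubElement(envelope, q(SOAP_URI, "Body"))
    request = ET.SubElement(body, q(WS_URI, "insertRequest"))  # placeholder name
    odm = ET.SubElement(request, q(ODM_URI, "ODM"))
    clinical = ET.SubElement(odm, q(ODM_URI, "ClinicalData"),
                             {"StudyOID": "S_DEMO", "MetaDataVersionOID": "v1.0.0"})
    subject = ET.SubElement(clinical, q(ODM_URI, "SubjectData"),
                            {"SubjectKey": subject_key})
    event = ET.SubElement(subject, q(ODM_URI, "StudyEventData"),
                          {"StudyEventOID": "SE_TESTING"})
    form = ET.SubElement(event, q(ODM_URI, "FormData"), {"FormOID": "F_LAB"})
    group = ET.SubElement(form, q(ODM_URI, "ItemGroupData"),
                          {"ItemGroupOID": "IG_RESULTS"})
    ET.SubElement(group, q(ODM_URI, "ItemData"),
                  {"ItemOID": "I_ASSAY_DATE", "Value": result["assay_date"]})
    ET.SubElement(group, q(ODM_URI, "ItemData"),
                  {"ItemOID": "I_ANALYTE_CONC", "Value": result["concentration"]})
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    result = parse_result_line("ACC-1001|2015-06-01|12.5\n")
    print(build_envelope(result, subject_key="SS_001"))
```

In our setup this work is done by channel configuration rather than by code like the above; the sketch is only meant to show what the transformation amounts to.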

After the tech reviews the data and marks it complete, another collection of channels polls the database for results newly marked complete, generates and delivers PDF reports of the corresponding data (step 4) and then reports back to OpenClinica (step 5) via EventDataInsert the details of the reportage, including status, time and any errors (see the third and last item grouping labeled ‘REPORTAGE’ in Figure 1).

The scenario outlined above requires NO CUSTOM CODE beyond the channel configurations, and these are encapsulated and standardized by design. As such, you don’t need an army of coders on staff to develop and maintain them.

Both OpenClinica and MirthConnect are great as standalone products. Linked together, however, they really sizzle.

Figure 1: A simple CRF from a mock OpenClinica installation

Figure 2: This shows how the different item groupings in the CRF depicted in Figure 1 are populated. Values for items under ACCESSIONING are entered manually by the lab tech. Values for items under RESULTS are populated by the Mirth channels continuously listening for in-coming data from the clinical testing platform. Values for items under REPORTAGE are populated by a distinct set of Mirth channels responsible for polling and reporting newly completed results.

I am a regular reader of “The Open Road” blog by Matt Asay on news.com. In one of his latest posts, “Getting open-source criticism wrong”, he does a great job of making the case that commercial open source software is about ease of adoption, flexibility, and choice.

It struck a chord because my sales team and I spend a great amount of time and effort explaining to prospective customers that we offer the same level of quality, stability, performance, service, and support as a proprietary vendor. In many cases we must meet a higher threshold than those vendors, because we do not have the lock-in of a commercial software license to compel customers to come back to us for repeat business. Our track record of successful long-term customer relationships is evidence we meet this threshold.

In certain sales situations, for the sake of simplicity and clarity, we have to focus only on these apples-to-apples characteristics, and do not have the opportunity to educate on the economic and technical advantages of OSS as much as we would like. It’s great to know that our open source clinical data management software technology and service offerings can stand successfully on these merits. However, as many readers of this blog already know, open source offers an additional set of critical benefits: “the ability to adopt software rapidly and at low cost, the flexibility to develop and extend their systems as they choose, and the ability to reduce risk by obtaining paid commercial-grade [or better] support”. As more decision makers are coming to understand, it is following this path, rather than the adoption of pricey, monolithic proprietary software, that leads to better outcomes and greater ROI.

Back in 2004, when I would tell people about our open source electronic data capture (EDC) technology and our open source business model, I got a lot of crazy looks and confused reactions.

Fortunately, these days there is a much greater understanding in our industry of what open source software is and of the significance of its ability to solve core problems that proprietary software cannot. However, we still have to try very hard to make sure our users understand what is different about open source and what they should expect. Over the course of my next few posts I’ll explore some aspects of how open source matters in clinical research informatics. Some of the ideas I’ll be exploring include (note this list may change, don’t hold me to it):