This blog is managed by Pratin Ashtekar and Amy Bennett. The postings on this site solely reflect the personal views of each author and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management. IBM reserves the right to remove content deemed inappropriate.

Clinically defined diseases show variable response to treatment. As next-generation DNA sequencing technologies become cheaper, there is growing emphasis on understanding these variations in disease response at the genetic level. There is also a quest to find drugs or treatments tailored to genetically different patient groups, and major pharmaceutical companies are investing heavily in large-scale disease biomarker discovery and validation programs. This trend is also encouraging the generation of large data sets, and with it the need for faster analytical pipelines to make sense of that big data.

Rheumatoid Arthritis (RA) is a chronic inflammatory disorder that typically affects the small joints in the hands and feet. It affects the lining of the joints, causing painful swelling that can eventually result in bone erosion and joint deformity. As the number of alternative therapies for RA grows, there is an increasing need for predictors of which patients will respond to which therapy. Anti-TNF drugs in RA are a prototypical example of this opportunity. Over the past decade, a number of studies have tried to develop robust predictors of response, with mixed results. Although there are treatments for RA, only 30 percent of RA patients respond to anti-TNF therapy. IBM Research and the Arthritis Foundation are harnessing mainframe computing power to collect data and develop predictive models that will help doctors know which patients are most likely to respond to anti-TNF therapy.

To further advance the development of models that predict response to anti-TNF treatment in RA patients, Sage Bionetworks and the IBM-sponsored DREAM project, in collaboration with the Arthritis Foundation, developed the Rheumatoid Arthritis Responder DREAM challenge. The goal of this challenge is to use a crowd-based competition framework to develop validated molecular predictors of response to anti-TNF treatment in RA. The challenge used new, unpublished large-scale data relating to RA treatment response. It was hosted by DREAM and Sage Bionetworks and opened in February 2014.

The challenge consisted of two tasks. In one sub-challenge, participants were asked to quantitatively predict treatment response to anti-TNF therapy, while in the other sub-challenge participants had to classify the treated patients as responders or non-responders. In essence, the models should not only predict the positive outcomes of the treatment but also explain the negative outcomes. The training data set was compiled from a set of 2,706 individuals of European ancestry through an international collaboration among 13 collection groups from genome-wide association studies (GWAS) and clinical studies.
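The two sub-challenges correspond roughly to a regression task and a binary classification task. As an illustration only, the sketch below scores a handful of made-up predictions both ways. Everything specific in it is an assumption: the metrics (Pearson correlation for the quantitative task, accuracy for the classification task), the response values, and the responder cutoff are hypothetical and are not the challenge's actual data or scoring rules.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def accuracy(predicted, observed):
    """Fraction of patients whose predicted label matches the observed one."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def to_responder(values, cutoff):
    """Label a patient a responder when the response value reaches the cutoff."""
    return [v >= cutoff for v in values]


# Hypothetical improvement scores for six patients (not challenge data).
observed_change = [2.1, 0.4, 1.8, 0.2, 1.5, 0.9]
predicted_change = [1.7, 0.6, 1.9, 0.8, 1.1, 1.3]
CUTOFF = 1.2  # assumed responder threshold, for illustration only

# Quantitative task: how well do predicted values track observed values?
r = pearson(predicted_change, observed_change)

# Classification task: how often is the responder label right?
acc = accuracy(
    to_responder(predicted_change, CUTOFF),
    to_responder(observed_change, CUTOFF),
)

print(f"correlation: {r:.3f}")
print(f"accuracy:    {acc:.3f}")
```

Scoring both views separately makes the distinction in the text concrete: a model can track the size of the response reasonably well and still mislabel patients who sit close to the cutoff.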

As part of this challenge, IBM gave participants free access to its System Z servers (one with 20 processors, 242 GB of memory and 9 TB of storage, and a second with 128 GB of memory and 1 TB of storage) so that they could take advantage of multicore parallel computing on large data sets (30 GB). The details of challenge participants who had registered and been approved for data access were downloaded each working day from Synapse, a collaborative IT platform designed and run by Sage Bionetworks and optimized for running this type of challenge. Users were given a userid and system details to access the IBM server. The IBM System Z server was demonstrated in a webinar that covered how to access and use the system, along with tips for using it efficiently. Users who opted to use the IBM servers had access to the commonly used open-source computational biology tools R, Plink and Octave. A number of additional R packages were also installed at users' request. Because of the large size of the data and the number of participants, a common read-only copy of the data was provided to all participants. Users' questions about R packages and other software were answered one-on-one, and their feedback was sought to gauge their satisfaction.

As the challenge progressed, participants submitted predictions for the challenge questions, and those results were evaluated on a rolling basis. The evaluations were listed on leaderboards on the Synapse site. Two interim winners were selected from the team phase. One of the participants who used the IBM System Z server placed in the top five for one of the two sub-challenges. The deadline for submitting final results in the team (competitive) phase passed in the first week of June. These submissions are being scored against an unpublished dataset (the Consortium of Rheumatology Researchers of North America, or CORRONA, dataset), which had not been used prior to this challenge. Preliminary results show that the submissions predict response to treatment strongly. Final results of the competitive phase are expected by the end of June. After the submissions in the competitive phase are ranked, the challenge will enter its community phase, in which the best-performing teams of the competitive phase will join forces to improve on the best results of the team phase. The journal Nature Genetics will be observing participation in the community phase and will be working with the teams towards publishing the challenge outcomes.