Gene fusion variants mapped by shared PubMed IDs

Introduction

A gene fusion occurs when parts of two genes are joined, typically by a chromosomal rearrangement, so that they are transcribed into one hybrid mRNA molecule and translated into a single protein. A common set of fusions found in lung cancer is the group of combinations of the genes EML4 and ALK [1], hereafter denoted EML4-ALK. The COSMIC database [2] lists 29 distinct EML4-ALK fusions, which differ in where the breakpoint between the EML4 RNA sequence and the ALK RNA sequence occurs, as shown in the following screenshot:

Each of the fusions in the above screenshot corresponds to one or more PubMed IDs, each identifying a paper that provides evidence for the fusion. We can draw the PubMed IDs that the fusions have in common as a chord chart (below).
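To make "PubMed IDs in common" concrete, here is a minimal JavaScript sketch of the comparison for one pair of fusions. The fusion labels (F1, F2, F3), the PubMed ID strings, and the function name sharedPubmedIds are all hypothetical placeholders chosen for illustration. They are not COSMIC data and not necessarily how the chart was actually built.

```javascript
// Hypothetical input: fusion label -> list of PubMed IDs.
// All labels and IDs below are placeholders, not real COSMIC records.
const examplePubmedIds = {
  F1: ["pmid-a", "pmid-b", "pmid-c"],
  F2: ["pmid-a", "pmid-b", "pmid-d"],
  F3: ["pmid-c"],
};

// Return the PubMed IDs that two fusions have in common.
// Duplicate IDs within one fusion's list are counted once.
function sharedPubmedIds(fusionA, fusionB, idsByFusion) {
  const idsOfB = new Set(idsByFusion[fusionB]);
  return [...new Set(idsByFusion[fusionA])].filter((id) => idsOfB.has(id));
}

console.log(sharedPubmedIds("F1", "F2", examplePubmedIds));
console.log(sharedPubmedIds("F1", "F3", examplePubmedIds));
```

The number of IDs returned for a pair is what a chord between those two fusions would represent.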

Results

Mapping COSMIC’s EML4-ALK gene fusions by shared PubMed IDs yields:

Here we see that fusion COSF474 shares many PubMed IDs with fusion COSF412, but few with fusion COSF493.

Method

The raw data fed to the chord chart looks like:

I used D3 (Data-Driven Documents) [3], a JavaScript library for producing data-driven graphics, to create the image. Specifically, I adapted the chord diagram example shown at [4] to accommodate this data.
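As a rough sketch of how such input could be prepared, the JavaScript below builds a square matrix in which entry [i][j] counts the PubMed IDs shared by fusion i and fusion j. It is an illustration under assumptions, not the code used for the image: the fusion labels, the PubMed ID strings, and the helper name buildSharedMatrix are hypothetical, and setting the diagonal to zero (so no fusion gets a chord to itself) is a design choice made here.

```javascript
// Hypothetical input: fusion label -> list of PubMed IDs.
// All labels and IDs below are placeholders, not real COSMIC records.
const exampleIdsByFusion = {
  F1: ["pmid-a", "pmid-b", "pmid-c"],
  F2: ["pmid-a", "pmid-b", "pmid-d"],
  F3: ["pmid-c"],
};

// Build a symmetric square matrix of shared PubMed ID counts.
// Row and column order follows the order of the returned `fusions` array.
function buildSharedMatrix(idsByFusion) {
  const fusions = Object.keys(idsByFusion);
  const idSets = fusions.map((fusion) => new Set(idsByFusion[fusion]));

  const matrix = fusions.map((_, i) =>
    fusions.map((_, j) => {
      if (i === j) return 0; // assumption: no chord from a fusion to itself
      let shared = 0;
      for (const id of idSets[i]) {
        if (idSets[j].has(id)) shared += 1;
      }
      return shared;
    })
  );

  return { fusions, matrix };
}

const { fusions, matrix } = buildSharedMatrix(exampleIdsByFusion);
console.log(fusions);
console.log(matrix);
```

A square matrix of this shape is what D3's chord layout takes as input: d3.layout.chord().matrix(matrix) in D3 version 3, or d3.chord()(matrix) in version 4 and later. The fusions array supplies the labels for the arcs around the circle.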