Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful, highly optimized tools to index and analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented. The library core is written in C for maximum speed, with concise interfaces to higher-level programming languages like OCaml and Python. Many high-performance standalone programs (mapper, splice mapper, etc.) are provided along with the library, and new algorithms and tools can easily be implemented on top of it.

The GEM project started at the CRG in June 2008. Since fall 2008, early versions of the GEM tools have been in everyday use to help the development of many different scientific projects involving mapping of DNA/RNA data, reconstruction of RNA splice-form abundances, SNP calling, microRNA analysis, ChIP-seq experiments, metagenomics studies, and other tasks related to next-generation sequencing. Since April 2010, the GEM project has been developed at the CNAG by the Algorithm Development unit.

All these components are modular, so you can install them either all together as a bundle or a few at a time.

Temporary note (updated 27/10/2012)

Please note that not all the components are available at the moment. In particular:

the subpackages that cannot be distributed as binaries will be uploaded once the ongoing code review is complete and the GEM source code is ready for distribution

the Mac port is currently broken; we are working to restore it as soon as possible

the split mapper and the mappability tools are still being migrated to the new mapping engine; the mappability tools will be back online shortly.

Other GEM-friendly tools

If you are happy with GEM, you might also like these related projects:

MIRO, a pipeline to analyze microRNAs using next-generation sequencing data

The Flux Capacitor, a set of tools to predict the abundance of splice-forms from next-generation sequencing data.

Both projects are fully integrated with the gem-mapper, which they can use as the engine of their mapping stage.

Getting started

Installing

At the moment, only a pre-compiled binary pre-release is available.

Please read the installation instructions carefully (making sure that your architecture is supported and that you select the most optimized bundle available for it), and then proceed to the download page.

Documentation

The information needed to use the GEM tools can be obtained from both the technical documents (user's guide, man pages, etc.) and the scientific ones (pre-prints, published articles, etc.).

Terms of use

Most GEM programs are distributed under a dual-licensing scheme: they are free for non-commercial use, but a license is required for commercial applications.
In practice, you are very welcome to freely use or redistribute GEM for any purpose, with one exception: you have to ask us for a commercial license if you want to

build commercial software (a data analysis framework, pipeline, etc.) on top of GEM, using GEM in either binary or source form

All other cases (using GEM either as standalone software or embedded in non-commercial applications, and redistributing GEM for free, either as standalone software or embedded in non-commercial applications) do not require a special license from us. In particular:

GEM is free for academic non-commercial use

you can always use the results you obtained with GEM for any purpose, even a commercial one.

We are not yet ready to distribute the GEM source code (a major code cleanup is ongoing), but once we are, we will do so under a similar dual-licensing scheme (a GPL-like license for non-commercial use and a personalized license otherwise).

Authors

Several people have contributed code to GEM over the years. They are listed below with their respective funding institutions: