Massively parallel
sequencing (MPS) technologies and their diverse applications in
genomics and epigenomics research have yielded enormous new insights
into the physiology and pathophysiology of the human genome. The biggest
hurdle remains the magnitude and diversity of the datasets generated,
compromising our ability to manage, organize, process and ultimately
analyse data. The Wiki-based Automated Sequence Processor (WASP),
developed at the Albert Einstein College of Medicine, is unique in
tightly coupling the sequencing platform, the sequencing assay, sample
metadata and the automated workflows that run on a heterogeneous
high-performance computing cluster to yield sequenced, quality-controlled
and ‘mapped’ sequence data, all within a single operating environment
accessible through a web-based GUI. WASP at
Einstein processes 4-6 TB of data per week and, since its production
cycle began, has processed 1 PB of data in total. It has also
revolutionized how users interact with these new genomic technologies:
users remain blissfully unaware of the data storage, management and,
most importantly, processing services they request. The abstraction of such
computational complexity from the user effectively makes WASP an ideal
middleware solution and an appropriate basis for developing a
grid-enabled resource, the Einstein Genome Gateway, as part of the
Extreme Science and Engineering Discovery Environment (XSEDE) program.
In this paper we discuss the existing WASP system, its proposed
middleware role, and its planned interaction with XSEDE to form the
Einstein Genome Gateway.