Google will award a certain number of student slots to the
Bioconductor project.

The Bioconductor administrators and mentors will rank projects in order of
importance to the project, and the top projects will be funded.

Any selected students will be expected to register with the
bioconductor and bioc-devel mailing lists.

Google has posted a timeline explaining how the program works. Students
are encouraged to review it and make sure that they can commit to the
schedule. There is also a FAQ in case people have other questions that
are not addressed here.

Here are our suggested ideas:

ExperimentHub project

Background/Motivation: As very large genomic data sets become more and
more common, computational biologists are spending inordinate time
transforming data from the format of the original resource to a format
amenable to computation in their programming language of choice. The R
/ Bioconductor community needs programmatic access to cloud-based
experimental data resources that can be readily incorporated into
their own work flows.

Goal

AnnotationHub and its supporting packages are primed to support such a project.
AnnotationHub provides infrastructure to make well-curated resources
available to R software clients, but it needs the addition of a web
interface to allow addition of user-supplied resources, including
transformation of data into formats amenable to direct use by R clients.

Specific Aim

Work with us to create a web-accessible interface
that does the following:

allows users to add large genomic
resources and associated metadata to a NoSQL back-end database.

allows users to upload literate programming documents illustrating the use of
their data; the web application evaluates the documents' code and reports any
errors.

implements social networking concepts like tags and rating systems to help
guide users to the most useful resources.

Skills required

Familiarity with R and with AJAX Web 2.0 programming.

Test

Using the R packages Rook and rmongodb, write a web form that asks the user
for their name and age and stores this information in a MongoDB collection. A
second page asks the user for an age and then displays all records for people
that age or older. This should be sent to us as an R package so that all we
have to do (once we've installed MongoDB) is install the package and run:

library(testPkg)
run()

and we'll see the desired functionality.

Mentors

Backup Mentor

Shiny Bioconductor Objects Project

Background/Motivation

The Shiny package allows for easy
creation of interactive web graphics from R objects. Bioconductor
packages have many objects that represent biological data or results.
For each of these Bioconductor objects, there exists a typical set of
visualizations to help users explore their data. Normally, these
visuals are replotted several times until certain parameters are
tweaked to show the image in a way that conveys a specific insight.
This project pairs these standard Bioconductor objects with more
user-friendly Shiny visualizations via new display() methods.

Tasks

Subscribe to the bioconductor and bioc-devel mailing lists.

Using the mailing lists (which are searchable) and the documentation from
within the following packages, familiarize yourself with the following
Bioconductor objects:
Biobase::ExpressionSet,
GenomicRanges::SummarizedExperiment,
GenomicRanges::GRanges,
GenomicRanges::GRangesList.

Learn how to use ggbio to draw a generic display from GRanges or GRangesList
objects.

Create display() methods for each of the four object types mentioned above.
These methods should use Shiny to display a multi-tabbed version of each
object, where each tab offers an alternate view of that object. Each method
should allow the caller to view a table of annotations in one tab and to
visualize the data using either a heat map or a Gviz plot (as appropriate) in
another tab.

Make sure that the R code implementing these four very similar methods is
written so that you don't repeat yourself.

Integrate these methods into the appropriate Bioconductor packages along with
proper documentation. Which package is appropriate will be determined later as
the project develops.
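
One way to keep the four display() methods from repeating themselves is to put the shared tab layout in a single helper and let each method supply only its own annotation table and plotting code. The following is a minimal sketch under stated assumptions, not a proposed implementation: ToyExpression, ToyRanges and makeTabs are hypothetical stand-ins invented for illustration, and the helper returns a plain list where a real version would build Shiny tabs.

```r
library(methods)

# Hypothetical stand-in classes so the sketch runs with base R only; the real
# targets would be ExpressionSet, SummarizedExperiment, GRanges and GRangesList.
setClass("ToyExpression",
         representation(values = "matrix", annotation = "data.frame"))
setClass("ToyRanges",
         representation(start = "integer", end = "integer",
                        annotation = "data.frame"))

# Shared helper (hypothetical): the two-tab layout is described once.
# A Shiny version would construct the tabs here instead of returning a list.
makeTabs <- function(annotation, plotFun) {
  list(Annotations = annotation, Plot = plotFun)
}

setGeneric("display", function(object, ...) standardGeneric("display"))

# Each method supplies only what differs: its annotations and its plot.
setMethod("display", "ToyExpression", function(object, ...) {
  makeTabs(object@annotation, function() heatmap(object@values))
})

setMethod("display", "ToyRanges", function(object, ...) {
  makeTabs(object@annotation, function() {
    idx <- seq_along(object@start)
    # Empty canvas spanning all ranges, then one horizontal segment per range.
    plot(range(object@start, object@end), c(0, length(idx) + 1),
         type = "n", xlab = "position", ylab = "range")
    segments(object@start, idx, object@end, idx)
  })
})
```

Calling display() on either toy object returns the same two named tabs, so any change to the layout is made in makeTabs alone.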

Alternate Tasks

Most of the tasks above describe how you could approach making a
display() method that depicts a fairly standard Bioconductor object.
These four objects are mentioned because we consider them to be the
most important ones to do first. But consider implementing a useful
display() method for one or more other objects. What should that
look like, and why would you choose it? Choose an object of personal
interest to you and make a display() method for it as well.

Skills required

Familiarity with R. Understanding of basic computational biology or a
willingness/ability to learn about such things as needed.

Test

Find the example from the manual page for Biobase::ExpressionSet and run it.
Then plot the relevant data contained in the generated ExpressionSet object
as a heatmap and save the output. Then send me the plot. BONUS: make sure
that all the labels in your plot are fully visible.

Mentors

Backup Mentor

BiocParallel / BatchJobs integration

Background / Motivation

High-throughput sequencing generates data sets consisting of hundreds
of millions of sequence reads per sample. As with any large data set,
timely processing depends on parallel computing. The Bioconductor
project has developed the BiocParallel package, an abstraction
around several parallel implementations in R. The API is tailored to
typical use cases in biological data analysis and integrates with
existing Bioconductor data structures. Another package, BatchJobs,
executes R functions as scheduled cluster jobs, through an
abstraction that has been implemented for several popular
schedulers, including LSF, PBS and SGE.
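
To illustrate the abstraction described above, here is a minimal sketch of how BiocParallel separates the work from the back-end that runs it. It assumes the BiocParallel package is installed; processSample is a hypothetical toy function standing in for real per-sample processing, and SerialParam() is used only so the example runs without a cluster.

```r
library(BiocParallel)

# Hypothetical per-sample task; a real pipeline would process reads here.
processSample <- function(n) n * 2

# The back-end is selected through the BPPARAM argument. SerialParam() runs
# the work in the current R session, so no cluster or scheduler is needed.
param <- SerialParam()

# bplapply() has lapply() semantics: one result per input element.
res <- bplapply(c(10, 20, 30), processSample, BPPARAM = param)
unlist(res)
```

Because the call to bplapply() does not change when a different parameter object is supplied, a scheduler-based back-end could in principle be swapped in without touching the analysis code.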

As sequencing data pipelines are typically executed on managed
clusters, there is a need for BiocParallel to interact with cluster
schedulers. We aim to add a new backend to BiocParallel that
delegates to BatchJobs for this interaction.