Use Case DC5 - Agricultural Scientist

Jean has two current projects. The first, funded by the USDA, studies the interactions between soil microbes and plants, specifically the communities of soil microbes associated with heirloom corn crops. His work consists of setting up experimental plots at an agricultural field station and using 454 pyrosequencing to investigate the soil microbe diversity associated with different species of corn. This work produces very large amounts of molecular data that require substantial technical support. His data are currently stored at his institution, and he has obtained funding for a web developer to build a visualization interface for them. Jean has no problem sharing his data after publication, but does not want to be scooped on publications or proposals. He follows the MIENS guidelines for metadata.

His second project uses the tomato as a model system to study sympodial growth. This project involves two types of data: phenotypic and genomic. The phenotypic data are recorded with pencil and paper after the seeds are germinated.
Jean uses pedigree numbers to link genotype, phenotype, and generation, and all of these data are stored in a pedigree book.
He doesn’t normally share this book because it wouldn’t make sense to others, but he freely distributes seeds to colleagues who ask for them. He considers these seeds to be data. When he receives seeds from others, he “vets” the data by germinating the seeds and confirming the phenotype. Jean is wary of going completely digital with his phenotype data because of stories he has heard from colleagues who have lost large amounts of work. However, he does transcribe the data from paper into an Excel spreadsheet. He keeps the paper copy and sometimes refers back to it to jog his memory. The genomic data come off a sequencing machine and are assembled computationally. Jean has two types of sequence data: whole genome and transcriptome. He is willing to share the assembled genomes immediately and thinks others should do the same. The transcriptome data are used to answer a biological question and are therefore more sensitive; he would be willing to share the raw transcriptome data after publication.
Repositories exist for genome data, but not for raw phenotypic data or raw sequence reads. Jean uses standard gene nomenclature to describe mutants, but feels unqualified to handle metadata. There are no metadata standards for his discipline, probably because researchers are still working out how to handle and analyze genomic data. He knows the Plant Ontology exists, but doesn’t use it because it is too general for his needs.

Queries

Find all available data about gene expression in Arabidopsis and return it in a usable format.

Operations/Tasks

Reanalyze soil microbe metagenome data that have been collected in association with other crop species.