The remit of the Sample Processing Working Group is to produce reporting guidelines, data exchange formats and controlled vocabulary covering all separation techniques not considered to be 'classical' one- or two-dimensional gel electrophoresis (cf. the Gel WG home page), along with other kinds of sample handling and processing (for example, 'tagging' proteins or peptides, splitting, combining and storing samples). Where possible we seek to develop our products in collaboration with all proteomics stakeholders and, where relevant, developers from other standards communities, most notably metabolomics.

Minimum reporting requirements

The evolving Minimum Information About a Proteomics Experiment (MIAPE) documents offer guidelines on how to adequately report a proteomics experiment. It is expected that these documents will be published, and that the requirements within will be enforced by journals, compliant repositories and funders (cf. MIAME).

XML formats for data exchange

Derived from the FuGE general object model, the formats developed by this workgroup are designed to function both as standalone files and as part of a 'parent' FuGE-ML document. These formats will facilitate data exchange between researchers, and submission to repositories or journals.

Controlled vocabularies (CVs) and ontology

Lists of clearly defined terms are crucial for the construction of unambiguously worded data files. In addition to providing supporting CVs for the individual data capture formats as part of the integrated PSI CV, the Sample Processing WG will contribute terms to the Functional Genomics Ontology (FuGO).

The MIAPE minimum reporting requirements

The context-sensitive nature of transcriptome, metabolome and proteome data necessitates the capture of a richer set of metadata (data about the data) than is required for basic genetic sequence, where knowing the organism of origin will usually suffice. The use of paper citations as proxies for actual metadata hinders the reassessment of data sets, and obstructs non-standard searching (e.g. by the order in which different liquid chromatography columns were coupled). The requirements of the various journals also differ, so important detail may be lacking in some cases, or presented in an esoteric fashion.

There is thus a need for public repositories that contain information from whole proteomics experiments, making explicit both where samples came from and how analyses of them were performed. It is therefore appropriate to attempt to define the minimum set of information about a proteomics experiment that would be required by such a repository.

Data and metadata produced in different places may be in different formats, making comparison and exchange difficult. We therefore seek to develop consensus interchange formats (XML Schema based) to facilitate the development of effective search and analysis tools, simplifying both the dissemination and exchange of data. These XML Schemata are derived from application-specific object models (themselves subclassed from the generic FuGE object model). The development process will draw on the wisdom and experience of a wide range of concerned people, both in academia and industry.
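
As an illustration of the kind of non-standard search that structured metadata makes possible (for example, by the order in which liquid chromatography columns were coupled), the following is a minimal Python sketch. The element and attribute names (experiments, experiment, separation, column, order, type) and the helper functions are invented for this example; they are assumptions and are not taken from any PSI or FuGE schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element and attribute name here is invented
# for illustration and is NOT taken from any PSI or FuGE schema.
SAMPLE_XML = """
<experiments>
  <experiment id="exp1">
    <separation>
      <column order="1" type="strong cation exchange"/>
      <column order="2" type="reversed phase"/>
    </separation>
  </experiment>
  <experiment id="exp2">
    <separation>
      <column order="1" type="reversed phase"/>
      <column order="2" type="strong cation exchange"/>
    </separation>
  </experiment>
  <experiment id="exp3">
    <separation>
      <column order="1" type="reversed phase"/>
    </separation>
  </experiment>
</experiments>
"""


def column_sequence(experiment):
    """Return the column types of one experiment, sorted by coupling order."""
    columns = experiment.findall("./separation/column")
    columns.sort(key=lambda c: int(c.get("order")))
    return [c.get("type") for c in columns]


def find_by_column_order(root, wanted):
    """Return ids of experiments whose columns were coupled exactly in `wanted` order."""
    return [
        e.get("id")
        for e in root.findall("experiment")
        if column_sequence(e) == list(wanted)
    ]


root = ET.fromstring(SAMPLE_XML)
matches = find_by_column_order(root, ["strong cation exchange", "reversed phase"])
print(matches)  # ['exp1']
```

Because the coupling order is recorded explicitly rather than buried in a cited paper, the query distinguishes exp1 from exp2 even though both used the same two column types.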