Case study: Advanced metadata

Domain-specific repositories, such as the Protein Data Bank (PDB), often require the submission of highly structured metadata along with data files. This structure is what enables users to perform specialized searches within these repositories. For example, in the PDB you can search for all the ligases from mice that were determined by X-ray crystallography at a resolution of 2.5 angstroms or better (there were 12 last time I checked!). If everyone submitted data in whatever format they wanted, this kind of searching would not be possible.
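
To make the point concrete, here is a minimal sketch of why structured fields make such a search possible. The records, the field names (enzyme_class, organism, method, resolution_angstrom), and the entry IDs are all invented for illustration; they are not the PDB's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A hypothetical, simplified repository record (not the PDB's real schema)."""
    entry_id: str
    enzyme_class: str
    organism: str
    method: str
    resolution_angstrom: float


# Made-up example records; the IDs and values are placeholders.
ENTRIES = [
    Entry("EX01", "ligase", "Mus musculus", "X-RAY DIFFRACTION", 2.1),
    Entry("EX02", "ligase", "Mus musculus", "X-RAY DIFFRACTION", 3.0),
    Entry("EX03", "ligase", "Homo sapiens", "X-RAY DIFFRACTION", 1.8),
    Entry("EX04", "ligase", "Mus musculus", "ELECTRON MICROSCOPY", 2.4),
]


def search(entries, enzyme_class, organism, method, max_resolution):
    """Return entries matching every criterion.

    A smaller resolution value means a better-resolved structure,
    so "2.5 or better" translates to `<= 2.5`.
    """
    return [
        e for e in entries
        if e.enzyme_class == enzyme_class
        and e.organism == organism
        and e.method == method
        and e.resolution_angstrom <= max_resolution
    ]


matches = search(ENTRIES, "ligase", "Mus musculus", "X-RAY DIFFRACTION", 2.5)
print([e.entry_id for e in matches])
```

Because every record fills in the same fields in the same way, the filter is a simple comparison per field. With free-form descriptions, each of those comparisons would require guessing how a given submitter happened to write "mouse" or "X-ray".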

The image below shows part of the metadata file for the crystal structure shown above. The complete file contains about 20,000 lines, many of which contain structure information generated during the experimental data capture. You can see that the metadata file includes specific categories that are filled in with specific data in defined formats.
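
As a rough illustration of "specific categories filled in with specific data in defined formats", the sketch below reads a few lines written in a simplified "_category.item value" layout, loosely modeled on the mmCIF format that the PDB uses. The sample text is invented, and parse_items is a hypothetical helper written for this example. Real files also contain loops, multi-line values, and other syntax that this sketch ignores, so a dedicated parser is the better choice for real work.

```python
import shlex

# Invented sample lines in a simplified "_category.item value" layout.
SAMPLE = """\
_exptl.method            'X-RAY DIFFRACTION'
_refine.ls_d_res_high    2.10
_struct_keywords.text    LIGASE
"""


def parse_items(text):
    """Group simple `_category.item value` lines into {category: {item: value}}.

    Hypothetical helper: handles only single-line, single-value items.
    """
    result = {}
    for line in text.splitlines():
        if not line.startswith("_"):
            continue  # not an item line
        tokens = shlex.split(line)  # respects the quotes around multi-word values
        if len(tokens) != 2 or "." not in tokens[0]:
            continue  # skip anything this simplified sketch cannot interpret
        name, value = tokens
        category, item = name[1:].split(".", 1)
        result.setdefault(category, {})[item] = value
    return result


metadata = parse_items(SAMPLE)
print(metadata["exptl"]["method"])
print(float(metadata["refine"]["ls_d_res_high"]))
```

The category and item names act as fixed labels, which is what lets software pull out the experimental method or the resolution without having to interpret free text.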

If you are interested in submitting data to a domain-specific repository, data services staff would be happy to advise you about appropriate repositories for your data and the proper formatting of your submissions. For details, see our Consulting page, contact us, or visit our page on advanced metadata.