GBOL 2 Fungi: Microbiome community barcode sequencing

Short Description of MOD-CO use case "GBOL 2 Fungi: Microbiome community barcode sequencing"

Data origin: The dataset derives from an environmental sample collection carried out as part of the German Barcode of Life project, specifically subproject 12: “Design and Implementation of fungal-specific microarray chips for diagnostic purposes”. Leaf samples were gathered from rosaceous trees in order to analyze the fungal communities residing on and in them as epiphytes and endophytes, using Illumina MiSeq sequencing and hybridization on a custom-made microarray chip. The goal is to establish a microarray chip for the diagnosis of phytopathogenic fungi on orchard trees (Rosaceae, Vitaceae) and silvicultural trees (Betulaceae, Fagaceae, Pinaceae).

The sample collection was carried out on a community-managed tree plantation in Weidenberg, Bavaria. Immediately after separation from the host, the collected leaf specimens were submerged in liquid nitrogen, then transferred to sterilized, barcoded plastic boxes and kept on dry ice to prevent RNA and DNA degradation. Each sample was assigned a specific barcode consisting of a UUID (version 4) and a connected QR code for unambiguous identification. The workflow is partly identical to that described under Sampling with GPS-enabled smartphone and DiversityImageInspector (DII). The sampling was documented using the smartphone app DiversityMobile on a Nokia Lumia phone. The entries created for the collected specimens and their respective sample IDs, including background data such as collection date and time, coordinates, space hierarchy codes, and responsible persons, as well as photo documentation, were uploaded directly from the smartphone to the database DiversityCollection. The lab workflow was described in a DiversityDescriptions installation with links to the DiversityCollection entries.
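The identifier scheme mentioned above can be illustrated with a minimal Python sketch. It only shows how a version 4 UUID might be generated and checked; the function names are hypothetical, the rendering of the QR code is not shown, and nothing here is meant to describe the tooling actually used in the project.

```python
import uuid


def new_sample_id() -> str:
    """Return a random (version 4) UUID as a string.

    Hypothetical helper: such a string could then be encoded in a QR code.
    """
    return str(uuid.uuid4())


def is_uuid4(text: str) -> bool:
    """Return True if text parses as a UUID whose version field is 4."""
    try:
        return uuid.UUID(text).version == 4
    except ValueError:
        # Raised when the string is not a well-formed UUID.
        return False


sample_id = new_sample_id()
print(sample_id, is_uuid4(sample_id))
```

Because version 4 UUIDs are generated from random bits, identifiers can be assigned in the field without coordination with a central database, which fits an offline sampling workflow.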

Details: The use case applies the process-oriented schema MOD-CO (version 1.0, May 2018). It is based on 22 leaf samples from 4 different tree species belonging to the family Rosaceae. The workflow for processing the samples is split into 4 operational steps, which are provided as individual records. For step 1, 121 descriptors of text type, 23 of numeric type, 0 of sequence type, 44 of categorical type, 13 of categorical and text type combined, and 3 of categorical and numeric type combined have been used, which corresponds to 31.2% of all descriptors provided by MOD-CO. For step 2, 118 descriptors of text type, 21 of numeric type, 0 of sequence type, 60 of categorical type, 16 of categorical and text type combined, and 4 of categorical and numeric type combined have been used, which corresponds to 33.5% of all descriptors provided by MOD-CO. For step 3, 69 descriptors of text type, 13 of numeric type, (2 of sequence type), 37 of categorical type, 11 of categorical and text type combined, and 3 of categorical and numeric type combined have been used, which corresponds to 20.3% of all descriptors provided by MOD-CO. For step 4, 127 descriptors of text type, 22 of numeric type, 0 of sequence type, 62 of categorical type, 29 of categorical and text type combined, and 4 of categorical and numeric type combined have been used, which corresponds to 37.3% of all descriptors provided by MOD-CO.
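The percentages above can be cross-checked with a short Python sketch. The total number of MOD-CO descriptors is not stated in this description; the value 654 used below is an assumption, chosen because it reproduces all four reported percentages. Under that assumption, the figure for step 3 only matches if the two parenthesized sequence descriptors are left out of the count, which is an inference and not something the text confirms.

```python
# Descriptor counts per step, copied from the paragraph above.
# Order: text, numeric, sequence, categorical,
#        categorical+text, categorical+numeric.
STEPS = {
    1: (121, 23, 0, 44, 13, 3),
    2: (118, 21, 0, 60, 16, 4),
    3: (69, 13, 2, 37, 11, 3),
    4: (127, 22, 0, 62, 29, 4),
}

# Percentages as reported in the text.
REPORTED = {1: 31.2, 2: 33.5, 3: 20.3, 4: 37.3}

# ASSUMPTION: total descriptor count inferred from the percentages,
# not stated in the source description.
ASSUMED_TOTAL = 654


def share(step: int, include_sequence: bool = True) -> float:
    """Percentage of the assumed total used in a step, rounded to one decimal."""
    counts = STEPS[step]
    used = sum(counts)
    if not include_sequence:
        used -= counts[2]  # drop the sequence-type descriptors
    return round(100 * used / ASSUMED_TOTAL, 1)


for step, reported in REPORTED.items():
    print(step, share(step), share(step, include_sequence=False), reported)
```

For steps 1, 2, and 4 the sequence count is zero, so both variants give the same value.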

The zip archive of the data, in its version dated September 2018, contains one MOD-CO SDD-structured XML file (for research data), one EML-structured XML file (for research project metadata), and two CSV files: one for the EML data table and one for the GFBio DublinCore (DC) pansimple structured metadata. The GFBio-compliant DC metadata are generated via DiversityProjects export and are suitable for publication via GFBio data pipelines.
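As a rough illustration, the composition of such an archive could be checked with the Python standard library. This is a sketch under stated assumptions: the member names in the demonstration are hypothetical placeholders (the actual file names are not given here), and the check only looks at file extensions, so it cannot tell the SDD file from the EML file.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(archive) -> Counter:
    """Count archive members by lowercase file extension, skipping directories.

    `archive` may be a path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return Counter(
            PurePosixPath(info.filename).suffix.lower()
            for info in zf.infolist()
            if not info.is_dir()
        )


def matches_description(counts: Counter) -> bool:
    """True if the archive holds two XML and two CSV files, as described above."""
    return counts[".xml"] == 2 and counts[".csv"] == 2


# Demonstration on an in-memory archive with placeholder names (hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("data_sdd.xml", "project_eml.xml",
                 "eml_table.csv", "dc_pansimple.csv"):
        zf.writestr(name, "")
buffer.seek(0)
print(matches_description(count_by_extension(buffer)))
```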