Bottom Line:
Among the reasons for this are difficulty precisely recording information about complicated analytical experiments (metadata), existence of various databases with their own metadata descriptions, and low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata for analyzed data obtained from 35 biological species are published currently.

Abstract:
Metabolomics - technology for comprehensive detection of small molecules in an organism - lags behind the other "omics" in terms of publication and dissemination of experimental data. The reasons include the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of published data, all of which leave submitters (the researchers who generate the data) insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called "Togo Metabolome Data" (TogoMD), with an ID system that provides unique access to each level of the tree-structured metadata: study purpose, sample, analytical method, and data analysis. Separating the management of metadata from that of data, and permitting related information to be attached to the metadata, provides advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, and thereby function as a hub of related data resources. This enrichment improves readers' understanding and use of the data and increases submitters' motivation to publish it. The metadata are shared computationally with other systems via APIs, which facilitates the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. Currently, a total of 808 metadata entries for analyzed data obtained from 35 biological species are published.
Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/.

Figure 2: Hierarchy of the metadata classes of the TogoMD format. The metadata of a metabolome analysis is divided into classes: a class for the purpose of the study together with its set of samples (SE), a class for samples (S), a class for analytical methods (M), and a class for data analysis (D). These classes constitute a tree structure. Under the top-level class, we also define classes for commonly used (shared) procedures: sample preparation details (SS), analytical method details (MS), data analysis details (DS), and annotation method details (AM). Instances of these shared classes are referenced from instances of the other classes (dashed arrows). The data class for peak information (P) is not used in Metabolonote. The abbreviations in parentheses are the prefixes of the instance IDs.
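
The class layout described in this caption can be pictured as a small data model. The Python sketch below is illustrative only and is not the TogoMD implementation: the prefixes (SE, S, M, D, SS, MS, DS, AM) come from the caption, whereas the class name `MetadataNode`, its field names, the example descriptions, and the choice of which tree class points to which shared class are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Prefixes of the tree-structured classes, top-down (from the Figure 2 caption).
TREE_ORDER = ["SE", "S", "M", "D"]
# Prefixes of the shared-procedure classes (from the Figure 2 caption).
SHARED_PREFIXES = {"SS", "MS", "DS", "AM"}


@dataclass
class MetadataNode:
    """One instance of a tree-level class. Field names are hypothetical."""
    prefix: str
    description: str
    shared_refs: List[str] = field(default_factory=list)  # e.g. "MS1" (assumed syntax)
    children: List["MetadataNode"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.prefix not in TREE_ORDER:
            raise ValueError(f"unknown tree-level prefix: {self.prefix}")
        for ref in self.shared_refs:
            head = ref.rstrip("0123456789")  # strip the trailing number
            if head == ref or head not in SHARED_PREFIXES:
                raise ValueError(f"not a shared-procedure reference: {ref}")

    def add_child(self, child: "MetadataNode") -> "MetadataNode":
        """Attach a child, allowing only the next level down (SE > S > M > D)."""
        next_level = TREE_ORDER.index(self.prefix) + 1
        if next_level >= len(TREE_ORDER) or child.prefix != TREE_ORDER[next_level]:
            raise ValueError(f"{child.prefix} cannot be nested under {self.prefix}")
        self.children.append(child)
        return child


def walk(node: MetadataNode) -> Iterator[MetadataNode]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


# Example tree: one study, one sample, one analysis, one data-processing step.
study = MetadataNode("SE", "stress response study")
sample = study.add_child(MetadataNode("S", "treated leaf", shared_refs=["SS1"]))
method = sample.add_child(MetadataNode("M", "LC-MS run", shared_refs=["MS1"]))
method.add_child(MetadataNode("D", "peak annotation", shared_refs=["DS1", "AM1"]))
print([n.prefix for n in walk(study)])
```

Keeping the shared procedures as references rather than nested objects mirrors the dashed arrows in the figure: a procedure can be described in one place and pointed to from many samples, methods, or analyses.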

Mentions:
To manage the metadata in Metabolonote, we defined a novel metadata and data format called TogoMD. The format was designed with the following considerations. Metabolomic metadata can be separated into hierarchical classes: a class for information about the purpose of the study, a class for sample preparation information, a class for analytical method information, and a class for data analysis information (Figure 2). In general, multiple samples are used per study, for example, biological replicates of treatment and control groups. A sample can be analyzed in several ways and multiple times, for example, by liquid chromatography–MS, gas chromatography–MS, and NMR, each with analytical replicates. The raw data generated by each analysis can in turn be processed in several ways. For instance, the computational tools and procedures differ depending on whether the data are used as fingerprints, for metabolite annotation to produce metabolic profiling data, or for obtaining tandem mass spectra. Each set of data generated through this process should be linked to the class at the corresponding level of the hierarchy. For instance, the raw data generated by the analytical apparatus should be linked to the analytical method class, and the processed data should be linked to the data analysis class. To share the metadata with outside systems, a method for uniquely accessing the metadata at each level of the hierarchy, possibly through unique identifiers (IDs), is required.
We first evaluated the utility of ISA-Tab, which is used in MetaboLights (Sansone et al., 2008; Rocca-Serra et al., 2010). In ISA-Tab, the classes of the hierarchy described above are connected through sample names and the names assigned to protocols (sets of common procedures). Consequently, the nomenclature of these names must be controlled to establish a method for uniquely and computationally accessing the metadata at each level of the hierarchy, and modifications of the predefined format, such as the addition of an extra column for IDs, may be required to control it well. We therefore defined TogoMD, which includes a rule for ID assignment (details are given in Systematic ID Design for the Metadata). As with ISA-Tab protocols, frequently used procedures can be written as shared reference information. Because each metadata entry has description and comment fields for free text by the submitter, almost the same metadata as in ISA-Tab can be described in accordance with the recommendations of MSI (Fiehn et al., 2007), and in the future the descriptions should also follow those of COSMOS (Steinbeck et al., 2012) Working Package 2. The relationships between the description fields of ISA-Tab and TogoMD used in Metabolonote are shown in Table S1 in Supplementary Material. Most of the fields in TogoMD, including author description, are simpler than those of ISA-Tab. TogoMD also defines formats for the data files. The details of the formats are described in the online help at the Metabolonote website.
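
To illustrate why an ID rule gives unique, computational access to every level of the hierarchy, the following Python sketch parses a hierarchical identifier and derives the ID of each ancestor level. The concrete syntax used here (prefix plus number, joined by underscores, as in `SE1_S01_M01_D01`) is an assumption made for the example; the actual rule is the one given in Systematic ID Design for the Metadata. The function names `parse_id`, `ancestors`, and `level` are hypothetical.

```python
import re
from typing import List, Tuple

# Tree-level prefixes in top-down order (from Figure 2).
LEVELS = ("SE", "S", "M", "D")
# One ID segment: uppercase prefix followed by a number (assumed syntax).
_SEGMENT = re.compile(r"^([A-Z]+)(\d+)$")


def parse_id(identifier: str) -> List[Tuple[str, int]]:
    """Split an ID into (prefix, number) pairs and check the level order."""
    tokens = identifier.split("_")
    if not 1 <= len(tokens) <= len(LEVELS):
        raise ValueError(f"unexpected number of segments in {identifier!r}")
    parsed = []
    for expected, token in zip(LEVELS, tokens):
        match = _SEGMENT.match(token)
        if match is None or match.group(1) != expected:
            raise ValueError(f"segment {token!r} is not a valid {expected} segment")
        parsed.append((expected, int(match.group(2))))
    return parsed


def ancestors(identifier: str) -> List[str]:
    """Return the IDs of every level from the study down to the given ID."""
    parse_id(identifier)  # validate before slicing
    tokens = identifier.split("_")
    return ["_".join(tokens[:i]) for i in range(1, len(tokens) + 1)]


def level(identifier: str) -> str:
    """Return the class prefix of the deepest level the ID addresses."""
    return parse_id(identifier)[-1][0]


print(ancestors("SE1_S01_M01_D01"))
print(level("SE1_S01"))
```

Because each ID embeds the path from the top-level class, an outside system can address a study, a sample, an analytical method, or a data analysis without relying on free-text sample or protocol names.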
