Scope of this article

Figure 1: Venn diagram distribution of mined resources across four sources (Frontiers in red, Neuroinformatics in green, NIF in purple, Scholarpedia in orange). The numbers outside the ovals indicate absolute resource counts in each source. The colored numbers inside the ovals denote the numbers of unique resources within each repository. The black numbers show 2-, 3-, and 4-way overlaps among sources.

We surveyed the advances of the last five years to assess how the field matured to its current state of the art and to evaluate specific opportunities and growing trends. Careful characterization of a representative sample of 337 neuroinformatics resources along several dimensions enabled the distillation of summary analytics. Metadata annotation included, among others, animal species, resource type (software, atlas, database, etc.), scale (from molecules to whole brain), application (e.g., visualization, analysis, simulation), and measurement type (anatomical, functional, biochemical, and more). One of the most active areas of research and software development revealed by this overview concerns the visualization and analysis of human neuroanatomical data at the whole brain scale. The complete curated list of neuroinformatics resources available online that accompanies this article ( http://hdl.handle.net/1920/9150 ) constitutes a valuable tool in its own right for free browsing, search, and exploration. The curated data sample is also available for download in .csv format.

Resource identification

We identified a representative sample of 337 tools by querying four independent neuroinformatics sources (Figure 1): (1) the Neuroscience Information Framework (NIF) and its community-curated lexicon NeuroLex, (2) the Springer Neuroinformatics journal, (3) the Frontiers in Neuroscience journals, and (4) Scholarpedia’s expert-curated collection of encyclopedia entries. The approach was customized separately for each source by trial and error. Traditional journal publications were interrogated by full-text searches for the combined occurrence of the keywords "software," "tools," "resources," and "http." This query produced 404 and 401 hits from Neuroinformatics and Frontiers, respectively, over the five-year inclusive span 2010-2014. For Scholarpedia, a query using only a subset of the keywords ("software" and "tools") returned 36 articles. After adding the entire list of 262 resources federated with NIF to this initial pool, we removed out-of-scope entries (news items, blog posts, podcasts, people, grants, and non-neuroscience resources) as well as broken or clearly obsolete pointers. Moreover, when both parent and child links were present, we only included the most relevant ones. Lastly, we removed duplicate entries that appeared in more than one source, resulting in a final resource count of 337 (Figure 1).
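The query-and-filter procedure described above can be illustrated with a short Python sketch. It is not the code used for this survey: the case-insensitive substring matching, the URL normalization rule, and the function names are all assumptions chosen only to make the steps concrete.

```python
import re

# Keywords required by the journal query described in the text; the
# Scholarpedia query used only the first two.
JOURNAL_KEYWORDS = ("software", "tools", "resources", "http")
SCHOLARPEDIA_KEYWORDS = ("software", "tools")


def matches_all(text, keywords):
    """Return True when every keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(keyword in lowered for keyword in keywords)


def normalize_url(url):
    """Hypothetical normalization so trivially different links compare equal."""
    url = url.strip().lower()
    url = re.sub(r"^https?://(www\.)?", "", url)
    return url.rstrip("/")


def deduplicate(candidates):
    """Keep the first occurrence of each (source, url) pair, keyed by normalized URL."""
    seen = set()
    unique = []
    for source, url in candidates:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append((source, url))
    return unique
```

In this sketch a resource found in two sources is kept only once, which mirrors the final overlap-removal step. Out-of-scope and obsolete entries would still need manual review, as they did in the curation described here.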

Figure 2: Metadata used for curating the sample of identified neuroinformatics resources. The multidimensional characteristics of the resources are categorized using eight 'features' (top layer in yellow), 38 'elements' (middle layer in blue), and hundreds of 'examples' (bottom layer in green).

Resource categorization

We annotated each of the 337 resources according to its characteristics across a set of classification dimensions or 'features' (Figure 2): Scale, Species, Resource type, Measurement, Application, Area of study, User support, and Resource availability (the first six of these dimensions are included in Table 1 below). For every feature we chose a list of 'elements' suitable to describe the resource, clarifying their intended meaning with a series of self-explanatory keyword examples (Figure 2). For instance, the elements (and related representative examples) of Scale are: whole nervous system (non-invasive brain imaging with magnetic resonance), regional (cortical areas, specific fiber pathways), cellular (axonal trees, patch clamp), and macromolecular (gene sequences, protein expression). We informally defined the eight features and 38 elements with simple textual descriptors, which are hyperlinked to the corresponding terms in Table 1 and in the complete online table. This organizational layout, however, is emphatically not meant as a formal schema (let alone an ontology), but rather as a simple and practical attempt to group resources in a manner amenable to human comprehension. As such, we acknowledge that our choices are arbitrary, and several alternatives would be equally reasonable. Moreover, features are neither orthogonal to nor independent of each other. When more than one element adequately described a resource, we always designated one as primary (marked with * in Table 1 and in the complete online table) to facilitate subsequent analytics.
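One way to picture the feature/element/primary layout is as a small record type. The Python sketch below is only an illustration of that layout, not the format of the online table. The class names, field names, and the "Example Atlas" entry are hypothetical; only the Scale elements are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Elements of the Scale feature as listed in the text; the other features
# would be filled in the same way from Figure 2.
SCALE_ELEMENTS = ("whole nervous system", "regional", "cellular", "macromolecular")


@dataclass
class FeatureAnnotation:
    """Elements selected for one feature of one resource, with an optional primary."""
    elements: Tuple[str, ...]
    primary: Optional[str] = None

    def __post_init__(self):
        # The primary element, when given, must be among the selected elements.
        if self.primary is not None and self.primary not in self.elements:
            raise ValueError("primary element must be one of the selected elements")


@dataclass
class Resource:
    """A curated resource; annotations map a feature name to its annotation."""
    name: str
    annotations: Dict[str, FeatureAnnotation] = field(default_factory=dict)


# Hypothetical entry, for illustration only.
example = Resource("Example Atlas")
example.annotations["Scale"] = FeatureAnnotation(("regional", "cellular"), primary="regional")
```

Keeping the primary element alongside the full selection lets later tallies distinguish a resource's main focus from its secondary descriptors.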

Resource examples

We selected a small subset (<10%) of the identified resources to illustrate the collated information with a few examples (Table 1). Each of the 31 resources listed below and all entries in the comprehensive collection at http://hdl.handle.net/1920/9150 are hyperlinked to the respective home page. In addition to the six features reported in Table 1, the complete online table includes for each of the 337 resources a summary description, literature references or unique resource identifier (PMID, PMCID, NIF ID), public availability (freeware, open source, etc.), support provided to users (e.g., manual, mailing lists, frequently asked questions), funding agencies, and institutional affiliations. The PubMed or PubMed Central identifiers refer to the recent article(s) citing the resource that were returned by the literature search described above under "Resource identification," and not necessarily to the original publications of the resource. The user support feature also includes a 'citation' element indicating that the tool has been used by at least one external party in peer-reviewed publications beyond its intended in-house purposes (a sign of resource utility and maturity).

Table 1. Subset of resources depicting the main information elements available in the accompanying master list

A major neuroinformatics initiative pertaining to whole-brain non-invasive human imaging (Table 1) is the Human Connectome Project. Its high-resolution scanning of over 1200 healthy adults using cutting-edge methods such as diffusion and resting-state magnetic resonance aimed at comprehensive mapping of neural circuitry, its relationship to behavior, and the contributions of genetic and environmental factors to individual differences. This "big data" project, including the development and free distribution of analysis and visualization tools, was supported by funding from the National Institutes of Health Neuroscience Blueprint to a selected consortium of major universities. A complementary "grassroots" effort of similar scope, included in the complete online table, is the 1000 Functional Connectomes Project, entailing the aggregation and full unrestricted public release (via www.nitrc.org) of over 1200 resting state fMRI datasets collected from 33 sites.

Exploiting the progressive acceleration in computing power, spiking neural network simulations are continuously improving in terms of both scale and biological realism. A notable tool in this regard is CARLsim (Table 1), an efficient open source simulator written in C/C++ (Beyeler et al., 2014) that allows execution on both generic central processing units (CPUs) and standard off-the-shelf graphics processing units (GPUs). In particular, CARLsim implements the formalism of Izhikevich models, allowing faithful generation of complex neuronal dynamics with a simple system of equations suitable for fast numerical integration (Izhikevich, 2006). The complete online table includes several other resources relevant for network simulations, including NEST (Gewaltig & Diesmann, 2007), Brian (Goodman, 2010), and the Open Source Brain, a resource for sharing and collaboratively developing computational models that encourages the use of open standards to ensure transparency, modularity, accessibility, and cross-simulator portability.
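To give a sense of how compact the Izhikevich formalism is, the following Python sketch integrates a single neuron with the forward Euler method. It is a plain illustration of the standard two-variable model with its spike-reset rule and does not use or reproduce the CARLsim interface. The function name, step size, and default parameters (values commonly quoted for a regular spiking cell) are assumptions of this sketch.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler integration of one Izhikevich neuron.

    Equations: dv/dt = 0.04*v^2 + 5*v + 140 - u + I,  du/dt = a*(b*v - u),
    with the reset v <- c, u <- u + d whenever v reaches 30 mV.
    Returns the list of spike times in milliseconds.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spikes = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:              # spike detected: record time and reset
            spikes.append(step * dt)
            v = c
            u += d
    return spikes


# A constant input of 10 (in the model's dimensionless current units)
# drives repetitive firing; with no input the neuron stays at rest.
spike_times = simulate_izhikevich(current=10.0)
```

Each time step costs only a handful of multiplications and additions per neuron, which is what makes the model attractive for large-scale and GPU-based simulation.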

One of the most prominent neuroinformatics resources at the molecular and systems level is the Allen Brain Atlas (Lein et al., 2007), an annotated brain-wide, genome-wide gene expression map of the adult mouse obtained from in situ hybridization (Table 1). Each expression pattern is registered with a high-resolution 3D anatomical delineation of more than 600 nervous system areas from Nissl stain. The same portal also provides free access to similar data for the developing mouse and human brains (including microarray experiments), together with regional connectivity maps from tract-tracing and a powerful informatics platform for query and analysis. A related resource in the complete online table is GENSAT (Heintz, 2004), a publicly available gene expression atlas of the developing and adult central nervous system in the mouse, using both in situ hybridization and transgenic techniques.

At the level of dendritic and axonal arbors, the widely used repository of digital reconstructions of neuronal morphology NeuroMorpho.Org (Table 1) is regarded as a success story in neuroscience data sharing (Parekh & Ascoli, 2014). This archive contains more than 30,000 neurons from two dozen species and hundreds of brain regions and cell types, contributed by over one hundred laboratories worldwide using a broad range of histological, visualization, imaging, and tracing techniques. A complementary resource included in the complete online table is ModelDB, a database for storing and efficiently retrieving computational neuroscience models, mostly in the popular NEURON simulation environment (Carnevale, 2007) using complex morphologies of single neurons such as those available at NeuroMorpho.Org.

Emerging trends

Figure 3: Distribution of resources across all elements within each of the main features. Both primary elements and total counts are reported (stacked bars), as well as weighted counts (pie charts).

To gauge the prominence of distinct elements within each feature, we quantified the distribution of available resources across the main annotation dimensions. For every feature of Table 1 (scale, species, resource type, measurement, application, and area of study), we counted all resources in the complete online table that listed a given element (the "general" count) as well as the subset of those that specifically indicated the element as primary (the "primary" count). When all elements of a feature were marked, we included the resource in the "general" count for each of the elements but in the "primary" count for none of them. Moreover, we computed a weighted count by dividing each resource's contribution equally among its listed elements, so that the contributions sum to one per resource. For example, if a resource listed primates, rodents, and insects as species, each of those elements would be counted as one-third. This normalization allows a fair assessment of element proportion in every feature across resources (Figure 3).
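The three tallies can be expressed in a few lines of Python. This is a sketch of the counting rules as stated in the text, not the script used to produce Figure 3. The input format (a list of selected elements plus an optional primary element per resource) and the function name are assumptions.

```python
from collections import defaultdict


def count_elements(resources, all_elements):
    """Tally general, primary, and weighted counts for one feature.

    `resources` is a list of (selected_elements, primary_element) pairs, where
    primary_element is None when no primary was assigned.
    """
    general = defaultdict(int)
    primary = defaultdict(int)
    weighted = defaultdict(float)
    for selected, primary_element in resources:
        for element in selected:
            general[element] += 1
            # Each resource contributes a total weight of one, split equally.
            weighted[element] += 1.0 / len(selected)
        # A resource marked with every element contributes no primary count.
        if primary_element is not None and set(selected) != set(all_elements):
            primary[primary_element] += 1
    return general, primary, weighted


# Hypothetical annotations for the Species feature, for illustration only.
species = ("primates", "rodents", "other mammals", "insects", "others")
sample = [
    (("primates", "rodents", "insects"), "primates"),  # each counted one-third
    (("primates",), "primates"),
    (species, None),                                   # all elements marked
]
general, primary, weighted = count_elements(sample, species)
```

Because every resource contributes exactly one unit of weight, the weighted counts for a feature sum to the number of annotated resources, which is what makes the pie-chart proportions comparable across features.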

With respect to scale, whole brain is the most represented element in the general and primary counts (Figure 3a, bar charts) as well as in the weighted proportion (Figure 3a, pie charts), followed by regional, cellular, and molecular, in order from macroscopic to microscopic.

Primates (largely humans but also including monkeys) are clearly the dominant species, reflecting the prominence of whole brain scale (Figure 3b). More than twice as many resources have primates as the primary species (154) as all other elements in this feature together (66), including rodents (mostly rats and mice), other mammals, insects (primarily flies), and other species (typically worms, fish, and birds). For humans the numbers are driven by non-invasive imaging in the areas of cognition and behavioral neuroscience, neuroanatomy, and clinical and developmental neuroscience. Resources dedicated to the phenotype-genotype relationship are particularly abundant for mice (e.g., Monarch Initiative and Mouse phenotype database) and fruit flies (flybase.org, flycircuit.tw, Bloomington Drosophila stock center). The representation of other species is fairly sparse and encompasses resources focused on comparative neuroanatomy (ABCD, Braininfo), antibody databases (NeuroMab), and live imaging (ZFIN).

By far the most numerous resource type is software tools (Figure 3c), as exemplified by the popular open source program ImageJ. Next in representation are ontology and data management services (for example BioPortal) and databases (Alzheimer’s Disease Neuroimaging Initiative or ADNI). Ontologies and data management resources focus on data formats, standardization, terminologies, and machine readability, and thus often overlap with databases. In fact, several resources were annotated with all three of these most common type elements, as exemplified by Chemical Entities of Biological Interest (ChEBI), flybase.org, and Mouse Genome Informatics. The remaining resource types are much more sparsely represented. Topical portals are typically the least structured types, consisting of collaborative initiatives, open challenges, and other thematic resources. Instrumentation emphasizes hardware machinery for data acquisition, analysis, or lab operations. Lastly, atlases consist of standardized reference templates (e.g., Talairach coordinate space) with registered 2D/3D spatial information, such as gene expression, tract-tracing, or magnetic resonance imaging.

In terms of measurement, the most abundant resources pertain to the anatomical dimension (Figure 3d), followed by functional, physiological, and biochemical. The first two elements are again largely linked to human non-invasive whole brain imaging. Anatomical resources mostly refer to structural and diffusion magnetic resonance, while functional resources, as the name suggests, mostly refer to fMRI. Resources categorized as anatomical, however, also include a variety of other tools focusing on shape, size, location, and connectivity information (e.g., Automated reconstruction of complex curvilinear structures, Knife-Edge Scanning Microscopy Brain Atlas, SumDB, and NeuGen) across all scales from microscopic to macroscopic. Similarly, 'functional' resources (e.g., DICOM, resting-state fMRI, Protégé ontology editor) also include tools with an emphasis on mapping neural dynamics to brain states, using both computational approaches (neural networks, information theoretical measures, non-linear time series analysis) and experimental ones (behavioral measures of attention, speech recognition, response monitoring, and more). The last two measurement elements together account for less than a quarter of resources: physiological, quantifying spatiotemporal data with electrical or optical recordings (or related model simulations), and biochemical, most commonly high-throughput analyses, such as gene expression profiling using microarray technologies (e.g., BrainSpan, ClinVar).

Relative to the application domain, visualization and data analysis are nearly ubiquitous in scientific studies and claim the lion’s share of resources (Figure 3e). Visualization mainly refers to rendering of experimental data from, e.g., microscopy, neuroimaging, physiology, and virtual reality. Data analysis plays an increasingly important role in large-scale automated analyses. These two applications are closely associated in resources for result quantification with statistical plots or graphing of parameter relationships (e.g., IGOR Pro, PRoNTo, MIPAV). Tools for annotation, such as electronic lab notebooks for metadata entry, are crucial to frame the data in proper context for reuse (e.g., CogPO Wiki, clinicaltrials.gov, Multimodal MRI reproducibility resource). Computational modeling resources include tools for simulating morphological growth, membrane biophysics, electrophysiological activity, and network dynamics (e.g., Scilab, Cellular Dynamic Simulator, neuRosim). Lastly, neurotechnology includes brain-machine interfaces, neuromorphic engineering, and other similar approaches and devices (e.g., Easycap, The Kilobot Project, BCI2000). The practical utility of available resources is ultimately determined by the user's creativity. In the domain of neuronal morphology, for example, diverse applications of L-Measure, Trees toolbox, and Neuromantic to visualization, analysis, and modeling have yielded many advances in neuronal characterization, as evidenced by their numerous citations.

Concluding remarks

This article provides a broad snapshot of recent neuroinformatics progress based on a curated sample of resources identified in the relevant scientific literature of the last five years. The comprehensiveness of this collection is necessarily constrained by our search and selection approach, and thus is not meant to reflect the actual number of all available digital resources in neuroscience. Instead, the purpose of this overview is to describe a representative variety of neuroinformatics resources across multiple features, in order to appreciate the broad diversity of technologies and methodologies used in the field.

One of the main observations resulting from our analysis is the prominence of human non-invasive whole brain imaging. Major neuroinformatics developments, however, are also transforming other domains of neuroscience, as exemplified by the comprehensive genomic profiles of the Allen Brain Atlas (Ng et al., 2012; Shen et al., 2012). With appropriate resources, even individual laboratories can successfully embark on "big data" initiatives, such as the tracing of a substantial proportion of the fly nervous system at single cell resolution (Chiang et al., 2011). At the same time, the impact and potential of many smaller but valuable resources should not be discounted, as they are indispensable to sustain the variety of research approaches that the complexity of the brain demands. Such grassroots distribution underscores the importance of community standards for data formatting, annotating, reporting, and sharing. We predict that the growing availability of data, metadata, and informatics tools will progressively increase the scientific impact of computational modeling and biologically realistic large-scale simulations.

Acknowledgements

We thank Mr. Sean Mackesey from University of California, Berkeley, CA for writing the database ingestion code. We also thank Ms. Wendy Mann and Ms. Joanna Lee from George Mason University Libraries, Data Services Group and the Mason Archival Repository service for hosting the data on the web. Last but not least, we thank Dr. Diek Wheeler and Mr. David Hamilton for reviewing an earlier version of this article.