Big Data Boost for Brain Research

A neuroscientist and a computational scientist walk into a synchrotron facility to study a mouse brain… Sounds like a great set-up for a comedy bit, but there is no punchline. The result is cutting-edge science that can only be accomplished in a facility as scientifically integrated as the U.S. Department of Energy’s (DOE) Argonne National Laboratory.

At a casual, or even a more attentive, glance, Doga Gursoy and Bobby Kasthuri would seem to be at opposite ends of the research spectrum. Gursoy is an assistant computational scientist at Argonne’s Advanced Photon Source (APS), a DOE Office of Science User Facility; Kasthuri is an Argonne neuroscientist.

But together, they are using Argonne’s vast arsenal of innovative technologies to map the intricacies of brain function at the deepest levels, and describing them in greater detail than ever before through advanced data analysis techniques.

Gursoy and Kasthuri are among the first group of researchers to access Theta, the new 9.65 petaflops Intel-Cray supercomputer housed at the Argonne Leadership Computing Facility (ALCF), also a DOE Office of Science User Facility. Theta’s advanced and flexible software platform supports the ALCF Data Science Program (ADSP), a new initiative targeted at big data problems, like Gursoy and Kasthuri’s brain connectome project.

ADSP projects explore and improve a variety of computational methods that will enable data-driven discoveries across all scientific disciplines.

“By developing and demonstrating rapid analysis techniques, such as data mining, graph analytics and machine learning, together with workflows that will facilitate productive usage on our systems for applications, we will pave the way for more and more science communities to use supercomputers for their big data challenges in the future,” said Venkat Vishwanath, ALCF Data Sciences Group Lead.

All about the connections

This new ADSP study of connectomes maps the connections of every neuron in the brain, whether human or mouse. Determining the location of every cell in the brain and how they communicate with each other is a daunting task, as each cell makes thousands of connections. The human brain, for example, has some 100 billion neurons, creating 100 trillion connections. Even the average mouse brain has 75 million neurons.

To date, neuroscientific discovery has been constrained by current recording and imaging methods, which can sample a limited number of neurons or limited brain volumes. However, through the use of different imaging techniques combined with reconstruction, segmentation, and analysis tools designed for use on Theta, researchers will be able to image and analyze, for the first time, a series of full mammalian brains at the level of every cell and blood vessel.

“This ALCF award targets big data problems and our application of brain imaging does just that,” said Gursoy, assistant computational scientist in the X-Ray Science Division of Argonne’s Advanced Photon Source. “The basic goal is simple—we would like to be able to image all of the neurons in the brain—but the datasets from X-rays and electron microscopes are extremely large. They are at the tera- and petabyte scales. So we would like to use Theta to build the software and codebase infrastructure in order to analyze that data.”

This research was supported by the U.S. Department of Energy’s Office of Science. A portion of the work was also supported by Argonne’s Laboratory-Directed Research and Development (LDRD) program.

The process begins with two imaging techniques that will provide the massive sets of data for analysis by Theta. One is at the APS, where full brains (in this case, the brain of a petite shrewmouse) can be analyzed at submicron resolution through X-ray microtomography, a high-resolution 3-D imaging technique. The X-ray Science Division of the APS provides the expertise for the microtomography research. Much like a CT scanner, microtomography produces images as micro-thin slices of a material whose structure can be meticulously scrutinized. While this resolution provides a detailed picture of blood vessels and cell bodies, the researchers aim to go still deeper.

That depth of detail requires the use of an electron microscope, which transmits a short-wavelength electron beam to deliver resolution at the nanometer scale. This will allow for the capture of all the synaptic connections between individual neurons at small targeted regions guided by the X-ray microtomography.

“For years, scientists at the APS have used these techniques to deepen our understanding of a wide variety of materials, from soil samples to new materials to biological matter,” said Kamel Fezzaa from sector 32-ID at the APS. “By coordinating our efforts with Argonne high-speed computing capabilities for this project, we are able to provide some truly revolutionary images that could provide details about brain functions that we have never before been able to observe.”

Both techniques can produce petabytes of information a day and, according to the researchers, the next generations of both microscopes will increase that amount dramatically.

Images produced by these datasets have to be processed, reconstructed and analyzed. Through the ADSP, Gursoy and Kasthuri are developing a series of large-scale data and computational steps—a pipeline—that integrates exascale computational approaches into an entirely new set of tools for brain research.

Taming of the shrew

The first case study for this pipeline is the reconstruction of an entire adult shrewmouse brain, which, they estimate, will produce one exabyte of data, or one billion gigabytes. And the studies only get bigger from there.
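To put the exabyte figure in perspective, the unit arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the acquisition rate of one petabyte per day is an assumed round number chosen to match the article's "petabytes of information a day", not a measured rate.

```python
# Decimal (SI) byte units, matching the article's "one billion gigabytes".
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(value, src, dst):
    """Convert a data size between decimal byte units."""
    return value * UNITS[src] / UNITS[dst]


def days_to_acquire(total_eb, rate_pb_per_day):
    """Days needed to collect total_eb exabytes at an ASSUMED sustained rate."""
    return convert(total_eb, "EB", "PB") / rate_pb_per_day


exabyte_in_gb = convert(1, "EB", "GB")
exabyte_in_pb = convert(1, "EB", "PB")
days = days_to_acquire(total_eb=1, rate_pb_per_day=1)  # assumed rate

print(f"1 EB = {exabyte_in_gb:,.0f} GB = {exabyte_in_pb:,.0f} PB")
print(f"At an assumed 1 PB/day, 1 EB takes about {days:,.0f} days to collect")
```

Under that assumed rate, a single exabyte-scale brain would take years of continuous acquisition, which is one way to see why faster instruments and automated analysis both matter.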

The system architecture and software environment of Theta are accelerating the trend in high-performance computing towards a convergence of data-driven science and traditional simulations, preparing the way for exascale computing.

Each of Theta’s 3,624 nodes is equipped with a 64-core KNL processor with 16 gigabytes of high-bandwidth in-package memory, along with 192 gigabytes of DDR4 RAM and 128 gigabytes of solid-state drive (SSD) storage. In addition to this larger memory footprint, Theta also supports the software stack and programming environment needed to facilitate data science applications.
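The per-node figures above can be multiplied out to give a rough sense of the machine's total resources. The sketch below is plain arithmetic on the numbers quoted in the article; the variable and function names are hypothetical, and the totals ignore any system overhead, so they should not be read as usable capacity.

```python
# Per-node figures as quoted in the article.
NODES = 3_624
CORES_PER_NODE = 64
PER_NODE_GB = {
    "high-bandwidth memory": 16,
    "DDR4 RAM": 192,
    "SSD storage": 128,
}


def aggregate(nodes, per_node_gb):
    """Multiply each per-node capacity by the node count (no overhead modeled)."""
    return {name: gb * nodes for name, gb in per_node_gb.items()}


totals_gb = aggregate(NODES, PER_NODE_GB)
total_cores = NODES * CORES_PER_NODE

print(f"Total cores: {total_cores:,}")
for name, gb in totals_gb.items():
    print(f"Total {name}: {gb:,} GB (about {gb / 1000:,.0f} TB)")
```

Even the largest of these totals is measured in hundreds of terabytes, far short of the one exabyte estimated for a single shrewmouse brain, so the data would have to be processed in pieces rather than held on the nodes at once.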

“What really differentiates the way research is conducted on Theta is, you’re trying to do new science from data that is not being generated just by simulation. It’s obtained by other means, like imaging or through data mining of journals, which are then used to seed new simulations,” explains Vishwanath.

According to Gursoy, Theta’s multi-core Intel-Cray architecture is both energy efficient and relatively easy to program with rapidly evolving computer languages, such as Python. But before any actual science can begin, the team continues to collaborate with Vishwanath and other ALCF members to explore Theta’s capabilities, increase speed, and ensure that their codes work efficiently.

“Then we should be able to configure Theta based on our needs,” he says.

To run their research, they are using Argonne-developed packages such as TomoPy, which reconstructs 3-D volumes and is based on Python, with little bits of C/C++ for time-critical parts, they note. An open-source language, Python also allows for the use of well-established scientific computing packages to which new and diverse modules can be applied.

In addition, scalable workflows focused on analysis and visualization of experimental data will implement commonly used packages from within the neuroscience community, such as the RhoAna framework. A machine learning-based technique, it develops models based on observed data and uses them later to accelerate the inference and analysis of newly acquired datasets.

“Machine learning will go through these datasets and help come up with predictive models. For this project, it can help with segmentation or reconstruction of the brain and help classify or identify features of interest,” said Vishwanath.

Lessons learned from the smaller shrewmouse brain will be applied to a large mouse brain, which constitutes a 10-fold increase in volume. Comparisons between the two will reveal how organizational structures form during development, from embryo to adult, and how they evolve. The reconstruction of a non-human primate brain, with a volume 100 times larger than a mouse brain, is being considered for a later study.
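A back-of-the-envelope projection shows how quickly the data would grow across these planned studies. The sketch below takes the one-exabyte estimate for the shrewmouse brain and the volume ratios given in the article, and assumes (this is an assumption, not a claim from the researchers) that data size scales linearly with brain volume at fixed imaging resolution. The function name is hypothetical.

```python
SHREWMOUSE_EB = 1.0            # estimate quoted for the shrewmouse brain
MOUSE_VS_SHREWMOUSE = 10       # "10-fold increase in volume"
PRIMATE_VS_MOUSE = 100         # "volume 100 times larger than a mouse brain"


def scaled_dataset_eb(base_eb, volume_factor):
    """Project dataset size, ASSUMING it grows linearly with imaged volume."""
    return base_eb * volume_factor


mouse_eb = scaled_dataset_eb(SHREWMOUSE_EB, MOUSE_VS_SHREWMOUSE)
primate_eb = scaled_dataset_eb(mouse_eb, PRIMATE_VS_MOUSE)

print(f"Shrewmouse brain: ~{SHREWMOUSE_EB:,.0f} EB")
print(f"Mouse brain:      ~{mouse_eb:,.0f} EB")
print(f"Primate brain:    ~{primate_eb:,.0f} EB")
```

Under that linear-scaling assumption, the primate study would land at roughly a thousand exabytes. Actual figures would depend on resolution, compression, and how much of each brain is imaged with the electron microscope.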

A neuroscientist and a computational scientist leave a synchrotron facility with studies from a mouse brain . . . armed with new techniques to analyze this data. The images produced by their work will provide a clearer understanding of how even the smallest changes to the brain play a role in the onset and evolution of neurological diseases, such as Alzheimer’s and autism, and perhaps lead to improved treatments or even a cure.