GraphBLAS: Building Blocks For High Performance Graph Analytics

Berkeley Lab Researchers Contribute to GraphBLAS and Will Leverage it for Exascale Applications

Many of us thought linear algebra and graph structures were concepts we’d never again have to deal with after high school. However, these concepts underpin a variety of transactions, from Internet searches to cryptography, artificial intelligence and even the operation of the power grid. They are also vital to many computational science and parallel computing applications.

Now after nearly five years of collaboration between researchers in academia, industry and national research laboratories—including Aydın Buluç, a scientist in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research Division (CRD)—GraphBLAS, a collection of standardized building blocks for graph algorithms in the language of linear algebra, is publicly available.

“When people talk about artificial intelligence, big data and data analytics, significant pieces of those come down to graph structures, which is a way of representing relationships between items,” says Tim Mattson, an Intel Senior Principal Engineer and a member of the GraphBLAS collaboration.

In the era of big data, Mattson notes that there is an increasing interest in finding patterns and connections in this information by building graphs and exploring their properties.

“This is a newish application area for scalable supercomputing. Graph problems are fairly straightforward to write down mathematically; however, getting them to work on petabytes of data, on a highly distributed system and in a reasonable amount of time, is actually very difficult,” he adds. “But if we can view graphs as linear algebra problems—which have been central to science and engineering applications in high performance computing for decades—then we can immediately apply everything that we’ve learned from parallel supercomputing over the last 35 years to graphs.”

This is where Buluç’s experience proved to be extremely useful. Buluç began applying linear algebra to high performance graph analysis nearly a decade ago when he was a graduate student at the University of California Santa Barbara (UCSB). For his Ph.D. thesis, he created Combinatorial BLAS, an extensible distributed-memory parallel graph library offering a small but powerful set of linear algebra primitives specifically targeting graph analytics. This library later partly inspired the bigger GraphBLAS effort. Another CRD Scientist, Ariful Azad, is a major contributor to the Combinatorial BLAS library and to graph applications that use Combinatorial BLAS for scalability.

After earning his doctorate, Buluç continued this work at Berkeley Lab as a Luis Alvarez fellow and then as a research scientist. Along the way he also began collaborating with Jeremy Kepner, a Lincoln Laboratory Fellow at the Massachusetts Institute of Technology (MIT). Mattson notes that Buluç and Kepner were driving forces in the modern resurgence to get the community to think about graphs as linear algebra. Mattson connected with both researchers through Buluç’s thesis advisor, UCSB Professor John Gilbert.

“Aydin Buluç was a leader in demonstrating highly scalable implementations of matrix-based graph algorithms; his work inspired others to try similar approaches,” says Kepner. “Tim Mattson then championed the idea that a GraphBLAS standard would allow hardware people and software people to work together and magnify our collective efforts.”

According to Mattson, the impetus to create a standard BLAS (Basic Linear Algebra Subprograms) for graph analytics came in 2012 when Intel launched a Science and Technology Center for Big Data at MIT to produce new data management systems and compute architectures for Big Data. As one of the center’s principal investigators, Mattson began building a team of researchers from academic and research institutions across the country with experience in high performance graph analysis, including Buluç, Gilbert and Kepner.

Over the next several years, the collaboration worked to define the mathematical concepts that would go into GraphBLAS. Because this software library was going to be publicly available, it couldn’t be overwhelming for users. So the team aimed to identify the smallest number of linear algebra functions that would get the job done. Once the team agreed on the mathematical concepts, a subset of the researchers spent a couple more years binding GraphBLAS to the C programming language.

“GraphBLAS is an elegantly small number of functions that are feasible to implement in hardware, as we have demonstrated in the Lincoln Laboratory Graph Processor,” says Kepner. “It allows us to explore graphs with powerful mathematical properties such as associativity, commutativity and distributivity. More recently, GraphBLAS has begun to be of interest to people beyond the graph community, including machine learning, databases, and simulations.”
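The mathematical properties Kepner mentions come from working over arbitrary semirings: the same matrix-vector product routine computes entirely different graph quantities depending on which "add" and "multiply" operators are plugged in. The following is an illustrative Python sketch of that idea, not the GraphBLAS API; the function name and example graphs are hypothetical.

```python
from functools import reduce

INF = float("inf")

def semiring_mxv(A, x, add, mul, zero):
    """y = A (+).(x) x, where (+) and (x) come from an arbitrary semiring."""
    return [reduce(add, (mul(A[i][j], x[j]) for j in range(len(x))), zero)
            for i in range(len(A))]

A = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]
x = [1, 1, 1]
# Standard (+, *) arithmetic: ordinary matrix-vector multiplication.
y = semiring_mxv(A, x, lambda a, b: a + b, lambda a, b: a * b, 0)  # [1, 2, 0]

# The (min, +) "tropical" semiring turns the same routine into a
# Bellman-Ford relaxation step for shortest paths.
W = [[0, 1, INF],
     [INF, 0, 2],
     [INF, INF, 0]]       # W[i][j] = weight of edge i -> j
d = [INF, INF, 0]         # distances to vertex 2
for _ in range(2):        # n-1 relaxation rounds
    d = semiring_mxv(W, d, min, lambda a, b: a + b, INF)
# d is now [3, 2, 0]: the path 0 -> 1 -> 2 costs 1 + 2 = 3
```

Because both operator pairs are associative and commutative, and multiplication distributes over addition, an implementation is free to reorder and parallelize the reductions, which is what makes the abstraction hardware-friendly.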

Thanks to the efforts of Texas A&M Professor Timothy Davis, GraphBLAS will soon be in many major Linux distributions and in many of the most popular mathematical programming environments in the world. Additionally, hardware manufacturers are starting to build computers specifically designed to accelerate these operations. Kepner notes that the confluence of these efforts will allow millions to enjoy the benefits of GraphBLAS.

“As science problems get bigger, more computing power will be necessary to address these challenges,” says Buluç. “These large-scale applications have many computational components that we call motifs; that’s the whole idea of co-design. Exascale applications are a patchwork of different motifs, and if we optimize all other motifs for exascale and ignore the graph and combinatorics motifs, we’ll hit a performance bottleneck. That’s why this work is so important.”

Basics of BLAS

Many people are familiar with programming languages—like Python, C, C++, Java, and thousands of others—that are used to create a variety of software and applications. These high-level languages make coding relatively easy for humans but make little sense to computer hardware, which comprehends only the low-level binary language of ones and zeros. Low-level programs give the programmer more control over how the computer hardware behaves, letting a developer tune software for optimal performance on a particular machine.

It turns out that only a small number of low-level routines are required to perform most common linear algebra computing operations. So in 1979, researchers at NASA’s Jet Propulsion Laboratory, Sandia National Laboratories and the University of Texas at Austin publicly released Basic Linear Algebra Subprograms (BLAS), a library of these low-level routines for performing linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations and matrix multiplication.
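The flavor of those routines can be sketched in a few lines. The Python below is a naive reference for the semantics of three classic BLAS operations (real BLAS implementations are hand-tuned for each processor, and the function names here echo, but are not, the official AXPY/DOT/GEMM interfaces):

```python
# Naive reference versions of three classic BLAS operations.

def axpy(alpha, x, y):
    """Level-1 BLAS AXPY semantics: alpha*x + y (returned as a new list here)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Level-1 BLAS DOT semantics: the inner product of x and y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def gemm(alpha, A, B, beta, C):
    """Level-3 BLAS GEMM semantics: alpha*(A @ B) + beta*C."""
    m, k, n = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(n)] for i in range(m)]
```

For example, `dot([1, 2, 3], [4, 5, 6])` returns 32. A vendor library exposes exactly these operations but implements them with blocking, vectorization and cache tuning, which is why applications built on the BLAS interface get hardware-specific speedups for free.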

“When BLAS entered the scene in the late 1970s and early 1980s, it was transformative,” says Mattson. “Instead of handcrafting linear algebra algorithms from scratch, I could build them off of these common building blocks. And if a vendor would optimize those common building blocks just for their hardware, I could gain the benefits of that optimization pretty much for free.”

In subsequent years, various research collaborations created a variety of BLAS libraries for different tasks. Realizing the benefits to users, vendors also worked with researchers to optimize these building blocks to run on their hardware. GraphBLAS is essentially a continuation of this BLAS heritage.

“My hope is that GraphBLAS will be just as remarkable for those doing high performance graph analytics,” adds Mattson.

In addition to Buluç, Gilbert, Kepner and Mattson, other members of the GraphBLAS steering committee include David Bader of Georgia Tech and Henning Meyerhenke of the Karlsruhe Institute of Technology.
