KNUPATH Hermosa-based Commercial Boards Expected in Q1 2017 (December 15, 2016)

Last June tech start-up KnuEdge emerged from stealth mode to begin spreading the word about its new processor and fabric technology that’s been roughly a decade in the making. It’s nice to have patient capital, a rare commodity for startups these days. The company contends its KNUPATH Hermosa processor with 256 DSP cores and its Lambda fabric will bring performance, scalability, energy, and programmability advantages over CPUs, GPUs, and FPGAs to a wide swath of machine learning applications. The first commercial boards – code-named Mavericks – are expected around March 2017.

Founded in the 2005 timeframe by Daniel Goldin, the longtime NASA administrator, KnuEdge has raised roughly $100M, no doubt stemming from investor confidence in Goldin’s extensive technology creation and delivery history. Goldin and company believe their investors’ patience is about to start paying off. KnuEdge has two business units: KNUPATH, focused on hardware accelerators based on Hermosa and Lambda technology, and KnuVerse, focused on voice and face recognition systems. The latter, said Steve Cumings, CMO, KnuEdge, has customers in the government sector. Company revenues are somewhat north of $20 million so far.

Broadly, KnuEdge’s view is that a highly scalable processor in a single socket is handicapped in addressing growing machine learning and large-scale computing challenges. In contrast, the company’s Lambda Fabric enables a large number of “KNUPATH Hermosa processors to be interconnected in low latency, high throughput mesh for massively parallel processing which is well suited for application needs that will drive the compute engines of the future.”

This isn’t exactly a new idea. The Hermosa chip and Lambda technology will enter the market amid a gush of machine learning technologies all striving to advance data-driven science and enterprise data analytics. Indeed the emergence of heterogeneous computing architectures relying on a variety of accelerator engines is a key feature of today’s computing landscape. Given Goldin’s remarkable achievements at NASA it should be interesting to watch KnuEdge’s progress.

Early developer boards with two Hermosa chips have been available for some time. Volume sales of individual chips are planned to begin in January followed by the Mavericks offering, a PCIe board with four Hermosa chips, towards the end of the quarter.

Presented as a “neural computing” approach, the KNUPATH architecture actually attempts to mimic nervous system communication more than brain-inspired spiky neuron ‘inference logic’ (discussed further below).

Patrick Patla, senior vice president and general manager of KNUPATH and a former AMD executive, said, “What’s unique about Hermosa’s 256 DSP cores is that they are hooked together at a central part of the processor with a router that has 16 ports. Using the Lambda fabric, it’s possible, at least theoretically, to scale to 500,000 Hermosa processors.

“We are a data flow machine. So you push data through the system and can have the calculation and different algorithms change on the fly. We are different than a GPU accelerator in that they use a SIMD architecture. We use multiple programs, multiple data, so on our 256 cores we could have 256 separate algorithms running. You would push data through those algorithms and then you have hits on the data at different hit rates based on the algorithms and you can tune and resend algorithms to those DSPs through packets,” explained Patla.

“Basically the packets that we send through the Lambda network are what allows the programming of the DSP, so packets deliver the program, the algorithm, and then bring the payload, and push the data through it. Not only are you getting all the data and the operating instructions with each packet, but each core also knows the next destination for that information so it’s extremely efficient.” One result is very low latency at various systems levels (see diagram below).
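
To make Patla’s packet-driven, multiple-program-multiple-data description concrete, here is a toy sketch in Python. Everything in it (the Packet and Core classes, the field names, the 256-core list) is hypothetical and invented for illustration; it is not KnuEdge’s API, packet format, or toolchain.

```python
# Toy illustration of packet-carried programs in an MPMD dataflow mesh.
# All names here are hypothetical; this is not KnuEdge's actual API or packet format.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Packet:
    program: Callable[[bytes], bool]   # the algorithm the receiving core should run
    payload: bytes                     # the data pushed through that algorithm
    next_hop: int                      # destination core for the result

class Core:
    """A 'tiny DSP' that is reprogrammed by whatever packet arrives."""
    def __init__(self):
        self.program = None

    def receive(self, pkt: Packet):
        self.program = pkt.program       # the packet delivers the program...
        hit = self.program(pkt.payload)  # ...and the payload is pushed through it
        return hit, pkt.next_hop         # the core already knows where results go next

# 256 independent cores could each hold a different algorithm (MPMD),
# unlike a SIMD device where every lane executes the same instruction.
cores = [Core() for _ in range(256)]
hit, dest = cores[0].receive(Packet(lambda d: b"red" in d, b"red ball cap", next_hop=17))
print(hit, dest)   # True 17
```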

Patla also contrasted Hermosa’s ease of use with emerging brain-inspired neuromorphic chips such as IBM’s TrueNorth, which uses “spiking neuron” architecture.

“Spiky algorithms are notoriously difficult to program. Commonly they are trained on other networks first and then moved onto the neuromorphic chip so the actual software side of that is different,” he said.

As noted earlier the Hermosa-Lambda architecture emulates neuronal connectivity more than brain processing. “If you look at the different neuron-based approaches, our inspiration really gives you lots of little engines – that’s the background of the DSP cores, what we affectionately sometimes call tDSPs or tiny DSPs,” said Patla. Reliance on familiar DSP architecture eases programming.

“Our tools sit on a C/C++ library set on top of LLVM (compiler). And everybody is familiar with OpenCL as well as OpenMPI which is very comfortable in our architecture,” said Patla. The Hermosa/Lambda architecture also supports NUMA (non-uniform memory access) and each processor has 72MB of memory directly on it. “Much of the advantage is the dataflow but also all the advantages of common programming techniques for anybody that has worked on OpenMPI. Many of the other [neuromorphic] architectures require a different set of tools.”

Hermosa Development Board

KnuEdge has had a software developer kit out for “quite some time” and it is already in the hands of many developers, according to Patla.

It all sounds great. In April KnuEdge will hold a Hermosa developers’ conference at UCSD as well as a “heterogeneous neural network conference” in partnership with UCSD for the development of next-generation algorithms that can take advantage of new architectures such as Hermosa. Patla said performance benchmarks for the chip will be forthcoming with the release of the commercial product; it seems like the developer conference would be a good place to do so, but he wouldn’t specify when beyond the first half of the year.

“Right now, as you would imagine, we are in the labs with our SDKs and final verification of those commercial systems as we are tuning and bringing all of our code to the processors. In the future we’ll show configurations of 4, 8, 12, 16, Hermosas together to show the scalability of the Lambda fabric. When Steve talked about mimicking the nervous systems it really is about our connectivity and the fact that when you add more Hermosas to the network, we continue to scale because with every socket you are adding more memory as well. Each processor has 72MB of onchip memory that is sufficient for the programming of our kinds of algorithms and the workload we are trying to tackle.”

Currently the chip is being fabbed by GLOBALFOUNDRIES on a 32nm process. “It’s a well behaved chip where these 256 cores and fabric and everything lives in a 35-watt part,” said Patla.

The KNUPATH folks believe Hermosa has the potential to address a wide variety of machine learning applications performed in heterogeneous computing environments, as well as an opportunity to replace existing approaches to those applications.

“We have a demo on the website that compares us to the most current NVIDIA card and we have a 2.5x performance. It is very interesting that a video card isn’t very good at video compression that we are good at because of the parallelism of communication we handle across the memory. So that’s one of the spaces we’ll be aiming at. And of course it will also find its way into many of the single board computer spaces because at 35 watts and the ability to do signal processing and such fine grained computing we actually expect it to replace many FPGAs in a lot of environments.”

Patla argues Hermosa/Lambda’s flexibility is a major benefit and door opener – one could divvy the chips up and have a multipurpose SOC instead of dedicating it to just one task. He used a video analysis application as an example of flexibility and reprogrammability.

“You can reprogram a core by just delivering a new packet. For example, if you were doing video analysis and were searching within videos, you could be looking for ball caps. You could have all the different algorithms looking at ball caps and you could just all of a sudden reprogram and divide the chip and have 25 percent of the chip looking for red ball caps and 25 percent looking for blue caps. You could flip to four different algorithms in nanoseconds. Then when you have high hit rates and you realize which one you are really looking for, you could say, OK, now all we care about is green ball caps, and that algorithm would propagate across all the cores and you’d be able to take your throughput up. It’s very fast, very flexible,” he said.

At SC16, the KNUPATH team was busily evangelizing. Patla said they talked to a number of cloud providers as well as national labs that expressed interest to the point that he is expecting some new workloads to emerge.

There’s still much to do. Patla ticked off desirable milestones for 2017 – getting out of the lab, showcasing a couple of commercial customers and workloads, integrating the many machine learning frameworks, making sure Hermosa-based systems get into the cloud somewhere for development and production purposes, to name but a few.

Efforts to emulate signaling produced at nerve cell synapses aren’t new. Many different approaches – CMOS circuits and ‘ionic-drift’ based memristor technology, for example – have been tried, all with various shortcomings. Last week, researchers from UMass, Loughborough University, Hewlett Packard Labs, and Brookhaven National Laboratory reported a new approach that closely mimics the Ca2+ diffusion dynamics that occur at synapses between human nerve cells.

Their work reported in Nature Materials (Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing, Sept 26, 2016, online) could lead to new applications in neuromorphic computing. “In addition to providing a synapse emulator, the diffusive memristor can also serve as a selector with a large transient nonlinearity that is critical for the operation of a large crossbar array as a neural network. The results here provide an encouraging pathway toward synaptic emulation using diffusive memristors for neuromorphic computing,” report the researchers in the paper’s conclusion.

They note, “[CMOS] circuits have been employed to mimic synaptic Ca2+ dynamics, but three-terminal devices bear limited resemblance to bio-counterparts at the mechanism level and require significant numbers and complex circuits to simulate synaptic behavior. A substantial reduction in footprint, complexity and energy consumption can be achieved by building a two-terminal circuit element, such as a memristor directly incorporating Ca2+-like dynamics.”

Other efforts based on ionic drift have been used and, “Although qualitative synaptic functionality has been demonstrated, the fast switching and non-volatility of drift memristors optimized for memory applications do not faithfully replicate the nature of plasticity,” according to the paper.

Quoted in an article on nanotechweb.org (Memristor behaves like a synapse), lead author J. Joshua Yang of UMass said, “In our memristors, we looked at how metallic atoms, like silver (Ag) or copper (Cu), diffuse through dielectric oxide materials. The way these metals diffuse through a dielectric is very similar, physically, to the way Ca2+ diffuses through channels in biological synapses.” Perhaps most important, the memristors created by the researchers exhibited relaxation after being turned on.

As described in the article, researchers made their silver-in-oxide memristors with two Pt or Au inert electrodes sandwiching a switching layer of a dielectric film with embedded Ag nanoclusters. “The device is essentially a volatile memristor where Ag atoms diffuse under the influence of electrical bias,” said Yang. “This electrical stimulation turns on the devices to their low resistance state, forming a nanoscale conduction channel of Ag (around 4 nm in diameter). When the electrical bias is removed, the device spontaneously relaxes back to its high resistance state thanks to the silver channel reshaping into spherical silver clusters.”
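
The turn-on and spontaneous-relaxation behavior Yang describes can be captured qualitatively with a one-variable phenomenological model: a normalized channel conductance that grows while a bias is applied and decays back toward zero once the bias is removed. The sketch below is an illustration under that assumption only; the time constants and the single state variable are invented and are not the authors’ device model.

```python
# Phenomenological sketch of a volatile ("diffusive") memristor: conductance g
# grows while a bias is applied and spontaneously relaxes when the bias is removed.
# Rate constants are illustrative only, not fitted to the Nature Materials device.
import numpy as np

dt, steps = 1e-4, 3000           # 0.3 s simulated in 0.1 ms steps
g = np.zeros(steps)              # normalized conductance of the Ag channel
tau_on, tau_off = 5e-3, 2e-2     # growth and spontaneous-relaxation time constants (s)

for t in range(1, steps):
    bias_on = (t * dt) < 0.1     # electrical bias applied for the first 100 ms
    if bias_on:
        dg = (1.0 - g[t - 1]) / tau_on   # Ag channel forms under bias (low resistance)
    else:
        dg = -g[t - 1] / tau_off         # channel spontaneously decays (relaxation)
    g[t] = g[t - 1] + dg * dt

print(f"g at end of bias: {g[int(0.1/dt) - 1]:.2f}, g 200 ms after bias removed: {g[-1]:.3f}")
```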

In the concluding section of their paper the researchers write: “[W]e have constructed and demonstrated a new class of memristors as synaptic emulators that function primarily on the basis of diffusion (rather than drift) dynamics…The Ag dynamics of the diffusive memristors functionally resemble the synaptic Ca2+ behavior in chemical synapses and lead to a direct and natural emulation of multiple synaptic functions for both short-term and long-term plasticity…”

Deep learning efforts today are run on standard computer hardware using convolutional neural networks. Indeed, the approach has been proven powerful by pioneers such as Google and Microsoft. In contrast, neuromorphic computing, whose spiking neuron architecture more closely mimics human brain function, has generated less enthusiasm in the deep learning community. Now, work by IBM using its TrueNorth chip as a test case may bring deep learning to neuromorphic architectures.

Writing in the Proceedings of the National Academy of Sciences (PNAS) in August (Convolutional networks for fast, energy-efficient neuromorphic computing), researchers from IBM Research report, “[We] demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, perform inference while preserving the hardware’s underlying energy-efficiency and high throughput.”

The impact could be significant as neuromorphic hardware and software technology have been rapidly advancing on several fronts. IBM researchers ran the datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per watt). They report their approach allowed networks to be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. Basically, the new approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors.

As Hsu points out in an IEEE Spectrum article on the work, “Deep-learning experts have generally viewed spiking neural networks as inefficient – at least, compared with convolutional neural networks – for the purposes of deep learning. Yann LeCun, director of AI research at Facebook and a pioneer in deep learning, previously critiqued IBM’s TrueNorth chip because it primarily supports spiking neural networks. (See IEEE Spectrum’s previous interview with LeCun on deep learning.)

“The IBM TrueNorth design may better support the goals of neuromorphic computing that focus on closely mimicking and understanding biological brains, says Zachary Chase Lipton, a deep-learning researcher in the Artificial Intelligence Group at the University of California, San Diego. By comparison, deep-learning researchers are more interested in getting practical results for AI-powered services and products.”

IBM is trying to widen that perspective. Clearly, understanding brain function better is an important element of neuromorphic computing research, but so, increasingly, is developing real-world applications. Lawrence Livermore National Laboratory has purchased a TrueNorth-based system to explore, and in Europe the Human Brain Project has opened up its two big machines, SpiNNaker at Manchester University, U.K., and BrainScaleS in Germany, to researchers to develop applications and explore neuromorphic computing.

The IBM paper authors describe the traditional deep learning challenge well: “Contemporary convolutional networks typically use high precision (32-bit) neurons and synapses to provide continuous derivatives and support small incremental changes to network state, both formally required for back-propagation-based gradient learning. In comparison, neuromorphic designs can use one-bit spikes to provide event-based computation and communication (consuming energy only when necessary) and can use low-precision synapses to co-locate memory with computation (keeping data movement local and avoiding off-chip memory bottlenecks).”

By introducing two constraints into the learning rule – binary-valued neurons with approximate derivatives and trinary-valued synapses – the researchers say it is possible to adapt backpropagation to create networks directly implementable using energy efficient neuromorphic dynamics.
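
One generic way such constraints can coexist with backpropagation is to keep full-precision “shadow” weights for the gradient updates while the forward pass uses binarized neuron outputs and trinarized synapses, with an approximate (straight-through) derivative standing in for the non-differentiable thresholds. The sketch below illustrates only that general idea; it is not IBM’s training code, and the thresholds, sizes, and learning rate are invented.

```python
# Sketch of training with constrained forward passes: binary neuron outputs,
# trinary {-1, 0, +1} synapses, and full-precision "shadow" weights updated with an
# approximate (straight-through) gradient. Illustrative only; not IBM's actual code.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (2, 4))           # full-precision shadow weights

def trinarize(w):                        # project weights onto {-1, 0, +1}
    return np.sign(w) * (np.abs(w) > 0.05)

def binarize(v):                         # binary neuron: fires if its input is >= 0
    return (v >= 0).astype(float)

x = np.array([1.0, 0.0, 1.0, 1.0])       # input pattern
target = np.array([1.0, 0.0])            # desired binary output

for step in range(100):
    y = binarize(trinarize(W) @ x)       # constrained forward pass
    err = y - target
    # straight-through estimator: treat d(binarize)/dv as 1 and update shadow weights
    W -= 0.01 * np.outer(err, x)

print(binarize(trinarize(W) @ x))        # settles on the target pattern [1. 0.]
```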

“For structure, typical convolutional networks place no constraints on filter sizes, whereas neuromorphic systems can take advantage of blockwise connectivity that limits filter sizes, thereby saving energy because weights can now be stored in local on-chip memory within dedicated neural cores. Here, we present a convolutional network structure that naturally maps to the efficient connection primitives used in contemporary neuromorphic systems. We enforce this connectivity constraint by partitioning filters into multiple groups and yet maintain network integration by interspersing layers whose filter support region is able to cover incoming features from many groups by using a small topographic size,” write the researchers, whose project was funded by DARPA as part of its Cortical Processor program aimed at brain-inspired AI that can recognize complex patterns and adapt to changing environments.

Shown below is a figure of both a conventional convolutional network and the TrueNorth approach.

Fig. 1. (A) Two layers of a convolutional network. Colors (green, purple, blue, orange) designate neurons (individual boxes) belonging to the same group (partitioning the feature dimension) at the same location (partitioning the spatial dimensions). (B) A TrueNorth chip (shown far right socketed in IBM’s NS1e board) comprises 4,096 cores, each with 256 inputs, 256 neurons, and a 256 × 256 synaptic array. Convolutional network neurons for one group at one topographic location are implemented using neurons on the same TrueNorth core (TrueNorth neuron colors correspond to convolutional network neuron colors in A), with their corresponding filter support region implemented using the core’s inputs, and filter weights implemented using the core’s synaptic array. (C) Neuron dynamics showing that the internal state variable V(t) of a TrueNorth neuron changes in response to positive and negative weighted inputs. Following input integration in each tick, a spike is emitted if V(t) is greater than or equal to the threshold θ=1. V(t) is reset to 0 before input integration in the next tick. (D) Convolutional network filter weights (numbers in black diamonds) implemented using TrueNorth, which supports weights with individually configured on/off state and strength assigned by lookup table. In our scheme, each feature is represented with pairs of neuron copies. Each pair connects to two inputs on the same target core, with the inputs assigned types 1 and 2, which via the lookup table assign strengths of +1 or −1 to synapses on the corresponding input lines. By turning on the appropriate synapses, each synapse pair can be used to represent −1, 0, or +1.
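
The neuron dynamics in panel (C) of the caption can be restated in a few lines of code. The sketch below is a simplified reading of that caption (integrate the tick’s ±1 weighted inputs, spike if V reaches θ=1, reset after a spike); the input sequence is invented for illustration.

```python
# Minimal sketch of the TrueNorth-style neuron dynamics described in the Fig. 1 caption:
# V integrates +1/-1 weighted inputs each tick, a spike is emitted when V >= theta = 1,
# and V resets to 0 before the next tick's integration. Inputs are invented.
theta, V = 1, 0
ticks = [[+1, -1, +1], [+1, +1], [-1], [+1, +1, +1]]   # weighted inputs arriving per tick

for t, inputs in enumerate(ticks):
    V += sum(inputs)                 # integrate this tick's weighted inputs
    spike = V >= theta               # emit a spike if the threshold is reached
    print(f"tick {t}: V={V}, spike={spike}")
    if spike:
        V = 0                        # reset before input integration in the next tick
```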

In the IEEE Spectrum article, IBM’s Dharmendra Modha notes TrueNorth’s general design as an advantage over more specialized deep-learning hardware designed to run only convolutional neural networks, because it will likely allow multiple types of AI networks to run on the same chip. He’s quoted saying, “Not only is TrueNorth capable of implementing these convolutional networks, which it was not originally designed for, but it also supports a variety of connectivity patterns (feedback and lateral, as well as feed forward) and can simultaneously implement a wide range of other algorithms.”

In their paper, the authors emphasize that their work demonstrates more generally that “the structural and operational differences between neuromorphic computing and deep learning are not fundamental and points to the richness of neural network constructs and the adaptability of backpropagation. This effort marks an important step toward a new generation of applications based on embedded neural networks.” It’s best to read the paper in full for details of the work.

Larry Smarr Helps NCSA Celebrate 30th Anniversary (September 20, 2016)

Throughout the past year, the National Center for Supercomputing Applications has been celebrating its 30th anniversary. On Friday, Larry Smarr, whose unsolicited 1983 proposal to the National Science Foundation (NSF) begat NCSA in 1985 and helped spur NSF to create not one but five national centers for supercomputing, gave a celebratory talk at NCSA. In typical fashion, Smarr not only revisited NCSA’s storied past, spreading credit liberally among collaborators, but also glimpsed into scientific supercomputing’s future, saying, “This part is on me.”

Many of his themes were familiar but a couple veered off the beaten path – “The human stool” (yes that stool) said Smarr “is the most information-rich material you have ever laid eyes on.” Its enormous data requirements will “dwarf a lot of our physics and astronomy as we really get to precision medicine and that means we are going to need a lot more computer time.” More on this later, replete with metrics and why deciphering the microbiome will require supercomputing.

Here are a few of the topics Smarr sailed through:

NSF Uniqueness

Big Data and the Rise of Neuromorphic Computing

Scientific Visualization

Exascale with and without Exotics

Why the Microbiome is Important

Artificial Intelligence is Coming. Soon.

NCSA, based at the University of Illinois, Urbana-Champaign (UIUC), is a U.S. supercomputing treasure. Its current flagship, Blue Waters from Cray, is roughly fifty million times faster than the original Cray X-MP machine that Smarr and his team installed at NCSA’s ambitious start. Even the floor housing the first Cray required $2M in renovations, kicked in by the university. It was a big undertaking to say the least. Since then Blue Waters and its lineage have handled a wide variety of academic and government research, broken new ground in scientific visualization, and promoted industrial collaboration.

Larry Smarr

Smarr, of course, was NCSA’s first director. Today, he is director of the California Institute for Telecommunications and Information Technology (Calit2), a UC San Diego/UC Irvine partnership. An astrophysicist by training, his work spans many disciplines and is currently focused on the microbiome; the common thread is his drive to use supercomputing to solve important scientific problems. (Currently Bill Gropp is acting NCSA director; Ed Seidel, the center’s director, has stepped up to serve as interim vice president for research for the University of Illinois System.)

Smarr recalled his “aha moment” that supercomputers should be more widely available and applied in science. He was at UIUC, busily applying computational methods to astrophysics, most famously his effort to solve general relativity equations for colliding black holes using numerical methods, an approach many colleagues thought a fool’s errand. Last year’s LIGO results proved dramatically otherwise. (See the HPCwire story, Gravitational Waves Detected! Historic LIGO Success Strikes Chord with Larry Smarr.)

At the time, UIUC had a VAX 11/780 and the VIP, the “VAX and Image Processing” facility, “which was about as good as any professor had in the country,” recalled Smarr. He had the chance to go to the Max Planck Institute to work with Karl-Heinz Winkler and Mike Norman and their supercomputer, a Cray 1. “Code that had taken 8 hours on the VAX, overnight – that’s the rate of progress you could make, one 8-hour run a night – I put on the Cray and started to go off to lunch.” Before he left the room, the job finished. “I said that’s not possible.” But it was. The Cray 1 was about 400x faster, changing an 8-hour VAX run into a one-minute Cray run. “Every ten minutes I could make the same scientific progress that I was making every day. That was the aha moment.”

The rest is supercomputing history. Encouraged by Rich Isaacson, NSF’s Division Director for gravitational research, Smarr’s 1983 proposal percolated through NSF, culminating with the award in 1985. Perhaps not surprisingly, the Max Planck open-access approach was the model, with Illinois cloning Lawrence Livermore’s machine room. Smarr emphasized many voices and individual efforts were involved in bringing NCSA to fruition. His talk briefly covered supercomputing’s past, present, and future – with many colorful anecdotes. NCSA has posted a video of Smarr’s full talk; a link is included at the end of this article.

NSF Matters…and So Does Risk Taking

Early in his talk, Smarr paid tribute to NSF. NCSA and its four siblings represented one of NSF’s big bets. The Laser Interferometer Gravitational-wave Observatory (LIGO) program was perhaps the longest and most expensive individual NSF-funded program and also a huge risk. Both are delivering groundbreaking science. Taking on big risk-big reward projects is something NSF can and should do. We probably don’t do enough of them today, he suggested.

He recalled that when Isaacson encouraged him to submit the ‘NCSA’ proposal, Smarr responded, “But there is no program at NSF for this,” and Isaacson said, “At NSF we believe in proposal pressure from the community.”

NCSA switched from specially designed Crays to microprocessor-based machines from SGI in 1995, another big bet on a new idea. Global demand for microprocessors was growing a whole lot faster than the demand from “the few hundreds of people that bought Crays.” Smarr and NCSA, backed by NSF funding, bet on microprocessors for the next machine in what he calls a historic shift.

“We’d be about a 10,000 times slower today [if we had not chosen microprocessors]. It is this ability to take risks based on your knowledge of where the technology is going that has made all the difference,” he said. “The NSF is unique in my view in the world in continually working at the outer edge, driven by the best ideas that come out of the user community, and then those breakthroughs are very well coupled back into the corporate world.”

Since today we have smartphones whose processing power far exceeds early supercomputers, there are some who contend NSF’s job supporting supercomputing must be done. Hardly, says Smarr. Rather, “NSF just keeps moving the goal lines exponentially ahead of the consumer market and that is one of the most important things that keeps the United States in its competitive position worldwide.”

I See You – Insight from Sight

Even at the start of computing, he said, John von Neumann understood the need to make results more readily understandable. “In the early days, when computers were at about a floating point operation a second (FLOPs), von Neumann said they would generate so much data that it would overwhelm the human mind and so we needed to turn the data stream flowing from the computer into a visualization by running the output of the computer into an oscilloscope. So this idea was there from the very beginning, but NCSA took it to a whole another level.”

Scientific visualization has jumped way beyond oscilloscopes. Think 3D immersion CAVE environments and more, said Smarr, citing the NCSA-Caterpillar collaboration. “Caterpillar drove [technology advance] by their investments in NCSA and interest in using virtual reality to create working models of their new earth moving machines before they were built, just out of the CAD/CAM drawing. They actually worked with us to show how you could have a global set of Caterpillar people working on details like where do we put the fuel tank opening and operator visibility.”

The idea of visualization is not pretty pictures; it’s insight. If you’ve got a computer “doing in those days a few billion 13-digit multiplies a second, which of those numbers do you want to look at to get that understanding? So the idea of scientific visualization was actually an intermediary technology to the human eye-brain system, the best pattern recognition computer yet.”

Of course, that doesn’t preclude pretty pictures that are content rich. Smarr cited NCSA alum Stefan Fangmeier, who took ideas nurtured at NCSA to Industrial Light & Magic, showing that science, not just an artist’s imagination, could be used to convey information, resulting in the computer graphics seen in films such as Twister, Jurassic Park, Terminator, Perfect Storm, and so forth.

The staggering growth of data will require ever-improving visualization techniques that make insight more readily accessible.

Brain-Inspired Computing Architectures

We’ll probably get to exascale computing using traditional architectures, Smarr thinks. But to make sense of the tremendous data deluge, as well as to progress in deep learning (et al.), better pattern recognition technology will be required. Brain-inspired computing is a new source of inspiration and perhaps further along than many realize. A hybrid computing architecture is likely to emerge, mimicking in a way the so-called human right/left brain dichotomy.

“We are in a moment of transition in which data science and data analysis is becoming as important if not more important than traditional supercomputing,” said Smarr. New approaches beyond today’s cloud computing are needed and brain-inspired co-processors looks prominent among them.

“To research this new paradigm, Calit2 has set up a Pattern Recognition Lab (PRL) to bring this whole new generation of non-von Neumann processors in, put them in the presence of GPUs and Intel multicores to handle the general purpose stuff, then [porting] all the different machine learning algorithms onto them, optimizing them for a very wide set of applications.”

He’s hardly alone in this thinking and cited others such as Horst Simon and Jack Dongarra who’ve voiced similar opinions. He singled out IBM’s TrueNorth neuromorphic chip, the first non-von Neumann chip in the Calit2 PRL, which put a million neurons and 256 million synapses in silicon, “the most components [on a] chip IBM has ever fabbed.” Lawrence Livermore National Laboratory – “whose supercomputer machine room we cloned, explicitly to make NCSA” – bought a 4X4 array of these neuromorphic chips and is collaborating with IBM to build a brain-inspired supercomputer that will be used in deep learning.

Most recently the PRL has added a radical new chip architecture produced by a San Diego startup. Smarr helped to recruit Dan Goldin, the longest serving NASA administrator, to La Jolla, CA over ten years ago to do a startup (KnuEdge). “This isn’t your typical startup – Dan is now in his mid-70s. But ten years ago Dan spent two years in the Neuroscience Institute to figure out how to put into silicon what they had learned about how the brain learns.” Dan then worked with Calit2 to prototype the first design of a computer board.

In June 2016, KnuEdge came out of stealth with its Hermosa chip. It’s a multilayer “cluster of digital signal processors that don’t have a clock, so it is asynchronous. Their Lambda Fabric is a completely different architecture than what we’re used to working with. That is now in our PRL,” said Smarr.

One of the brain’s advantages everyone is chasing is low power consumption. “Biological evolution has figured out how to get a computer to run a million times more energy efficient than an exascale machine will run at, and we cannot throw away that kind of advantage. So what I have been saying for 15 years is we’re going to have a new form of computer science and engineering emerge which abstracts out of biologically evolved entities what the principles of organization of those ‘computers,’ if you like, are – which is totally different than engineered computers.” (See HPCwire article, Think Fast – Is Neuromorphic Computing Set to Leap Forward?)

The Microbiome, Precision Medicine and Computing

Research in recent years has shown how important the microbiome – the population of bacteria in each of us – is to health. If genes and gene products are the key players in physiology, then the numbers tell the microbiome’s story. Inside most people there are around 10x more DNA-bearing bacterial cells than human DNA-bearing cells, and the microbial DNA carries around 100x more genes than the human DNA. What’s more, the mix of species and their relative proportions inside a person matter greatly.

This is the “dark matter” of healthcare, said ex-cosmologist Smarr, and our efforts to understand and use the microbiome “will be completely transformative to medicine over the next five to ten years,” he thinks, and others agree. There is even a White House National Microbiome Initiative in addition to the U.S. Precision Medicine Initiative. Understanding the microbiome and effectively using it will require sequencing and regular monitoring – think time series experiments – of related biomarkers.

It turns out Smarr has been doing this on himself and discovered he has a gene variant which inclines him to Inflammatory Bowel Disease, which may in the future be treated by “gardening your microbiome’s ecology”. Skipping some of the details, the computational challenge is immense. His team started several years ago with a director’s discretionary grant on Gordon, provided by SDSC director Mike Norman. “Our team used 25 CPU-years to compute comparative gut microbiomes starting from 2.7 trillion DNA bases of my samples along with healthy and IBD subjects.”

He compared this work to his early work in the 1970s on general relativistic black hole dynamics, which took several hundred hours on a CDC 6600, versus the 800,000 or so core hours he, UCSD’s Rob Knight, and their team are currently using on the San Diego Supercomputer Center’s Comet working on microbiome ecology dynamics. Performing this kind of analysis on a population-wide scale, on an ongoing basis, is a huge compute project. There are 100 million times as many bacteria on earth as all the stars in the universe, noted Smarr, quoting Professor Julian Davies’ remark that once the diversity of the microbial world is cataloged, it will make “astronomy look like a pitiful science.”

All netted down, he said, “Living creatures are information entities, working their software out in organic chemistry instead of silicon, and that information is your DNA, but it’s both in your human and the microbes’ DNA. When you want to read out the state of that person you need to look at time series of the biomarkers in your blood and stool. If that’s going to be the future, and my job has always been to live in the future, then I should turn my body into a biomarker and genomics ‘observatory,’ and I started taking blood measurements and stool measurements periodically.

“Your stool by the way doesn’t get much respect – we’ve got to work on our attitude a little because stool is 40 percent microbes and 1 gram of stool contains 1 billion microbes, each of which has a DNA molecule 3-5 million bases long. So it’s the most information-rich material you have ever laid eyes on.”

You get the idea.

Preparing for Artificial Intelligence

Smarr’s last slide, shown below, contained a set of ominous quotes on the dangers of artificial intelligence from Stephen Hawking, Bill Joy, Elon Musk, and Martin Rees – names familiar to most of us and all people whom Smarr knows. He didn’t dwell on the dangers, but directly acknowledged they are real. He spent more time on why he thinks AI is closer than we may realize and how it can be beneficial, and suggested one way to prepare is for NSF to start stimulating thought on AI issues among young people and young scientists.

The technology itself is advancing on many fronts, whether running machine learning on traditional CPUs/GPUs or on emerging neuromorphic hardware (Smarr didn’t discuss quantum computing in his talk). He noted that LBNL’s Deputy Director Horst Simon predicts that in the 2020-2025 timeframe, an exascale supercomputer will be able to run a simulation of 100% of the scale of the human brain in real time. “It will be effectively as fast as a human brain,” said Smarr. What that means in terms of applications and AI precisely remains unclear. But the technology will get us there.

Today, everyone’s favorite example of the state of machine learning as a surrogate for AI seems to be the Google DeepMind system’s victory this spring over Lee Sedol of Korea, one of the world’s best Go players.

“Google took 30 million moves of the best Go masters on the planet and fed those in as training sets. That [alone] would have made a computer hold its own against top Go players. But then Google’s team ran the trained AI against itself for millions of times coming up with moves of Go that no human had ever conceived of,” said Smarr, “So in less than two years from when Wired magazine ran a story titled ‘Go, the Ancient Game That Computers Still Can’t Win,’ Google [won].”

“Then Google takes that incredible software, what a treasure trove, and makes it open source and gives it to the world community in TensorFlow. We are using this every day at Calit2 to program these new chips (KnuEdge).” A research effort Smarr cited, led by Jeremy Howard, is attempting to teach machines to read medical X-rays as well as “the best doctor in the world,” using TensorFlow. Howard says that basically, instead of programming the computer to do something, you give it a few examples and let it figure out how to do that. “That’s the new paradigm.”

In fact, there are many aggressive efforts to develop the new paradigm and many of those efforts involve corporate IT giants advancing AI for their own purposes and putting their technology into the hands of academia for further development, pointed out Smarr. IBM is “betting the farm on Watson”. All of the new systems will not merely be powerful but hooked into vast databases.

For a feel of where this is going, consider the movie Her. “All of you should see it if you want to experience one of the best examples of speculative fiction painting a picture of where this process is taking us, where we all have individualized personalized AI agents, who learn more and more about you the more you interact with the [system]. And they are working with everybody across the planet simultaneously,” said Smarr.

Sounds very Big Brother-ish, and it could be, agrees Smarr. However, he remains optimistic. Like many of his generation, he grew up reading science fiction, including Isaac Asimov’s many robot-themed works.

“Asimov had the three laws to keep the robots from doing harm to humans. We’ll get through this AI transition I believe, but only if everybody realizes this is one of the most important change moments in human history, and it isn’t going to be happening 100 years from now, but rather it’s going to be in the next five, 10, to 20 years. One of the things I am hoping is NSF will be funding a lot of this research into the universities and to young people where they can start imagining these futures, playing with these new technologies, and helping us avoid some of the risks that these four of the smartest people on the planet are talking about here. My guess is that NCSA and the University of Illinois at Urbana-Champaign will be leaders in that effort,” concluded Smarr.

Think Fast – Is Neuromorphic Computing Set to Leap Forward? (August 15, 2016)

Steadily advancing neuromorphic computing technology has created high expectations for this fundamentally different approach to computing. Its strengths – like the human brain it attempts to mimic – are pattern recognition (space and time) and inference reasoning. Advocates say it will also be possible to compute at much lower power than current paradigms. At ISC this year Karlheinz Meier, a physicist-turned-neuromorphic computing pioneer and a leader of the European Human Brain Project (HBP), gave an overview that served as both an update and primer.

Moving to neuromorphic computing architectures, he believes, will make emulating brain function not only more effective and efficient, but also eventually accelerate computational learning and processing significantly beyond the speed of biological systems opening up new applications. Sometimes it’s best to start with the conclusions: Meier offered these summary bullets describing the state of neuromorphic computing today (not all covered in this article) at the end of his talk:

After 10 years of development, available hardware systems have reached a high degree of maturity, ready for non-expert use cases

High degree of configurability with dedicated software tools, but obviously no replacement for general purpose machines

Only way to access multiple time scales present in large-scale neural systems, making them functional

Well suited for stochastic inference computing

Well suited for use of deep-submicron, non-CMOS devices

Meier is predictably bullish, citing the power of deep learning and cognitive computing already proven on traditional computer architectures by players like Google, Baidu, IBM, and Facebook. Indeed, computer-based neural networking isn’t new. It’s a mainstay in a wide variety of applications, typically assisted by one or another type of accelerator (GPU, FPGA, etc). Notably, NVIDIA launched its ‘purpose-built’ deep learning development server this year, basically an all-GPU machine.

Many of these cognitive computing/deep learning efforts on traditional machines are quite impressive. Google’s AlphaGo algorithm from subsidiary DeepMind handily beat the world Go champion this year. But simply adapting neural networking to traditional von Neumann architectures has drawbacks. One is power – not that it’s bad by conventional standards – but it shows no sign of being able to approach the tiny 20W or so requirement of the human brain versus the megawatts consumed by supercomputers.

Also problematic is the time required to train networks. Talking about the AlphaGo victory, which he lauds, Meier said, “What people don’t see or what Google doesn’t tell people is that it took something like a year to train this system on a big cluster of graphic cards, certainly several hundred kilowatts of power over a very long time scale, many, many months. Of course the system looked at many Go games to discover the rules and structure and to play very well.”

Let’s acknowledge the training problem persists in neuromorphic computing as well. That said, neuromorphic, or brain inspired, computing seeks to mimic more directly how the human brain works. In the brain, neurons, the key components of brain processing, are connected in a vast network of networks. Individual neurons typically act in what’s called an integrate-and-fire fashion – that’s when the neuron’s membrane potential reaches a threshold and suddenly fires. Reaching that potential may involve numerous synaptic inputs that together sum to cross the firing threshold.
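
For readers unfamiliar with the model, the integrate-and-fire behavior described above can be written in a few lines. The sketch below is a generic leaky integrate-and-fire neuron with invented parameters and random synaptic input; it is not tied to any of the systems discussed in this article.

```python
# Minimal leaky integrate-and-fire neuron: synaptic inputs are summed into a membrane
# potential that leaks over time; when it crosses a threshold the neuron fires and resets.
import numpy as np

rng = np.random.default_rng(1)
dt, T = 1e-3, 0.2                      # 1 ms steps, 200 ms simulated
tau_m, v_thresh, v_reset = 20e-3, 1.0, 0.0

v, spikes = 0.0, []
for step in range(int(T / dt)):
    i_syn = rng.poisson(3) * 0.05      # summed weighted synaptic input this step
    v += dt * (-v / tau_m) + i_syn     # leak toward rest plus integration of inputs
    if v >= v_thresh:                  # threshold crossed: the neuron "fires"
        spikes.append(step * dt)
        v = v_reset                    # membrane potential resets after the spike

print(f"{len(spikes)} spikes in {int(T * 1e3)} ms")
```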

One of the staggering aspects of the brain is the range of physical size and ‘event’ durations it encompasses. In rough terms from tiny synapses to the whole brain there are seven orders of spatial magnitude, noted Meier, and in terms of time there are eleven orders of magnitude spanning activities from neuron firing to long-term learning. “Typically brains consist of neurons that spike and produce these kinds of action potentials, which are at the millisecond or sub millisecond level. And as you all know the time to learn things is months to years,” he said.

There have been successful efforts to map neural networks onto a supercomputer. One such effort on Japan’s K computer deployed a relatively simple network (~ one percent of the brain) and ran 1500X slower than the brain. This early work on the K computer by Markus Diesmann, another leader in the European Human Brain Project (HBP), was the largest neural net simulation to date and an impressive achievement. However, it was a far cry from the efficiency (energy or processing capability) of the human brain.

“You have to wait four years for a single simulated day. A day is nothing in the life of a brain. If you consider how you learn, real rewiring the structure of the brain, which takes many, many years at the beginning of your life, these time scales are inaccessible on conventional computers. And that will not change if you just go to exascale [on traditional architectures]. One of the ways out is neuromorphic computing,” he said. Neuromorphic architectures “aren’t doing numerical calculations but generic pattern recognition and discrimination processing just as the brain does.”

Three of the more prominent neuromorphic systems in operation today are:

IBM’s TrueNorth uses the TrueNorth chip implemented in CMOS. Since memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von-Neumann-architecture bottlenecks and is very energy-efficient, consuming 70 milliwatts, about 1/10,000th the power density of conventional microprocessors. This spring IBM announced a collaboration with Lawrence Livermore National Laboratory in which it will provide a scalable TrueNorth platform expected to process the equivalent of 16 million neurons and 4 billion synapses and consume the energy equivalent of a hearing aid battery – a mere 2.5 watts of power.

The SpiNNaker project, run by Steve Furber, one of the inventors of the ARM architecture and a researcher at the University of Manchester, has roughly 500K ARM processors. It’s a digital processor approach. The reason for selecting ARM, said Meier, is that ARM cores are cheap, at least if you make them very simple (integer operation). The challenge is the scaling required. “Steve implemented a router on each of his chips, which is able to very efficiently communicate action potentials, called spikes, between individual ARM processors,” said Meier. SpiNNaker’s bidirectional links between chips are a distinguishing feature – think of it as a mini Internet, optimized to transmit biological signal spikes, said Meier. The SpiNNaker architecture acts well as a real time simulator.

The BrainScaleS machine effort, led by Meier, “makes physical models of cell neurons and synapses. Of course we are not using a biological substrate. We use CMOS. Technically it’s a mixed CMOS signal approach. In reality it is pretty much how the real brain operates. The big thing is you can automatically scale this by adding synapses. When it is running you can change the parameters,” he said. It’s a local analogue computing approach with four million neurons and one billion synapses – binary, asynchronous communication.

Training the networks – basically programming an application – remains a challenge for all the machines. In essence, the problem and solutions become baked into the structure of the network with training. In the brain this occurs via synaptic plasticity in which the connections (synapses) between neurons are strengthened or weakened based on experience. Neuromorphic computing emulates learning and synaptic plasticity through a variety of techniques.
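
One widely cited plasticity rule in this space is spike-timing-dependent plasticity (STDP), in which a synapse strengthens when the presynaptic spike arrives shortly before the postsynaptic spike and weakens when the order is reversed. The sketch below is a generic pair-based STDP update with invented constants; none of the three machines described above is claimed to use exactly this form.

```python
# Sketch of pair-based spike-timing-dependent plasticity (STDP): potentiate when the
# presynaptic spike precedes the postsynaptic spike, depress when the order is reversed.
# Amplitudes and time constants are illustrative only.
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau=20e-3):
    """Weight change for a spike pair separated by delta_t = t_post - t_pre (seconds)."""
    if delta_t > 0:                                   # pre before post: potentiation
        return a_plus * np.exp(-delta_t / tau)
    return -a_minus * np.exp(delta_t / tau)           # post before pre: depression

w = 0.5
for dt_pair in [5e-3, 5e-3, -8e-3]:                   # two causal pairs, one anti-causal
    w = np.clip(w + stdp_dw(dt_pair), 0.0, 1.0)       # keep the weight bounded
print(f"weight after three spike pairs: {w:.3f}")
```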

In his ISC presentation, Meier walked through three examples of neural networks and their training: deterministic supervised; deterministic unsupervised; and stochastic supervised.

“Most of the neural networks in use today are deterministic. You have an input and output pattern, and they are linked by the network – [in other words] if you repeat the experiment you will always get the same result. Of course the configuration of a network has to happen through learning,” said Meier. The supervision involves telling the computer, during learning, whether it has made a right or wrong choice.

You can also have stochastic networks: “You say that reality is a distributions of patterns. What you do in your networks is store a stochastic distribution of patterns, which reflect your prior knowledge, and which is acquired through learning. You can use those stored patterns either to generate distributions without any input or you can do inference,” he explained.

Deterministic Supervised Learning

Taking an example from nature, Meier showed an instance of deterministic supervised learning in which a neural network mimics an insect’s natural ability to use its chemical sensors to distinguish between different flowers. “These are circuits that we reverse engineered using neuroscience,” he explained. “We have receptor neurons that respond to certain chemical substances and you have a layer, called a de-correlation layer, which is basically contrast enhancement. You see that in all perceptive systems in biology. On the right side you see the association layer to take combined inputs and make a decision based on what kind of flower you have,” said Meier.

It’s basically a data classification exercise. “The trick is to configure the links between the de-correlation and association layers and this is done by supervised learning (telling the machine if it is correct or incorrect), through things like back propagation, Monte Carlo techniques; you really have to configure the synaptic link. Does it work? Actually it works very nicely,” said Meier.

As a general rule, he said, spiking activity is high at input layers but then drops markedly. “We see in the intermediate layer that connects association with the input layer, there is also spiking activity but it’s rare, it’s sparse. That’s a very important thing and may be one of the reasons nature has invented spikes because it saves energy. Where interesting computation is being done, the firing rate is sparse.”

Deterministic Unsupervised Learning

As an example of unsupervised deterministic learning, Meier reviewed how owls find prey in the dark. In biology the model is very straightforward. “Since the mouse is on the right side, it is a short flight path for the sound to the right ear and a long path to the left ear. [Detecting this] is done by a circuit encoding compensating for the short path in air by a long path in the brain. You detect the time coincidence between two input impulses to produce a stronger signal and that is done in a completely unsupervised way,” said Meier. This neural net is fairly straightforward to implement in hardware.
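
The compensating-delay circuit Meier describes is, in essence, coincidence detection over delay lines: the interaural time difference is recovered by the internal delay that best re-aligns the two ears’ spike trains. The toy sketch below, with invented numbers, illustrates that idea only; it is not a model of any specific neuromorphic implementation.

```python
# Toy sketch of coincidence detection over delay lines (the circuit behind the owl
# example): find the internal delay that best re-aligns the two ears' spike trains.
import numpy as np

fs = 100_000                                             # 100 kHz time grid
right = np.zeros(200); right[50] = right[120] = 1.0      # spike train at the right ear
itd_samples = 30                                         # sound reaches the left ear 0.3 ms later
left = np.roll(right, itd_samples)

# the "coincidence detector" that fires most strongly marks the compensating delay
best = max(range(61), key=lambda d: np.sum(np.roll(left, -d) * right))
print(f"estimated interaural delay: {best / fs * 1e3:.2f} ms")   # ~0.30 ms
```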

Stochastic Supervised Learning

Do you see a duck or a rabbit in the image shown here? It could be a duck or a rabbit but you only see one at a given moment.

“You have this stored distribution of probabilities in your brain and you take samples and jump between two options. This can be implemented with Boltzmann machines, in particular, with spiking Boltzmann machines. Such a machine has a network of symmetrically connected stochastic nodes where the state of the nodes is described by a vector of binary random variables.”

If you are wondering how spiking neurons can represent binary variables, Meier said, “We have developed a theory [in which] zeros and ones are represented by neurons that are either active or in a refractory state. The probability that this network converges to a target is a Boltzmann distribution.”

“Here of course it is a neural network where we are connecting weights between neurons. How do you train these things? There is a very well established mechanism where you clamp the visible units of the input layer to the value of a particular pattern and then you update the weight of the interaction between any two nodes. This is the learning process but it’s slow,” said Meier.
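
Stripped of the spiking details, the clamped-versus-free procedure Meier outlines is the classic Boltzmann machine learning rule: nudge each weight by the difference between pairwise correlations measured with the visible units clamped to a training pattern and correlations measured when the network runs freely. The sketch below uses a small, fully visible network with invented sizes and rates purely to illustrate that update; it is not the spiking implementation used on neuromorphic hardware.

```python
# Sketch of Boltzmann-machine learning: clamp visible units to data, sample the free-
# running network, and update weights by the difference in pairwise correlations.
# Binary stochastic units, no spiking dynamics; sizes and rates are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                           # small fully visible toy network
W = np.zeros((n, n))                            # symmetric weights, no self-connections
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)

def gibbs_sample(state, W, sweeps=20):
    s = state.copy()
    for _ in range(sweeps):
        for i in range(n):                      # each node turns on with sigmoid probability
            p = 1.0 / (1.0 + np.exp(-(W[i] @ s)))
            s[i] = float(rng.random() < p)
    return s

for epoch in range(200):
    v = data[rng.integers(len(data))]           # "clamped" phase: a training pattern
    free = gibbs_sample(rng.integers(0, 2, n).astype(float), W)   # free-running phase
    W += 0.05 * (np.outer(v, v) - np.outer(free, free))           # contrastive update
    np.fill_diagonal(W, 0.0)

print(np.round(W, 1))   # units that co-occur in the data end up positively coupled
```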

While a great deal of progress in neuromorphic computing has been accomplished, thorny issues remain. For example, it’s still not clear what the base technology should be – CMOS chips, wafer scale lithographic ‘emulations’ of neurons, etc. – and more options are on the horizon. IBM recently published a paper around phase-change memristor technology that shows promise.

The wafer scale integration used for BrainScaleS is probably the most novel and brain-like approach so far. During the post-presentation discussion, questions arose around process and device variability and degradation issues for various technologies.

Meier noted, “There is no degradation in the sense of aging. CMOS systems stay as bad or as good as they are. Nano devices still show some endurance problems. The big challenge for the BrainScaleS system is the static variability arising from the CMOS production process. This is like ‘fixed pattern noise’ on a CCD sensor. You can calibrate it but for really large systems we have to learn how to implement ‘homeostatic’ adaptation like in biology. Our new digital learning center concept will just be doing that.” (http://www.kip.uni-heidelberg.de/vision/research/dls/)

About memristor technology Meier said, “really cool devices, but people have so far totally ignored the aspect of variability. It’s much, much bigger than CMOS. I don’t see how you can calibrate a memristor in a large circuit. [With CMOS] you can calibrate the synapse on a neuron because there are parameters, and SRAM to store the parameters. You can measure and if it doesn’t work too well and I can fix it by going in and calibrating it. How do you do that with memristors?”

Challenges aside, the IBM-LLNL project and two European systems should help accelerate neuromorphic development. Meier notes that access to SpiNNaker and BrainScaleS is not restricted to Europeans although restrictions for some countries exist due to national technology export law.

“There are plenty of users for the small prototype systems. The SpiNNaker boards are in particular attractive because they can be used by anyone trained in standard software tools. There are more than 100 users. The small scale BrainScaleS system has attracted about 10 users but the use is very different from normal computers so people struggle more. The numbers of external users of the HBP collaboratory (including all platforms) is shown in the attached plot. About 10% of them are using the large scale NM systems,” said Meier. Here are a handful of access points:

For information on SpiNNaker contact Furber at the University of Manchester, U.K.

Software tools to configure and control the systems are described in the Neuromorphic Guidebook.

“Clearly the machines we are building at the moment are research devices but they have one important feature, they are really extremely configurable. In particular you can also read out the activity of network because you want to understand what is going on,” said Meier. This will change when putting neuromorphic systems into real-world practice.

“Our idea on long-term development is to give up on configurability and to give up on monitoring, because if you have a neuromorphic chip, for example, that has to detect certain patterns, then as a user of your cell phone you don’t really want to read out any membrane potential, look at all the spike trains, or look at all the correlations. You just want the thing to work.”

Large-scale systems, similar to those described here, are more likely to be used “to develop circuits that are interesting, that solve interesting problems. Then you export it and you make special dedicated chips optimized to solve this single problem without any configurability or monitoring capability.”

Think neuromorphic FPGAs, said Meier: “You want a dedicated thing that can be mass produced and does one thing very well. That’s the way I see it evolving.”

One issue is implementing local learning (on chip) once a device is in the field, enabling the system to adapt to changing environments. It would open up a new range of applications, agreed Meier: “There’s a study under way now looking at how a car engine changes all the time, its performance changes, and you always want to optimize the efficiency of the engine. It would be really nice to have these local learning capabilities on chip, on the system in the field being applied.” There are no technical barriers in principle, he said, but some technology components are still missing.
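As an illustration of what local, on-chip learning could look like, here is a minimal pair-based spike-timing-dependent plasticity (STDP) rule in Python. Every quantity the synapse needs, namely the spike times of the two neurons it connects, is available locally, with no external trainer. The constants and spike trains are assumptions for the example, not a description of any particular neuromorphic device.

import numpy as np

# Pre- and post-synaptic spike times in milliseconds (toy data).
pre_spikes = np.array([10.0, 30.0, 55.0, 80.0])
post_spikes = np.array([12.0, 50.0, 82.0])

def stdp_weight_change(pre, post, a_plus=0.05, a_minus=0.055, tau=20.0):
    # Local rule: potentiate when the presynaptic spike precedes the
    # postsynaptic one, depress when it follows, with exponential decay
    # in the time difference. Only information local to the synapse is used.
    dw = 0.0
    for t_pre in pre:
        for t_post in post:
            dt = t_post - t_pre
            if dt > 0:
                dw += a_plus * np.exp(-dt / tau)
            elif dt < 0:
                dw -= a_minus * np.exp(dt / tau)
    return dw

w = 0.5 + stdp_weight_change(pre_spikes, post_spikes)
print(round(w, 4))

Because every update depends only on locally available spike timing, a rule of this kind could in principle keep adapting after deployment, which is the capability Meier says is still missing as an engineered component.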

Nearer term, Meier is optimistic that accelerated learning will arrive and that it is the enabler needed for broader use of neuromorphic computing and for shortening the time needed for training. Spike-based systems will play a role here, he believes: “There is a very good argument to say the spikes are not only contributing to energy efficiency but also to learning speed. Once you accelerate learning it will be a breakthrough for this technology and may change the way you compute fundamentally.”

]]>What will computing look like in the post-Moore’s Law era? That’s probably a bad way to pose the question, and certainly there’s no shortage of ideas. A new federal white paper – A Federal Vision for Future Computing: A Nanotechnology-Inspired Grand Challenge – tackles the ‘what’s next’ question, spells out seven specific research and development priorities, and identifies the federal entities responsible.

The document, roughly a year in the making, is from the National Nanotechnology Initiative (NNI). The NNI, you may know, has its roots in discussions arising in the late 90s and was formally created by the 21st Century Nanotechnology Research and Development Act in 2003. NNI encompasses a large number of activities and has a $1.4B budget request for FY2017.

Intended from the start to be a long-term program with long-term R&D horizons, NNI released the new vision paper on the first anniversary of the National Strategic Computing Initiative (NSCI) – perhaps as encouragement to the NSCI community. Specifically, the vision paper supports the Nanotechnology-Inspired Grand Challenge, announced last fall by the Obama Administration, to develop “transformational computing capabilities by combining innovations in multiple scientific disciplines.”

As described in the latest paper, “The Grand Challenge addresses three Administration priorities—the National Nanotechnology Initiative (NNI); the National Strategic Computing Initiative (NSCI); and the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative—to: create a new type of computer that can proactively interpret and learn from data, solve unfamiliar problems using what it has learned, and operate with the energy efficiency of the human brain.”

Somewhat soberly, the report says, “While it continues to be a national priority to advance conventional digital computing—which has been the engine of the information technology revolution—current technology falls far short of the human brain in terms of the brain’s sensing and problem-solving abilities and its low power consumption. Many experts predict that fundamental physical limitations will prevent transistor technology from ever matching these characteristics.”

NNI has categorized research and development needed to achieve the Grand Challenge into seven focus areas:

Materials

Devices and Interconnects

Computing Architectures

Brain-Inspired Approaches

Fabrication/Manufacturing

Software, Modeling, and Simulation

Applications

Nanotechnology, of course, is already an area of vigorous R&D. As the list of focus areas illustrates, the program covers a wide swath of technologies. Though brief, much of the directional discussion is fascinating. Here’s an excerpt from the materials section:

“The scaling limits of electron-based devices such as transistors are known to be on the order of 5 nm due to quantum-mechanical tunneling. Smaller devices can be made if information-bearing particles with mass greater than the mass of an electron are used. Therefore, new principles for logic and memory devices, scalable to ~1 nm, could be based on “moving atoms” instead of “moving electrons;” for example, by using nanoionic structures. Examples of solid-state nanoionic devices include memory (ReRAM) and logic (atomic/ionic switches).”

Despite the diversity of topics covered, the goal of emulating human brain-like capabilities runs throughout the document. Indeed, brain-inspired computing R&D is hot right now and making substantial progress.

IBM TrueNorth Platform

In late spring of this year, IBM and Lawrence Livermore National Laboratory announced a collaboration in which LLNL would receive a 16-chip TrueNorth system representing a total of 16 million neurons and 4 billion synapses. At almost the same time in Europe, two large-scale neuromorphic computers, SpiNNaker and BrainScaleS, were put into service and made available to the wider research community.

LLNL will also receive an end-to-end ecosystem to create and program energy-efficient machines that mimic the brain’s abilities for perception, action and cognition. The ecosystem consists of a simulator; a programming language; an integrated programming environment; a library of algorithms as well as applications; firmware; tools for composing neural networks for deep learning; a teaching curriculum; and cloud enablement.

“Lawrence Livermore computer scientists will collaborate with IBM Research, partners across the Department of Energy complex and universities to expand the frontiers of neurosynaptic architecture, system design, algorithms and software ecosystem,” according to a project description on the LLNL web site.

The SpiNNaker project, run by Steve Furber, one of the inventors of the ARM architecture and a researcher at the University of Manchester, has roughly 500K ARM processors. The reason for selecting ARM, said Meier, is that ARM cores are cheap, at least if you make them very simple (integer operation). The challenge is to achieve the required scaling. “Steve implemented a router on each of his chips, which is able to very efficiently communicate action potentials, called spikes, between individual ARM processors,” said Karlheinz Meier, a leader in the HBP project whose group developed the BrainScaleS machine.

The BrainScaleS effort, led by Meier, “makes physical models of cell neurons and synapses. Of course we are not using a biological substrate. We use CMOS. Technically it’s a mixed-signal CMOS approach. In reality it is pretty much how the real brain operates. The big thing is you can automatically scale this by adding synapses. When it is running you can change the parameters,” Meier said.

It will be interesting to track neuromorphic computing’s advance and observe how effective the various government programs are (or are not) in moving it forward.

Besides discussing technical challenges and promising approaches for each of the seven focus areas, the white paper lays out 5-, 10-, and 15-year goals for each focus. Here’s a partial excerpt from the brain-inspired computing section:

“High-performance computing (HPC) has traditionally been associated with floating point computations and primarily originated from needs in scientific computing, business, and national security. On the other hand, brain-inspired approaches, while at least as old as modern computing, have traditionally aimed at what might be called pattern recognition applications (e.g., recognition/understanding of speech, images, text, human languages, etc., for which the alternative term, knowledge extraction, is preferred in some circles) and have exploited a different set of tools and techniques.

“Recently, convergence of these two computing paths has been mandated by the National Strategic Computing Initiative Strategic Plan, which places due emphasis on brain-inspired computing and pattern recognition or knowledge extraction type applications for enabling inference, prediction, and decision support for big data applications. DOE and NSF have demonstrated significant scientific advancements by investing and supporting HPC resources for open scientific applications. However, it is becoming apparent that brain-like computing capabilities may be necessary to enable scientific advancement, economic growth, and national security applications.

10-year goal: Identify and reverse engineer biological or neuro-inspired computing architectures, and translate results into models and systems that can be prototyped.

15-year goal: Enable large-scale design, development, and simulation tools and environments able to run at exascale computing performance levels or beyond. The results should enable development, testing, and verification of applications, and be able to output designs that can be prototyped in hardware.”

The new document is a fairly quick read and has a fair amount of technical detail. Here’s a link to the white paper: http://www.nano.gov/node/1635

]]>https://www.hpcwire.com/2016/08/08/nanotech-grand-challenge-federal-grand-vision-future-computing/feed/029192IBM Phase Change Device Shows Promise for Emerging AI Appshttps://www.hpcwire.com/2016/08/03/ibm-phase-change-device-shows-promise-emerging-ai-apps/?utm_source=rss&utm_medium=rss&utm_campaign=ibm-phase-change-device-shows-promise-emerging-ai-apps
https://www.hpcwire.com/2016/08/03/ibm-phase-change-device-shows-promise-emerging-ai-apps/#respondWed, 03 Aug 2016 19:28:04 +0000https://www.hpcwire.com/?p=29133IBM Research today announced a significant advance in phase-change memristive technology – based on chalcogenide-based phase-change materials – that IBM says has the potential to achieve much higher neuronal circuit densities and lower power in neuromorphic computing. Also, for the first time, researchers demonstrated randomly spiking neurons based on the technology and were able to […]

]]>IBM Research today announced a significant advance in phase-change memristive technology – based on chalcogenide-based phase-change materials – that IBM says has the potential to achieve much higher neuronal circuit densities and lower power in neuromorphic computing. Also, for the first time, researchers demonstrated randomly spiking neurons based on the technology and were able to detect “temporal correlations in parallel data streams and in sub-Nyquist representation of high-bandwidth signals.”

The announcement coincided with the publication of a paper, Stochastic phase-change neurons, in Nature Nanotechnology. The work, performed largely at IBM Research Zurich, is part of a sizable body of exciting research currently under way in the neuromorphic space.

Turning on two large-scale neuromorphic computers – SpiNNaker and BrainScaleS – this spring and making them readily available to researchers marked another major step forward in pursuit of brain-inspired computing. These machines, based on different neuromorphic architectures, mimic much more closely the way brains process information. Why this is important is perhaps best illustrated by an earlier effort to map neuron-like processing onto a traditional supercomputer, in this case Japan’s K Computer.

“They used about 65,000 processors [to create] a billion very simple neurons. Call it one percent of the brain – but it’s not really a brain, [just] a very simple network model. This machine consumes 30 MW of power and the system runs 1,500X slower than real-time biology,” said Karlheinz Meier, a leader in the Human Brain Project, whose group developed the BrainScaleS machine. In terms of energy efficiency, this K Computer effort was “ten billion times less energy efficient.”

Evangelos Eleftheriou, IBM Fellow

The IBM research takes substantial steps forward on several fronts. “We have been researching phase-change materials for memory applications for over a decade, and our progress in the past 24 months has been remarkable,” said IBM Fellow Evangelos Eleftheriou. “In this period, we have discovered and published new memory techniques, including projected memory, stored 3 bits per cell in phase-change memory for the first time, and now are demonstrating the powerful capabilities of phase-change-based artificial neurons, which can perform various computational primitives such as data-correlation detection and unsupervised learning at high speeds using very little energy.”

IBM scientists organized hundreds of artificial neurons into populations and used them to represent fast and complex signals. Moreover, the artificial neurons have been shown to sustain billions of switching cycles, which would correspond to multiple years of operation at an update frequency of 100 Hz. The energy required for each neuron update was less than five picojoules and the average power less than 120 microwatts — for comparison, 60 million microwatts power a 60 watt lightbulb.

As Eleftheriou notes, memristor technology is hardly new, and as promising as it is, there have been obstacles, notably variability and scaling; the IBM approach overcomes some of those issues.

Co-author Dr. Abu Sebastian, IBM Research, told HPCwire, “Variability is indeed a big issue for resistive memory technologies such as the ones based on metal-oxides (TiOx, HfOx etc.). But not so much in phase change memory (PCM) devices. We still have a certain amount of inter-device and intra-device variability even in PCM devices. Clearly this type of randomness would be undesirable for memory-type of applications. However, for neuromorphic applications, this could even be advantageous as we show in our paper. The stochastic firing response of the phase change neurons and their ability to represent high frequency signals arise from this randomness or variability.”

IBM Phase Change Device

The mechanism of IBM’s phase-change device technology is itself fascinating. The artificial neurons designed by IBM scientists in Zurich consist of phase-change materials, including germanium antimony telluride, which exhibit two stable states, an amorphous one (without a clearly defined structure) and a crystalline one (with structure). These materials are also the basis of re-writable Blu-ray discs. However, the artificial neurons do not store digital information; they are analog, just like the synapses and neurons in our biological brain.

In the published demonstration, the team applied a series of electrical pulses to the artificial neurons, which resulted in the progressive crystallization of the phase-change material, ultimately causing the neuron to fire. In neuroscience, this function is known as the integrate-and-fire property of biological neurons. This is the foundation for event-based computation and, in principle, is similar to how our brain triggers a response when we touch something hot.
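The integrate-and-fire mechanism described above can be caricatured in a few lines of Python: each electrical pulse nudges a “crystalline fraction” upward, the neuron fires when that fraction crosses a threshold, and a melt-quench reset with a little randomness starts the cycle again. All numbers are invented for illustration and are not taken from the IBM paper.

import numpy as np

rng = np.random.default_rng(42)

def run_neuron(n_pulses=200, step=0.02, threshold=1.0, reset_jitter=0.05):
    # 'state' stands in for the crystalline fraction of the phase-change cell,
    # which plays the role of the neuronal membrane potential.
    state = 0.0
    spikes = []
    for t in range(n_pulses):
        # Each electrical pulse crystallizes the material a little further.
        state += step * (1.0 + 0.1 * rng.standard_normal())
        if state >= threshold:
            spikes.append(t)
            # Melt-quench reset: back to (approximately) amorphous, with the
            # stochastic variation that gives these neurons their noisy firing.
            state = abs(rng.normal(0.0, reset_jitter))
    return spikes

print(run_neuron())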

The authors write in the conclusion:

“The ability to represent the membrane potential in artificial spiking neurons renders phase-change devices a promising technology for extremely dense memristive neuro-synaptic arrays. A particularly interesting property is their scalability down to the nanometer scale and the fast and well-understood dynamics of the amorphous-to-crystalline transition. The high speed and low energy at which phase-change neurons operate will be particularly useful in emerging applications such as processing of event-based sensory information, low-power perceptual decision making and probabilistic inference in uncertain conditions. We also envisage that distributed analysis of rapidly emerging, pervasive data sources such as social media data and the ‘Internet of Things’ could benefit from low-power, memristive computational primitives.”

A figure from the paper, shown below, illustrates the principle.

Artificial neuron based on a phase-change device, with an array of plastic synapses at its input. Schematic of an artificial neuron that consists of the input (dendrites), the soma (which comprises the neuronal membrane and the spike event generation mechanism) and the output (axon). The dendrites may be connected to plastic synapses interfacing the neuron with other neurons in a network. The key computational element is the neuronal membrane, which stores the membrane potential in the phase configuration of a nanoscale phase-change device. Owing to their inherent nanosecond-timescale dynamics, nanometre-length-scale dimensions and native stochasticity, these devices enable the emulation of large and dense populations of neurons for bioinspired signal representation and computation.

“By exploiting the physics of reversible amorphous-to-crystal phase transitions, we show that the temporal integration of postsynaptic potentials can be achieved on a nanosecond timescale. Moreover, we show that this is inherently stochastic because of the melt-quench-induced reconfiguration of the atomic structure occurring when the neuron is reset,” write the researchers in the paper.

Mimicking threshold-based neuronal models with traditional technology can be problematic, note the researchers. “Emulating these by means of conventional CMOS circuits, such as current-mode, voltage-mode and subthreshold transistor circuits, is relatively complex and hinders seamless integration with highly dense synaptic arrays. Moreover, conventional CMOS solutions rely on storing the membrane potential in a capacitor. Even with a drastic scaling of the technology node, realizing the capacitance densities measured in biological neuronal membranes is challenging.”

(A brief note on the phase-change device attributes is included at the end of this article.)

One interesting aspect of the work is the demonstration of stochastic behavior in artificial neurons based on the phase-change technology. The authors believe they can turn this into an advantage. The stochastic behavior of neurons in nature results from many sources such as ionic conductance noise, the chaotic motion of charge carriers due to thermal noise, inter-neuron morphologic variabilities and background noise. This complexity has “restricted the implementation of artificial noisy integrate-and-fire neurons to software simulations,” despite their importance in bio-inspired computation and applications in signal and information processing.

The phase change devices, it turns out, also exhibit stochastic behavior for similar reasons. Think of a population of 1000 artificial neurons based on IBM’s phase-change technology. Broadly, when fed the same signal trains, they will fire at slightly differing rates that track well with standard probability models. This can be extremely useful when using a population of artificial neurons for some tasks.
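To see why that spread in firing rates can be an asset, consider the toy experiment below: many copies of a simple accumulate-and-fire neuron receive the same drive, with each copy given its own threshold as a stand-in for device-to-device variability. Individual rates differ, but the population average tracks the input. This is a hedged illustration of the population idea, not the correlation-detection experiment reported in the paper.

import numpy as np

rng = np.random.default_rng(7)

def firing_rate(drive, threshold, n_steps=500):
    # One accumulate-and-fire neuron driven by a constant input.
    state, spikes = 0.0, 0
    for _ in range(n_steps):
        state += drive
        if state >= threshold:
            spikes += 1
            state = 0.0
    return spikes / n_steps

# 200 nominally identical devices, each with its own threshold drawn from a
# normal distribution, all seeing the same input drive.
thresholds = rng.normal(1.0, 0.1, size=200)
rates = np.array([firing_rate(0.05, t) for t in thresholds])

# Individual rates scatter, but the population mean is a stable estimate
# of the input intensity.
print(rates.mean(), rates.std())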

“The relatively complex computational tasks, such as Bayesian inference, that stochastic neuronal populations can perform with collocated processing and storage render them attractive as a possible alternative to von-Neumann-based algorithms in future cognitive computers.”

Here’s a link to a Youtube video by IBM on the technology posted today.

Note on Phase-Change Device Attributes Excerpted From the Paper

The mushroom-type phase-change devices used in the experiments were fabricated in the 90 nm technology node with the bottom electrode created using a sublithographic key-hole process [27]. The phase-change material was doped Ge2Sb2Te5. All experiments were conducted on phase-change devices that had been heavily cycled. The bottom electrode had a radius of 20 nm and a length of 65 nm. The phase-change material was 100 nm thick and extended to the top electrode, the radius of which was 100 nm. For the single-neuron experiments, the phase-change device was operated in series with a resistor of 5 kΩ. The experiments using multiple neurons and experiments with neuronal populations were based on a crossbar topology in which 100 phase-change devices were interconnected in a 10 × 10 array unit, with a lateral field-effect transistor used as the access device. We used multiple array units to reach population sizes of up to 500 neurons.

]]>https://www.hpcwire.com/2016/08/03/ibm-phase-change-device-shows-promise-emerging-ai-apps/feed/029133Beyond von Neumann, Neuromorphic Computing Steadily Advanceshttps://www.hpcwire.com/2016/03/21/lacking-breakthrough-neuromorphic-computing-steadily-advance/?utm_source=rss&utm_medium=rss&utm_campaign=lacking-breakthrough-neuromorphic-computing-steadily-advance
https://www.hpcwire.com/2016/03/21/lacking-breakthrough-neuromorphic-computing-steadily-advance/#respondMon, 21 Mar 2016 13:00:20 +0000http://www.hpcwire.com/?p=25755Neuromorphic computing – brain inspired computing – has long been a tantalizing goal. The human brain does with around 20 watts what supercomputers do with megawatts. And power consumption isn’t the only difference. Fundamentally, brains ‘think differently’ than the von Neumann architecture-based computers. While neuromorphic computing progress has been intriguing, it has still not proven very practical.

]]>Neuromorphic computing – brain inspired computing – has long been a tantalizing goal. The human brain does with around 20 watts what supercomputers do with megawatts. And power consumption isn’t the only difference. Fundamentally, brains ‘think differently’ than the von Neumann architecture-based computers. While neuromorphic computing progress has been intriguing, it has still not proven very practical.

This week neuromorphic computing takes another step forward with a workshop being offered to users from academia, industry and education interested in using two European neuromorphic systems that have been years in development and are coming online for broader use – the BrainScaleS system launching at the Kirchhoff Institute for Physics of Heidelberg University and SpiNNaker, a complementary approach and similarly sized system at the University of Manchester.

Ramping up BrainScaleS and SpiNNaker is an important milestone, strengthening Europe’s position in hardware development for alternative computing. Both projects are part of the European Human Brain Project, originally funded by the European Commission’s Future Emerging Technologies program (2005-2015). The webcast, which will be streamed live on Tuesday, will cover the architecture for both systems and approaches to application development.

BrainScaleS and SpiNNaker take different tacks for modeling neuron activity. One approach is to use traditional analog circuits — like the chips being developed by the BrainScaleS project. Analog circuits can be fast and energy efficient. Conversely, SpiNNaker’s architecture closely links a very large number of digital cores (also fast, and in this case, also energy efficient).

BrainScaleS’s neuromorphic hardware is based around wafer-scale analog, very large scale integration (VLSI). Each 20-cm-diameter silicon wafer contains 384 chips, each of which implements 128,000 synapses and up to 512 spiking neurons[i]. This gives a total of around 200,000 neurons and 49 million synapses per wafer. These VLSI models operate considerably faster than the biological originals and allow the emulated neural networks to evolve tens-of-thousands times quicker than real time. Put another way, a biological day of learning can be compressed to 100 seconds on the machine.
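For readers who want to check the arithmetic, the per-wafer totals follow directly from the per-chip figures (a small Python snippet restating the numbers above):

# Per-wafer totals implied by the per-chip figures quoted above.
chips_per_wafer = 384
neurons_per_chip = 512          # up to
synapses_per_chip = 128_000

print(chips_per_wafer * neurons_per_chip)    # 196,608 -> "around 200,000 neurons"
print(chips_per_wafer * synapses_per_chip)   # 49,152,000 -> "49 million synapses"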

Leader of the BrainScaleS project, Prof. Dr. Karlheinz Meier (Heidelberg University) explains, “The BrainScaleS system goes beyond the paradigms of a Turing machine and the von Neumann architecture. It is neither executing a sequence of instructions nor is it constructed as a system of physically separated computing and memory units. It is rather a direct, silicon based image of the neuronal networks found in nature, realizing cells, connections and inter-cell communications by means of modern analogue and digital microelectronics.”

Learning – not external programming – is a key guiding principle. Unlike traditional computer architecture in which a structured program explicitly carries out an order of tasks, brains are fundamentally learning machines that turn patterns into programs.

Steve Furber, a professor at the University of Manchester and a co-designer of the ARM chip architecture, leads the SpiNNaker team. SpiNNaker is a contrived acronym derived from Spiking Neural Network Architecture. The machine consists of 57,600 identical 18-core processors, giving it 1,036,800 ARM968 cores in total. The die is fabricated by United Microelectronics Corporation (UMC) on a 130 nm CMOS process. Each System-in-Package (SiP) node has an on-board router to form links with its neighbors, as well as 128 Mbyte off-die SDRAM to hold synaptic weights.

SpiNNaker, too, is built to mimic the brain’s biological structure and behavior. It will exhibit massive parallelism and resilience to failure of individual components. With more than one million cores, and one thousand simulated neurons per core, SpiNNaker should be capable of simulating one billion neurons in real time. This equates to a little over one percent of the human brain’s estimated 85 billion neurons.

Rather than implement one particular algorithm, SpiNNaker will be a platform on which different algorithms can be tested. Various types of neural networks can be designed and run on the machine, thus simulating different kinds of neurons and connectivity patterns.

Both BrainScaleS and SpiNNaker architectures will be discussed during the Web-based workshop on March 22, scheduled from 3 pm to 6 pm CET. Together, the systems located in Heidelberg and Manchester comprise the “Neuromorphic Computing Platform” of the Human Brain Project.

Much of the early work on both machines will be basic research on self-organization in neural networks. Other potential applications, for example, are in energy and time efficiency optimization, broadly similar to deep learning technology developed by companies like Google and Facebook for the analysis of large data volumes using conventional high performance computers.

IBM’s Dharmendra Modha

Europe, of course, is hardly alone in pursuing neuromorphic computing. Most prominent in the U.S. is IBM Research’s TrueNorth Chip effort. Dharmendra Modha, IBM fellow and chief scientist for brain-inspired computing, wrote an interesting commentary on the TrueNorth project that traces development of von Neumann architecture based computing and contrasts it with neuromorphic computing approaches: Introducing a Brain-inspired Computer. Though written in 2014, it remains relevant.

The TrueNorth chip, introduced in August 2014, is a neuromorphic CMOS chip that consists of 4,096 hardware cores, each one simulating 256 programmable silicon “neurons” for a total of just over a million neurons. Each neuron has 256 programmable “synapses” which convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (2^28). In terms of basic building blocks, its transistor count is 5.4 billion.
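The TrueNorth headline figures work out the same way from the per-core numbers:

# TrueNorth totals from the per-core figures.
cores = 4_096
neurons_per_core = 256
synapses_per_neuron = 256

neurons = cores * neurons_per_core        # 1,048,576 -> "just over a million"
synapses = neurons * synapses_per_neuron  # 268,435,456, i.e. 2**28
print(neurons, synapses, synapses == 2 ** 28)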

Developed under the DARPA SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics) project, TrueNorth’s computing power has been characterized as roughly equivalent to the brainpower of a rodent. It also circumvents the von-Neumann-architecture bottlenecks, is very energy-efficient, consumes merely 70 milliwatts, and is capable of 46 billion synaptic operations per second, per watt – literally a synaptic supercomputer in your palm.

BrainScaleS, SpiNNaker, and TrueNorth are just three examples of many ongoing neuromorphic computing projects. Turning them into commercial products or more general purpose computing machines remains a challenge.

Indeed, IBM put together a paper on cognitive computing commercialization and its barriers[ii], which calls for “new thinking, not only on the part of programmers and application developers, but also by organizational decision makers who seek to link technological possibilities to market opportunity. While incremental innovation can be achieved on the basis of existing knowledge in well-charted commercial territory, radical innovation entails far greater uncertainty.”

Among the barriers cited were: formulating business models and predicting future revenue to calibrate investment, defining strategy and structure to execute and finally, overcoming communicative and functional boundaries.

Much of the drive to push neuromorphic computing stems from the ongoing decline of Moore’s law, and this excerpt from a 2014 ACM article[iii] still sums up the circumstances today:

As the long-predicted end of Moore’s Law seems ever more imminent, researchers around the globe are seriously evaluating a profoundly different approach to large-scale computing inspired by biological principles. In the traditional von Neumann architecture, a powerful logic core (or several in parallel) operates sequentially on data fetched from memory. In contrast, “neuromorphic” computing distributes both computation and memory among an enormous number of relatively primitive “neurons,” each communicating with hundreds or thousands of other neurons through “synapses.” Ongoing projects are exploring this architecture at a vastly larger scale than ever before, rivaling mammalian nervous systems, and developing programming environments that take advantage of them. Still, the detailed implementation, such as the use of analog circuits, differs between the projects, and it may be several years before their relative merits can be assessed.

Researchers have long recognized the extraordinary energy stinginess of biological computing, most clearly in a visionary 1990 paper by the California Institute of Technology (Caltech)’s Carver Mead that established the term “neuromorphic.” Yet industry’s steady success in scaling traditional technology kept the pressure off.

[i] “Spiking neural networks (SNNs) fall into the third generation of neural network models, increasing the level of realism in a neural simulation. In addition to neuronal and synaptic state, SNNs also incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not fire at each propagation cycle (as it happens with typical multi-layer perceptron networks), but rather fire only when a membrane potential – an intrinsic quality of the neuron related to its membrane electrical charge – reaches a specific value. When a neuron fires, it generates a signal which travels to other neurons which, in turn, increase or decrease their potentials in accordance with this signal. In the context of spiking neural networks, the current activation level (modeled as some differential equation) is normally considered to be the neuron’s state, with incoming spikes pushing this value higher, and then either firing or decaying over time. Various coding methods exist for interpreting the outgoing spike train as a real-value number, either relying on the frequency of spikes, or the timing between spikes, to encode information.” From https://en.wikipedia.org/wiki/Spiking_neural_network.
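A minimal leaky integrate-and-fire loop, the simplest member of the model family described in this note, might look like the following Python sketch (toy constants, for illustration only):

import numpy as np

def lif(input_current, dt=1.0, tau=20.0, threshold=1.0, v_reset=0.0):
    # Membrane potential decays toward rest and is pushed up by incoming
    # current; a spike is emitted whenever it reaches the threshold.
    v = 0.0
    spike_times = []
    for t, i_t in enumerate(input_current):
        v += dt * (-v / tau + i_t)
        if v >= threshold:
            spike_times.append(t * dt)
            v = v_reset
    return spike_times

# Constant drive: the output spike rate encodes the input strength
# (the frequency-of-spikes coding mentioned in the note).
drive = np.full(200, 0.08)
print(lif(drive))

Counting the emitted spikes over a time window recovers the rate-based reading of the spike train mentioned at the end of the note; reading out the intervals between spikes corresponds to the timing-based coding.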

]]>https://www.hpcwire.com/2016/03/21/lacking-breakthrough-neuromorphic-computing-steadily-advance/feed/025755Carver Mead on Quantum Computing and Neuromorphic Designhttps://www.hpcwire.com/2013/11/25/carver-mead-quantum-computing-neuromorphic-design/?utm_source=rss&utm_medium=rss&utm_campaign=carver-mead-quantum-computing-neuromorphic-design
https://www.hpcwire.com/2013/11/25/carver-mead-quantum-computing-neuromorphic-design/#respondMon, 25 Nov 2013 22:04:05 +0000http://www.hpcwire.com/?p=2056Computer scientist, inventor and university physicist Carver Mead is perhaps best known for coining the phrase “Moore’s law,” helping to popularize Gordon Moore’s 1965 observation that the number of transistors on an integrated circuit doubles about every 24 months. Mead was also instrumental in the prediction’s tremendous staying power. One of Mead’s most significant contributions […]

]]>Computer scientist, inventor and university physicist Carver Mead is perhaps best known for coining the phrase “Moore’s law,” helping to popularize Gordon Moore’s 1965 observation that the number of transistors on an integrated circuit doubles about every 24 months. Mead was also instrumental in the prediction’s tremendous staying power.

One of Mead’s most significant contributions to computing was a technique called very large-scale integration (VLSI), which enabled tens of thousands of transistors to be fitted onto a single silicon chip. In 1979, Mead taught the world’s first VLSI design course and created the first software compilation of a silicon chip. His 1980 textbook “Introduction to VLSI Design,” coauthored by Lynn Conway, launched the Mead and Conway Revolution. Mead and his contemporaries set the stage for the “microchip revolution” in the Pacific Northwest. His methods of complex chip design have catalyzed decades of progress.

In the 1980s, Mead grew frustrated with the limits of traditional CPU design, and turned to mammalian brains for inspiration. Three decades later, this field of neuromorphic computing is back in the spotlight with efforts like the Human Brain Project. Mead, now 79, maintains a professor emeritus position at Caltech, where he taught for over forty years. In a recent interview with MIT Technology Review, Mead details why it’s important for computer engineers to explore new forms of computing.

In Mead’s view, one of the thorniest challenges for the chip industry is power dissipation. For decades now, the focus has been on faster and faster chips, but the heat issue can’t be ignored. Mead notes that “It’s a common theme in technology evolution that what makes a group or company or field successful becomes an impediment to the next generation. … Everyone was richly rewarded for making things run faster and faster with lots of power. Going to multicore chips helped, but now we’re up to eight cores and it doesn’t look like we can go much further. People have to crash into the wall before they pay attention.”

These limitations are what prompted his interest in neuromorphic designs. “I was thinking about how you would make massively parallel systems, and the only examples we had were in the brains of animals,” he tells MIT Technology Review, “We built lots of systems. We did retinas, cochleas—a lot of things worked. A lot of my students are still working on this. But it’s a much bigger task than I had thought going in.”

Mead is also directing his energy into developing a unified framework to explain both electromagnetic and quantum systems. This is summarized in his book Collective Electrodynamics. Mead is skeptical, yet supportive, of current quantum computing projects.

“We don’t know what a new electronic device is going to be. But there’s very little quantum about transistors,” he says. “I’m not close to it, but I’m generally supportive of these people doing what they call quantum computing. People have got into trying to build real things based on quantum coupling, and any time people try to build stuff that actually works, they’re going to learn a hell of a lot. That’s where new science really comes from.”

Mead’s viewpoint is refreshing and inspirational. He reminds us that all new technologies start small before becoming “part of the infrastructure that we take for granted.” Even “the transistor was [once] a tiny little wart off a big industry,” he quips.