NICE Conference 2018, Hillsboro Oregon

The sixth annual Neuro Inspired Computational Elements (NICE) workshop was hosted by Intel at the Jones Farm campus in Hillsboro, Oregon, February 27 to March 1. It featured talks on the arrival of Intel's new neuromorphic chip, "Loihi," as well as upgrades to the Human Brain Project-sponsored BrainScaleS-2 and SpiNNaker systems, and a slew of talks from regulars and a few newcomers.

Fred Rothganger

Sandia National Laboratories


A Neural-Inspired Software Stack

Generic OS stack:
-source file
-compiler
-loader
-operating system

BUT: neuro-inspired devices are not general-purpose computers.

Neuromorphic "device drivers": what is common across most neuromorphic architectures?
-leaky integrate-and-fire neurons
-spike messages, with configurable weight and delay

Units may remain on the CPU because they are incompatible or because the device is full. The software stack automatically "migrates" compatible units onto the STPU, up to its capacity.

A language for neural computing: three visions
1) TensorFlow-type networks
2) Spiking networks
3) Neuroscience models

Where will novel algorithms come from in the future? (He thinks neuroscience.) What language will best express these algorithms?
-something more general than either tensors or LIFs
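The leaky integrate-and-fire unit named above as the common denominator across neuromorphic architectures can be sketched in a few lines (a minimal illustrative sketch; the time constant, threshold, and drive current here are mine, not from the talk):

```python
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron.

    input_current: array of input drive values, one per time step.
    Returns the membrane-potential trace and the spike time indices.
    """
    v = v_reset
    trace, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Leaky integration: decay toward rest plus input drive.
        v += dt / tau * (-v + i_in)
        if v >= v_thresh:      # threshold crossing -> emit spike
            spikes.append(t)
            v = v_reset        # reset after spiking
        trace.append(v)
    return np.array(trace), spikes

# Constant supra-threshold drive produces regular spiking.
trace, spikes = simulate_lif(np.full(200, 2.0))
```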

Simon Knowles

GraphCore


Designing Processors for the Nascency of Machine Intelligence

Small British company (75 people). "Machine Intelligence": nothing artificial about intelligence. Nations which do not master it will fall far behind. Programming, software, and hardware will all change... but how?

A canonical intelligent agent is a Bayesian probabilistic sequence-to-sequence translator. The search for intelligence also needs something like evolution to explore model structures.

Knowledge models are high-dimensional data, naturally represented as graphs
-the same graph is a natural partitioning for distributed inference computation

For machine efficiency, the structure of the graph can remain static during optimization. Some types of structure search can also be cast as static graphs.

Our comprehension and mechanization of intelligence is nascent
-understanding of models and learning algorithms is changing rapidly
-huge compute required for model discovery as well as model optimization

Until we understand intelligence better, we need machines which...
-exploit massive parallelism
-are agnostic to model structure
-have a simple programming abstraction
-are efficient for both training and deployed inference

Good computer architecture bets for this new workload
-massive exposed parallelism
-large graphs are necessarily sparse;
-static structure allows compiled communication
-low precision arithmetic
-approximate inference on probabilistic models learned from noisy data
-physical data invariances provide some structural priors
-convolutions, recurrence
-noise generation

Silicon scaling is limited by power. Throughput-oriented silicon should be mostly memory. Silicon efficiency is the full use of available power
-distributed memory
-recompute what you can't memorize locally
-compile communications
-serialize communication and compute

Proximity is defined by energy, more than by time.

Poor return on processor complexity: Pollack's Rule
-processor performance ~ sqrt(#transistors)

Pure distributed machine
-static partitions of work and memory
-threads hide only local latencies (arithmetic, memory, branch)
-compiled communication patterns over a stateless exchange
-distributed SRAM on chip instead of DRAM on interposer

We need to expose A LOT of parallelism
-much more than parallelizing over one maths kernel at a time

"Colossus" IPU
-all-new pure distributed multi-processor built specifically for the ongoing discovery and early deployment of MI
-mostly memory: "model on chip"
-a cluster of IPUs acts like a bigger IPU
-stepwise-compiled deterministic communication under BSP
-programmable using standard frameworks or the Poplar native graph abstraction
-TensorFlow, PyTorch, Caffe2, MXNet

Bulk Synchronous Parallel
-Massively parallel programming with no concurrency hazards
-Communication patterns are compiled, but dynamically selected
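Pollack's Rule cited above is the quantitative argument for a pure distributed machine: doubling a core's transistor budget buys only ~1.4x performance, while replicating cores scales linearly on parallel work. A quick illustrative calculation:

```python
import math

def pollack_perf(transistors):
    """Pollack's Rule: single-core performance ~ sqrt(transistor count)."""
    return math.sqrt(transistors)

# One big core with 4x the transistors of a baseline core
# yields only 2x the single-thread performance...
big_core = pollack_perf(4.0)
# ...while 4 baseline cores yield 4x, assuming perfectly parallel work.
four_small = 4 * pollack_perf(1.0)
```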

To make a computer like this you need to be very good at load balancing
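The BSP model Knowles described (local compute on static partitions, then a barrier and a compiled exchange) can be sketched generically; this toy ring example is mine, not Graphcore's:

```python
def bsp_run(states, compute, exchange, supersteps):
    """Toy bulk-synchronous-parallel loop.

    states:   per-worker local state (one entry per worker)
    compute:  fn(state, inbox) -> (new_state, outgoing message)
    exchange: fn(all outgoing messages) -> per-worker inboxes
    """
    inboxes = [None] * len(states)
    for _ in range(supersteps):
        # Superstep: every worker computes locally with no concurrency hazards...
        results = [compute(s, m) for s, m in zip(states, inboxes)]
        states = [r[0] for r in results]
        outgoing = [r[1] for r in results]
        # ...then a barrier, after which the (compiled) exchange delivers messages.
        inboxes = exchange(outgoing)
    return states

# Example: each worker accumulates its left neighbor's value in a ring.
def accumulate(state, inbox):
    new_state = state + (inbox or 0)
    return new_state, new_state

def ring_exchange(msgs):
    # Each worker receives the message from its left neighbor.
    return msgs[-1:] + msgs[:-1]

final = bsp_run([1, 2, 3, 4], accumulate, ring_exchange, supersteps=2)
```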

Looked at timing of activation of brain regions as a metric for computational time, where later activation is assumed higher-level, etc.

Tried to correlate behavior to fMRI
-count the number of the behavior's activation instances per bin
-normalize activations by the total
-approximate with a line
-identify the slope
-the slope is negative for finger tapping ("bottom up")
-the slope is positive for reasoning

Slope ranking of various activities
-some activities appear more "top down" and some more "bottom up" in terms of brain hierarchies
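The slope estimate described above is an ordinary least-squares line fit over binned, normalized activation counts; a minimal sketch (the bin values below are invented for illustration, not from the talk's data):

```python
import numpy as np

def activation_slope(counts_per_bin):
    """Fit a line to normalized activation counts and return its slope.

    A negative slope means activation is concentrated early
    ("bottom up"); a positive slope means late ("top down").
    """
    counts = np.asarray(counts_per_bin, dtype=float)
    normalized = counts / counts.sum()       # normalize by the total
    bins = np.arange(len(normalized))
    slope, _intercept = np.polyfit(bins, normalized, 1)
    return slope

# Invented example profiles: early-weighted vs. late-weighted activity.
slope_down = activation_slope([5, 4, 3, 2, 1])   # early activation
slope_up = activation_slope([1, 2, 3, 4, 5])     # late activation
```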

Hava Siegelmann

Symbolic Continuity Conclusions
-fundamental principle: the brain is structurally organized to produce behavioral abstraction

Next Directions
-check the flow of cognitive behavior in health and under different situations
-could we find relations among similar behaviors?
-...

Kathleen Hamilton

Oak Ridge National Laboratory

Sparse Hardware Embedding of Spin-Glass Spiking Neural Networks for Community Detection

Understanding network dynamics has many real-world applications
-spread of epidemics through social networks
-failures on networks become catastrophic events
-how many baby monitors does it take to take down the internet?

Analysis of graph structure
-shortest path between two points
-identifying or quantifying important vertices on a graph
-identifying communities in a graph

Our approach to using neuromorphic hardware for graph-related problems: analyzing graph structure from spiking data
-what characteristics are needed?
-what is useful output?

Hopfield neural network
-constructing a network for graph bi-partitioning is straightforward
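The straightforward construction is to encode each node as a +/-1 spin and couple the spins so that low network energy corresponds to a balanced, low-cut partition. A generic sketch of that idea (the coupling form, balance penalty, and update schedule are my illustrative choices, not the speakers' hardware embedding):

```python
import numpy as np

def hopfield_bipartition(adjacency, balance=0.5, sweeps=20, seed=0):
    """Bipartition a graph with a Hopfield-style network.

    Spins s_i = +/-1 assign node i to one of two communities.
    Couplings reward keeping adjacent nodes together; the uniform
    `balance` penalty discourages lopsided partitions.
    """
    A = np.asarray(adjacency, dtype=float)
    n = len(A)
    J = A - balance * (np.ones((n, n)) - np.eye(n))  # adjacency minus balance penalty
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=n)
    for _ in range(sweeps):
        for i in rng.permutation(n):     # asynchronous sign updates lower the energy
            s[i] = 1.0 if J[i] @ s >= 0 else -1.0
    return s

# Two triangles joined by a single edge: the natural cut is that edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = hopfield_bipartition(A)
```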

Day 2

Bob Colwell

Some observations about the near future of alternative computing technology


Does not agree with Mayberry: Moore's Law is ending and it's obvious
Conventional digital CMOS is already good enough for lots of things
Can’t dislodge the incumbent by barely beating it
-would have to win by a very large margin across the board
-make something important feasible that was not before
NICE is going to have to learn to play nice with incumbent tech
-accelerators
-add-on facilities for servers
-add-on tech for conventional SoCs
-match market economics

There is a set of economics that is not visible until you try and enter it, then it becomes very clear very quickly

“Our tools for automatically allocating workloads is pretty terrible”

It is easy to design things that cannot be effectively programmed
What can we learn from computing history?
General purpose computing ruled for decades because of a golden rule
-better perf on existing code + new apps & OS's = $$ profits
-CPU perf/features got exponentially & predictably better over time
-CPU perf has stalled, and GPU perf will too, due to thermals
NICE accelerators will improve for a few years, but Moore's Law is still dead
-OS's/SW got better (and demanded better CPUs)
-new SW will take advantage of new NICE accelerators, BUT
-our ability to program heterogeneous accelerators is poor
-what are the apps? those are what drive demand, not “capability” per se
-Platform improvements did not choke (PCI, QPI, USB, DRAM, buses, caches...)
-still some room for improvement here
-but DRAM is dying, system economics worsening
-overall system cost fell drastically due to huge unit volumes
-smartphone volumes are reaching saturation
-IoT volumes will dwarf smartphone, but there’s no profit there
-security issues have remained annoyances, not limiters
-there was a predictable future safeguarding today's investments

General computing largely ignored efficiency
-MPEG-2 HW decoder 1000x > software decoder on CPU
-Tasks that are too much for CPU’s may be feasible with accelerators
-end of Moore’s law means you can no longer just wait around and faster machine will appear….only accelerators will enable certain classes of new apps
-what new apps? Dunno; we never knew until they appeared

Efficiency became a 1st-order concern in 2004 when system thermals hit the air-cooling limit
-industry answer: multi-core

Switch away from the historic "design to be correct" to "design knowing there will be emergent behavior" and algorithmic errors/misuse
-emergent behavior is never in your favor
-it arises from system complexity
-the same system complexity that keeps most humans from understanding how the system really works or why it does what it does
There will be unintended “communication paths” in the systems you sell into
-Spectre/Meltdown
-RowHammer
-EMI, RFI, gnd/VCC coupling, intentional mistreatment by hackers

There will be failures, software and hardware. You must judiciously provide backup plans
-but don’t make usual error of treating them less seriously than main plan
-thinking you have a backup when you actually don't is worse than having none

Google's AI thinks this turtle looks like a gun
-systems are not designed around their algorithmic limitations
-can't figure out why the system errs, or that it is erring
-and if a given system learns, it's even harder to know what it will do

Classifying an image that looks like noise confidently as a zebra is a big problem

Tragedy of the commons paths
-aka shared resources
-power supply, ground returns, EMI
-Thermals
-Security-related behavior
-manage these despite inevitable design errors
Thermals is the one I worry about most
-each hetero agent uses supply current, generates ground return current, & contributes to overall thermal load
What about “Machine Check”?
-after 40 years we have no standards in the area
The heterogeneous future is inheriting an ad hoc, crazy quilt

If you walk up to a CEO of an established company with a new idea and they do not own it, it will be perceived on the threat axis first.

If you get designed onto Apple’s platform, understand they don’t want you there.

Christoph von der Malsburg

Platonite
Dynamic Link Architecture


The state of Neural Computation is dominated by deep learning on GPU
No need for neuromorphic computing currently
The future: artificial general intelligence
The scope issue
-present systems have very limited scope
-human intelligence encompasses our whole life!
Conceptually, AI has made no progress since…list of big names
There is a Roadblock!
-The Neural Code Issue: how does neural tissue represent mental phenomena
Classical AI: Bits
-completely general
-to be generated/interpreted by algorithms
-intelligence only in the programmer's mind
-AGI beyond the economic power of the world
Artificial Neural Networks: neurons
-neurons as logical propositions
-each decision needs a dedicated neuron
-missing concept: compositionality
-lack of expressive power
-deep learning needs too many examples
Conclusion: bits are too general, neurons are too narrow. AI is a mere shadow of human thought.
The Human Model
-one GB of genetic information suffices to make the brain
-one GB of virtual reality would suffice to train the brain
-Simple nursery
-200 million eye blinks over three years
3 year old children:
-grasp their environment
-navigate, manipulate
-act purposefully
-speak
-learn from single inspection
Embodiment
Situatedness: local in space and time, no global maps, no big data
Conflict with Classical Computing
-programming bottom-up, not top-down
-impossible to know the actual state of the system due to non-deterministic, asynchronous operation
The Brain as a Dynamic System
-neural activity: continuous, stochastic variable, brain state trajectories are attractors, stabilizing forces
Connectivity: connection weights, continuous, stochastic variables
-How come memory is not eroded by noise?
Brain connectivity must be an attractor!
Forces: cooperation and Competition
Network Self-Organization
-1 petabyte of information in connectivity!
Attractor Nets: neural fields and topological mapping
-neural sheets
-shape, pose, texture, illumination, motions, etc
Comprehension by Abstraction
-"Scene" / "Schema"
See review slide pic

Garrett Kenyon

How do brains learn about the physical world?


Took a picture of an occluded suitcase and gave it to Google, which labeled it as a "floor" and returned similar images of traffic accidents, streets, etc.
ImageNet LSVRC 2017: current best algo is 20% error
Bengio Paper: Measuring the tendency of CNNs to learn surface statistical regularities
https://arxiv.org/abs/1711.11561
What are we missing?
-lateral inhibition (leads to Bayesian inference)
Reconstruction example
Stereo Features Examples
Retina Example with gamma waves

Wolfgang Maass

Networks of Spiking Neurons Learn to Learn


L2L/meta-learning has been discussed for decades
Only recently, with sufficient computational power available, has it become an important tool in ML
Review Standard L2L Framework
-consider a family F of learning tasks
-the first art is to define F in such a way that L2L produces a desired result
-the second art is to define the fitness function
-the third art is to choose the right NN and the right set of hyperparameters
An essential difference from the standard practice of ML: testing is not carried out on new examples from the same learning task, but on examples from a new learning task from the same family F.
Choose hyperparameters so that they define all aspects of learning in the neuromorphic device N
Use benchmark challenges that were proposed for L2L applied to non-spiking recurrent ANNs (LSTM networks).
-See slide for implementation details
Ported the L2L framework onto the HICANN-DLS chip
L2L Benchmarks on RL tasks
-navigation to a goal G in random mazes of a given size
-random MDP of a given size
Choosing a good optimization algorithm for the outer loop is essential
-Evolution Strategies
-Cross Entropy Method
-Simulated Annealing
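Of the outer-loop optimizers listed, the cross-entropy method is the simplest to sketch: sample hyperparameter vectors from a Gaussian, keep the elite fraction, refit the distribution, repeat. This is a generic sketch, not the speakers' implementation; the quadratic fitness below is a stand-in for running the inner learning loop:

```python
import numpy as np

def cross_entropy_method(fitness, dim, iters=50, pop=100, elite_frac=0.2, seed=0):
    """Generic cross-entropy method for outer-loop hyperparameter search."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        # Sample a population of candidate hyperparameter vectors.
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([fitness(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]   # keep the best candidates
        # Refit the sampling distribution to the elite set.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Stand-in fitness with its peak at (1, 2); the real fitness would
# run the inner-loop learner and report its performance.
best = cross_entropy_method(lambda x: -np.sum((x - np.array([1.0, 2.0]))**2), dim=2)
```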
Proof of concept: a one-armed-bandit task with 11 machines, the last machine giving a coded signal for the highest-reward machine among the other 10. Can it discover the hidden code?
-after 500k trials, learned to efficiently explore the family of functions, ...?
SNNs can also learn-to-learn from a teacher
See Summary Slide Pic

Threshold modulation – provides context
Synapse Updates are Hebbian
-only potential synapses; based on activity, new ones are made
-stabilize by principles of homeostasis
“conditional Hebb”
-if axon or dendrite has excess connections, do not strengthen
-if axon or dendrite has too few connections, do not weaken
Do not forget – long term memory
-nonlinear Hebb to avoid “catastrophic forgetting”
-reduce permanence decrements for well-established synapses
-results in two populations of synapses: plastic and permanent
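The rules above can be sketched as one conditional Hebbian step on a synapse's permanence value (a sketch of the idea as described; the constants, names, and thresholds are mine, not from the talk):

```python
def update_permanence(perm, pre_active, post_active,
                      axon_fanout, dendrite_fanin,
                      inc=0.05, dec=0.01, max_conn=40, min_conn=5,
                      permanent_thresh=0.9):
    """Conditional Hebbian update on one synapse's permanence in [0, 1].

    - Co-active pre/post strengthens, unless either side already has
      excess connections ("do not strengthen").
    - Otherwise weaken, unless either side has too few connections
      ("do not weaken").
    - Well-established synapses decay far more slowly, which yields two
      populations: plastic and permanent.
    """
    if pre_active and post_active:
        if axon_fanout < max_conn and dendrite_fanin < max_conn:
            perm = min(1.0, perm + inc)
    else:
        if axon_fanout > min_conn and dendrite_fanin > min_conn:
            if perm >= permanent_thresh:
                dec = dec * 0.1      # reduced decrement for established synapses
            perm = max(0.0, perm - dec)
    return perm
```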

Constantine Michmizo

Brain-Morphism: astrocytes as memory units


Good review of computational neuroscience over the years
the dogma that the brain equals neurons is highly prevalent
Good reason: neurons are electrically active and we can manipulate them
up to 90% of brain cells are not neurons; they are glial cells
-they are electrically silent
until recently we could not see function of astrocytes
astrocytes form their own communication network, and interact with neuronal synapse networks
Two opportunities:
-Create computation models of astrocytes
-Create neuron-astrocyte networks
How to model?
inter-cellular Ca2+ wave propagation in space and time
How astrocytes control neurons
Tripartite Synapse
-hears what neuronal component is saying
-modulates the neuronal component by inhibiting pre-synaptic or injecting current into post-synaptic neuron
Could explain some brain (delta) waves?
Goal was to find a learning rule that would generate a transition between memories (in a Hopfield network)
“Learning is expanding beyond weight change”
astrocytes can sense and impose synchronicity in small populations of neurons

“There are no tensors in this system. Tensors are not the right framework for this event-driven spiking system.”

Intel Loihi Systems Outlook

Intel Neuromorphic Research Community
-we wish to engage with collaborators in academic, government, industry research groups
–remote access to Loihi systems, SDK, SW
–Loaned Loihi system and bare chips
-opportunity for limited funding (RFP available late March)

Q: What memory space does the onboard CPU have access to?
A: Almost all node state is available to the onboard CPU. 128 kB of synaptic memory, with the bits divisible however you want

The hard problem of olfaction, and its generality
—>how do you identify the existence of an odor when it is not more or less concentrated than other odors?
Odorants interfere with one another by competing for receptors
Odors (sources) cannot be identified efficiently from mixed inputs based on feedforward processing alone
hypothesis: the key is prior learning
Homeomorphic/diffeomorphic transformations
categorical learning generates sparse representations of sources, rendering the hard problem theoretically tractable
occluded, noisy signals need to be efficiently attracted towards learned source representations
the blessing of dimensionality protects against destructive interference
core principles of implementation
-preprocessing
-generation of orthogonalized representations of complex, diagnostic input patterns via statistical learning
-selective deployment of those orthogonalized representations as inhibition onto the afferent input stream
-generate attractors to these learned signal engrams
-interpretable learning
without learning, odor representations are stationary
stronger activation leads to a phase lead with respect to the gamma clock
without learning, noisy signals are also stationary

Chris Eliasmith


Mike (Intel) approached Chris and asked if there were any applications he could port to the chip
Not much time (2 months), and debugging tools were non-existent
Implemented some basic demos just in time
Seemingly good results with adaptive motor control

Sebastian Schmitt

Heidelberg University
Experiments on BrainScaleS


20 cm wafer, 180nm CMOS
main PCB
48 Kintex-7 FPGAs
Power Supplies
Aux Boards
in development for >10 years