Technology

Neuron Machine architecture

The NM is an innovative design paradigm for register-transfer level (RTL), synchronous hardware systems. The NM system consists of a neuron block (NB) and a network unit (NU), as shown in the following figure.

<Configuration of Neuron Machine architecture>

In addition, the overall system is controlled by a control unit (CU), which is not shown in this figure. The NB is essentially a single digital hardware neuron whose circuits reflect the computation of the various parts of the model neuron.
The NB consists of multiple, P, synapse units (SNUs), a dendrite unit (DU), and a soma unit (SU). The outputs of the SNUs are connected to the DU inputs, and the output of the DU is connected to the SU input.
The inputs of the SNUs and the output of the SU become the inputs and output of the NB, and are connected to the outputs and input of the NU, respectively.
All the units in the NB are designed into one large, fine-grained pipelined circuit, so that the NB processes P presynaptic inputs at every clock cycle and computes one neuron output every bpn = ⌈p/P⌉ clock cycles,
where p is the number of synapses on each neuron. The group of P synapses computed together is called a synapse bunch (SB).
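As a minimal sketch of this timing relation (in Python, with illustrative numbers that are not from the text), the cycles-per-neuron figure follows directly from the synapse count p and the number of SNUs P:

```python
import math

def cycles_per_neuron(p, P):
    """Clock cycles (bpn) the NB spends per neuron: one cycle per
    synapse bunch of P synapses, so bpn = ceil(p / P)."""
    return math.ceil(p / P)

# Illustrative sizes (not from the text): 1000 synapses, 8 SNUs.
print(cycles_per_neuron(1000, 8))  # 125 bunches -> 125 cycles per neuron
```

Since the pipeline accepts a new bunch every cycle, throughput scales with P rather than with the pipeline depth.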

The NU is composed of P memory modules, each with two individual memories: MM and MX. In each memory module, the output of MM is connected to the address input of MX and the output of MX becomes one of P outputs of the NU.
MXs are dual-port memories in which read and write operations can be carried out simultaneously. The write ports of all MX memories are connected together, forming the input of the NU. The contents of the kth MM and MX are:

MMk(b) = mij

MXk(a) = ya,

where i = mod(b × P + k, bpn × P), j = ⌊b / bpn⌋, mij is the index of the neuron connected to the ith synapse of the jth neuron, and ya is the output of the ath neuron. By addressing all MM memories with an address b at the sbsel control input,
the outputs of the presynaptic neurons connected to the synapses in the bth SB appear at the output of the NU, while the outputs of postsynaptic neurons are simultaneously stored through the input of the NU.

The CU repeats the same control sequence for each time step, in which all presynaptic inputs for all neurons are supplied from the NU to the NB,
starting from the first SB of the first neuron and ending with the last SB of the last neuron. Simultaneously with the input flow, the outputs of postsynaptic neurons computed in the NB are sent to the NU via the axon bus.
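The index mapping the CU steps through in one time step can be sketched as follows (Python, with hypothetical toy sizes for P, bpn, and the neuron count, none of which come from the text):

```python
# For SB address b and memory module k, the text gives
# i = mod(b*P + k, bpn*P) and j = floor(b / bpn): synapse i of
# neuron j is served in that cycle.

P, bpn, N = 4, 3, 2           # SNUs, SBs per neuron, neurons (toy sizes)

schedule = []
for b in range(N * bpn):      # first SB of first neuron .. last SB of last
    j = b // bpn              # postsynaptic neuron index
    for k in range(P):
        i = (b * P + k) % (bpn * P)   # synapse index within neuron j
        schedule.append((j, i))

# Each neuron's synapses 0 .. bpn*P-1 are visited exactly once per step.
print(schedule[:4])  # -> [(0, 0), (0, 1), (0, 2), (0, 3)]
```

The schedule shows why no event queue is needed: every synapse is visited once per time step, regardless of activity.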

<Configuration of the Neuron Machine system computing the Hodgkin-Huxley neural network model>

The NM architecture has a number of advantages:

1. Communication between neurons is accomplished simply by accessing memories, incurring no communication overhead.

2. Network topology information is stored only in the MM memories, with no restriction on the topology.

3. State and parameter memories are distributed and embedded in the computational circuits, and their data paths are short, so no memory-bottleneck problem arises.

4. The speed does not depend on the firing rate, as it does in most event-driven implementations.

5. A large number of connections can be computed simultaneously by the multiple SNUs.

6. Large-scale pipelining parallelism is obtained in the NB, and the pipeline delay has little impact on the overall performance.

7. The structure is simple and uniform, and does not necessarily require a main computer.

PUBLICATIONS

We published the NM architecture for the multi-layer perceptron (MLP) in 2012 [1], and successfully applied it to complex spiking neural networks [2] and back-propagation networks [3] in 2013.
The NM architecture was further extended to support deep belief networks, whose computation procedure is highly complicated, involving an arbitrary number of restricted Boltzmann machines [4].
We believe the use of the NM architecture is not limited to neural network systems; it can be extended to general high-performance computing applications.

The resulting performance of the NM systems is astonishing. Our deep belief network system is compared with other systems in the following table.

<Speed comparison of Deep Belief Network systems>

| Ref. | Type | Internal                 | Clock (Hz) | Speed (CUPS) | Speedup over PC |
|------|------|--------------------------|------------|--------------|-----------------|
| Ours | FPGA | Xilinx Kintex 7          | 200M       | 1.9G         | 121x            |
|      | GPU  | NVIDIA GeForce 460 (336) | 720M       | 721M         | 46x             |
|      | GPU  | NVIDIA GTX 280 (240)     | 1.3G       | 672M         | 43x             |
|      | CPU  | Intel i5-2410M (2)       | 2.3G       | 15.7M        | 1x              |
|      | CPU  | Intel Core2 Quad core    | 2.83G      | 10.2M        | 0.7x            |

Our small system, using a mid-range FPGA chip, outperformed the other CPU and GPU systems, and was only about ten times slower than the Google Brain system, which used 1,000 GPUs!

Our neuromorphic system is compared with other Hodgkin-Huxley neural systems in the following table.

<Speed comparison of HH systems>

| Ref. | Type | Internal           | Clock (Hz) | Speed (CUPS) | Speedup over PC |
|------|------|--------------------|------------|--------------|-----------------|
| Ours | FPGA | Xilinx Kintex 7    | 200M       | 196M         | 1,241x          |
|      | GPU  | NVIDIA C2050       | 1.2G       | 4.4M         | 28x             |
|      | GPU  | NVIDIA Tesla S1070 | 602M       | 2.5M         | 16x             |
|      | GPU  | NVIDIA Tesla S1070 | 602M       | 2.3M         | 15x             |
|      | FPGA | Xilinx Virtex 7    | 200M       | 1.9M         | 12x             |
|      | CPU  | Intel Xeon Quad    | 2.7G       | 158K         | 1x              |

Our Kintex 7 FPGA system was more than 20,000 times faster than a two-core PC and more than ten times faster than the GPU systems.