Computational RAM

Computational RAM (C-RAM) is semiconductor random access
memory with processors incorporated into the design to
build an inexpensive massively-parallel computer.

In a typical computer with 32MB of memory built from 16Mb DRAM chips
and a 100MHz processor, the bandwidth available inside the memory is
3000 times that available at the CPU. If you can't bring the memory
bandwidth to the processor, then bring the processors to the memory.

Computational RAM (C-RAM) is conventional RAM with SIMD
processors added to the sense amplifiers. These
bit-serial, externally programmed processors add only a
small amount of area to the chip and in a 32Mbyte memory
have an aggregate performance of 13 billion 32-bit
operations per second. The chips are extendible and
completely software programmable. In this paper we
describe (1) the C-RAM architecture, (2) a working 8Kbit
prototype, (3) a full scale C-RAM designed in a 4Mbit DRAM
process, and (4) C-RAM applications.
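
As a rough illustration of what bit-serial SIMD processing means,
the sketch below adds two arrays of integers one bit position at a
time, with every processing element (PE) working in lockstep. It is
a software model only, not the C-RAM instruction set: the function
names (to_rows, from_rows, simd_add) and the choice to represent one
memory row as a Python integer whose bit k belongs to PE k are
assumptions made for illustration.

```python
from typing import List


def to_rows(values: List[int], width: int) -> List[int]:
    """Transpose per-PE integers into `width` memory rows, LSB row first.

    Assumption for this sketch: bit k of each row belongs to PE k.
    """
    rows = []
    for i in range(width):
        row = 0
        for pe, v in enumerate(values):
            row |= ((v >> i) & 1) << pe
        rows.append(row)
    return rows


def from_rows(rows: List[int], num_pes: int) -> List[int]:
    """Inverse of to_rows: gather each PE's bits back into an integer."""
    return [
        sum(((row >> pe) & 1) << i for i, row in enumerate(rows))
        for pe in range(num_pes)
    ]


def simd_add(a_rows: List[int], b_rows: List[int]) -> List[int]:
    """Bit-serial add: one row (one bit position) per step, all PEs at once.

    Each PE keeps a single carry bit; the final carry is dropped, so the
    result wraps modulo 2**width.
    """
    carry = 0  # one carry bit per PE, all cleared
    out = []
    for a, b in zip(a_rows, b_rows):
        out.append(a ^ b ^ carry)             # sum bit in every PE
        carry = (a & b) | (carry & (a ^ b))   # carry-out in every PE
    return out


if __name__ == "__main__":
    width = 8
    a = [1, 2, 3, 250]
    b = [5, 6, 7, 10]
    total = from_rows(simd_add(to_rows(a, width), to_rows(b, width)), len(a))
    print(total)  # [6, 8, 10, 4]  (250 + 10 wraps modulo 256)
```

The loop runs once per bit of the operand regardless of how many PEs
there are, which is why a slow bit-serial processor can still give
high aggregate throughput when thousands of them run in parallel.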

These files are also available via anonymous FTP from
ftp.eecg.toronto.edu in /pub/tech_reports/dunc/*

Boiler Plate:

The above paper is
Copyright 1992 IEEE. Personal use of this material is
permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for
creating new collective works for resale or redistribution
to servers or lists, or to reuse any copyrighted component
of this work in other works must be obtained from the
IEEE.

Technology considerations dictate that a petaOPS
computer implemented with currently available
technology would do most of its computing with
simple processors integrated into memory and thereby
exploit the high internal memory bandwidth. Such a
system is proposed.

A workstation can be sped up thousands of times by
running highly parallel applications directly in the
memory. In these preliminary results, the simulated
performance of applications and kernels run on 32MB of
150ns C-RAM is compared to their measured performance on
a 70MHz Sun SPARCstation 5 workstation.

Computational RAM (C-RAM) is semiconductor random access
memory with processors incorporated into the design to
build an inexpensive massively-parallel computer. If an
application contains sufficient parallelism, it will
typically run orders of magnitude faster in C-RAM than on the
central processing unit. This work includes architecture,
prototype chips, a compiler and applications.

C-RAM integrates SIMD (Single Instruction stream, Multiple
Data stream) processors into random access memory at the
sense amplifiers (along one edge of a 2 dimensional array
of memory cells). The novel combination of processors with
memory (the memory retains its memory interface) allows
C-RAM to be used as computer main memory, as a video frame
buffer or for stand-alone signal processing. The use of
high-density commodity dynamic memory makes C-RAM
economical. The bit-serial, externally programmed
processing elements (PEs) add only slightly to the cost of
the chip (9-20%), yet a workstation with 32Mbytes of C-RAM
would have an aggregate performance of 13 billion 32-bit
operations per second. A working C-RAM with 64 processing
elements per chip has been fabricated, and the PE for a
2048-PE, 4Mbit chip has been designed.
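
The figures quoted above can be cross-checked with a little
arithmetic. The sketch below is a back-of-the-envelope estimate, not
a calculation taken from the papers: it assumes the 32Mbytes are
built entirely from the 2048-PE, 4Mbit chips, and it borrows the
150ns C-RAM cycle time from the workstation comparison abstract
earlier on this page. The function name pe_budget is hypothetical.

```python
MBIT = 2 ** 20  # bits in one Mbit


def pe_budget(memory_mbytes: int = 32,
              chip_mbits: int = 4,
              pes_per_chip: int = 2048,
              aggregate_ops_per_s: float = 13e9,
              cycle_s: float = 150e-9):
    """Estimate chip count, PE count and cycles per 32-bit operation.

    Assumptions (not stated together in the source): the memory is made
    only of 4Mbit, 2048-PE chips and each PE cycle takes 150ns.
    """
    chips = (memory_mbytes * 8 * MBIT) // (chip_mbits * MBIT)
    total_pes = chips * pes_per_chip
    ops_per_pe = aggregate_ops_per_s / total_pes
    cycles_per_op = 1.0 / (ops_per_pe * cycle_s)
    return chips, total_pes, ops_per_pe, cycles_per_op


if __name__ == "__main__":
    chips, total_pes, ops_per_pe, cycles_per_op = pe_budget()
    print(chips, total_pes)          # 64 131072
    print(round(ops_per_pe))         # about 99 thousand 32-bit ops/s per PE
    print(round(cycles_per_op, 1))   # about 67 cycles per 32-bit operation
```

Under these assumptions each PE needs on the order of two memory
cycles per bit of a 32-bit operation, which is consistent with a
bit-serial design; the aggregate figure comes from the number of PEs
rather than from the speed of any one of them.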

The performance of C-RAM for kernels and real applications
was obtained by simulating their execution. For this
purpose, a prototype compiler was written. Applications
are drawn from the fields of signal and image processing,
computer graphics, synthetic neural networks, CAD,
databases and scientific computing.