Active Research Projects

The goal of the ADEPT lab is to dramatically improve computing
capability by reducing the cost and risk of designing custom silicon
for new application areas. Our integrated 5-year research mission
cuts across applications, programming systems, architecture, and
hardware design and verification methodologies.

RISC-V is a new free and open instruction set
architecture (ISA) developed at UC Berkeley. It was
initially designed for research and education, but
is now increasingly used in commercial designs. A
full set of open-source software tools is
available, as are several open-source processor
implementations. RISC-V was initially developed as
part of Par Lab, continued in ASPIRE, and is now
part of ADEPT.

The FireBox project is developing
a system architecture for third-generation
Warehouse-Scale Computers (WSCs). FireBox scales
up to a ~1 MegaWatt WSC containing up to 10,000
compute nodes and up to an Exabyte (2^60 Bytes) of
non-volatile memory connected via a low-latency,
high-bandwidth optical switch. Each compute node
contains a System-on-a-Chip (SoC) with around 100
cores connected to high-bandwidth on-package DRAM.
Fast SoC network interfaces reduce the software
overhead of communicating between application
services, and high-radix network backplane switches
connected by Terabit/sec optical fibers reduce the
network's contribution to tail latency. The very
large non-volatile store directly supports
in-memory databases, and pervasive encryption
ensures that data is always protected in transit
and in storage. FireBox is being developed in the
Berkeley ADEPT and RISE labs.
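
As a rough back-of-envelope illustration, the headline figures above can be divided out into per-node budgets. The sketch below uses only the numbers quoted in the description and assumes, purely for illustration, that power and non-volatile storage are shared evenly across all 10,000 nodes. In practice the power budget also covers the network and storage, so the per-node figure is an upper bound on each node's share.

```python
# Back-of-envelope FireBox budgets, using only the figures quoted in the text.
# Assumption (not from the source): resources are divided evenly across nodes.
NODES = 10_000              # "up to 10,000 compute nodes"
POWER_WATTS = 1_000_000     # "~1 MegaWatt" for the whole WSC
NVM_BYTES = 2**60           # "an Exabyte (2^60 Bytes)" of non-volatile memory
CORES_PER_NODE = 100        # "around 100 cores" per SoC

watts_per_node = POWER_WATTS / NODES          # upper bound on each node's share
nvm_per_node_tib = NVM_BYTES / NODES / 2**40  # tebibytes of NVM per node
total_cores = NODES * CORES_PER_NODE

print(f"power per node : {watts_per_node:.0f} W")
print(f"NVM per node   : {nvm_per_node_tib:.1f} TiB")
print(f"total cores    : {total_cores:,}")
```

Under these assumptions each node has a share of roughly 100 W and about 105 TiB of non-volatile storage, and a full system has on the order of a million cores.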

Chisel is a new open-source hardware construction
language developed at UC Berkeley that supports
advanced hardware design using highly
parameterized generators and layered
domain-specific hardware languages. Chisel is
embedded in the Scala programming language, which
raises the level of hardware design abstraction by
providing concepts including object orientation,
functional programming, parameterized types, and
type inference. Chisel was originally developed
in the DoE Project Isis and Par Lab, and
development continues in ASPIRE.

Most manycore hardware designs have the potential
to achieve maximum energy efficiency when operated
in a broad range of supply voltages, spanning from
nominal down to near the transistor threshold. As
part of ASPIRE, we are working on new circuit and
architectural techniques that enable parallel
processors to work across a broad supply range
while tolerating technology variability and
providing immunity to soft and hard errors. We
have built several
prototype resilient microprocessors codenamed
Raven in 28nm FDSOI technology.

In a collaboration with MIT, the University of
Colorado at Boulder, and Micron Technology, we are
exploring the use of silicon photonics to provide
high-bandwidth, energy-efficient links between
processors and memory. A recent Nature
publication describes how we used this technology
to build a RISC-V microprocessor that communicates
directly using light. Integrated photonics is a key
component of the FireBox project, where it will be
used to construct warehouse-scale computers.

Earlier Projects at UC Berkeley

ASPIRE was a 5-year research project that
recognized the shift from
transistor-scaling-driven performance improvements
to a new post-scaling world where whole-stack
co-design is the key to improved
efficiency. Building on the success of the Par Lab
project, it explored deep hardware and
software co-tuning to achieve the highest possible
performance and energy efficiency for future
warehouse-scale and mobile computing systems.

The Hwacha project developed a new
vector-fetch architecture to improve the
energy efficiency of data-parallel
accelerators, following on from earlier work on
Scale and Maven. Versions of the Hwacha vector
accelerator have been taped out several times as
RISC-V Rocket coprocessors in both 28nm and 45nm
nodes, reaching clock rates of 1.5GHz and above.
Hwacha was a project in the ASPIRE lab.

Applications built by composing different parallel
libraries perform poorly when those libraries
interfere with one another by obliviously using the
same physical cores, leading to destructive resource
oversubscription. Lithe is a low-level substrate that
provides basic primitives and a standard interface
for composing parallel libraries efficiently, and can
be inserted underneath the runtimes of legacy
parallel libraries, such as TBB and OpenMP, to
provide bolt-on composability without changes to
existing application code. Lithe was initially
developed in Par Lab and is now part of DEGAS, and
is available as an open-source project.

DIABLO is a wind tunnel for datacenter research,
simulating O(10,000) datacenter servers and
O(1,000) switches for O(100) seconds. DIABLO is
built with FPGAs and executes real instructions
and moves real bytes, while running the full Linux
operating system and unmodified datacenter
software stacks on each simulated server. DIABLO
has successfully reproduced some real-life
datacenter phenomena, such as the memcached
request latency long tail at large scales. DIABLO
was initially developed in the RAMP project, and
continued in ASPIRE. The next generation of
Berkeley datacenter simulators is being developed
as part of the FireBox project.

The DHOSA research project focuses on building
systems that will remain secure even when the
operating system is compromised or hostile. DHOSA
is a collaborative effort among researchers from
Harvard, Stony Brook, U.C. Berkeley, University of
Illinois at Urbana-Champaign, and the University
of Virginia.

With the end of sequential processor performance
scaling, multicore processors provide the only
path to increased performance and energy
efficiency in all platforms from mobile to
warehouse-scale computers. The Par Lab was
created by a team of Berkeley researchers with
the ambitious goal of enabling "most
programmers to be productive writing efficient,
correct, portable SW for 100+ cores & scale as
cores increase every 2 years".

Based on our experiences designing, implementing, and evaluating the
Scale vector-thread
architecture, we identified three primary directions for
improvement to simplify both the hardware and software aspects of the
VT architectural design pattern: (1) a unified VT instruction set
architecture; (2) a VT microarchitecture more closely based on the
vector-SIMD pattern; and (3) an explicitly data-parallel VT
programming methodology. These ideas formed the foundation for the
Maven VT architecture.

Tessellation is a manycore OS targeted at the
resource management challenges of emerging client
devices. Tessellation is built on two central
ideas: Space-Time Partitioning and Two-Level
Scheduling. Tessellation was initially developed
within Par Lab and is now part of the Swarm Lab.

The RAMP project was a multi-university
collaboration to develop new techniques for
efficient FPGA-based emulation of novel parallel
architectures, thereby overcoming the multicore
simulation bottlenecks facing computer
architecture researchers. At Berkeley, prototypes
included the 1,008-processor RAMP Blue system and
the RAMP Gold manycore emulator, as well as the
follow-on DIABLO datacenter emulator.

RAMP Blue was the first large-scale RAMP system
built as a demonstrator of the ideas. The system
models a cluster of up to 1,008 MicroBlaze cores
implemented using up to 84 Virtex-II Pro FPGAs on
up to 21 BEE2 boards. The software infrastructure
consists of GCC, uClinux, and the UPC parallel
language and runtimes, and the prototype can run
off-the-shelf scientific applications.
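
The maximum configuration quoted above implies a simple packing of cores onto FPGAs and FPGAs onto boards. The short sketch below works this out from the three stated numbers only. It assumes an even distribution, which the description does not state explicitly.

```python
# Packing implied by the maximum RAMP Blue configuration quoted in the text.
# Assumption (not from the source): cores and FPGAs are spread evenly.
CORES = 1008    # MicroBlaze cores
FPGAS = 84      # Virtex-II Pro FPGAs
BOARDS = 21     # BEE2 boards

fpgas_per_board, fpga_remainder = divmod(FPGAS, BOARDS)
cores_per_fpga, core_remainder = divmod(CORES, FPGAS)
cores_per_board = fpgas_per_board * cores_per_fpga

print(f"FPGAs per board : {fpgas_per_board} (remainder {fpga_remainder})")
print(f"cores per FPGA  : {cores_per_fpga} (remainder {core_remainder})")
print(f"cores per board : {cores_per_board}")
```

Both divisions come out exact: 4 FPGAs per board and 12 cores per FPGA, or 48 cores per board.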

The Scale microprocessor introduced a new
architectural paradigm, vector-threading, which
combines the benefits of vector and threaded
execution. The vector-thread unit can smoothly
morph its control structure from vector-style to
threaded-style execution.

In many dynamic thread-parallel applications, lock
management is the source of much programming
complexity as well as space and time overhead. We
are investigating possible practical
microarchitectures for implementing transactional
memory, which provides a superior solution for
atomicity that is much simpler to program than
locks, and which also reduces space and time
overheads.

We have been developing techniques that combine new circuit designs
and microarchitectural algorithms to reduce both switching and leakage
power in components that dominate energy consumption, including
flip-flops, caches, datapaths, and register files.

Modern ISAs such as RISC or VLIW expose to software only those properties
of the implementation that affect performance. In this project we are
developing new energy-exposed hardware-software interfaces that also
allow software to have fine-grain control over energy consumption.

Existing variable-length instruction formats provide higher code
densities than fixed-length formats, but are ill-suited to pipelined
or parallel instruction fetch and decode. Heads-and-Tails is a new
variable-length instruction format that supports parallel fetch and
decode of multiple instructions per cycle, allowing both high code
density and rapid execution for high-performance embedded processors.

Early Projects

The Berkeley IRAM project sought to understand the
entire spectrum of issues involved in designing
general-purpose computer systems that integrate a
processor and DRAM onto a single chip, from
circuits, VLSI design, and architectures to
compilers and operating systems.

PHiPAC was the first autotuning project,
automatically generating a high-performance general
matrix-multiply (GEMM) routine by using
parameterized code generators and empirical search
to produce fast code for any platform. Autotuners
are now standard in high-performance library
development.
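
The autotuning idea described above, a parameterized code generator combined with empirical search, can be sketched in a few lines. The example below is only an illustration in pure Python and is not PHiPAC's actual generator or search strategy. The function names (generate_gemm, reference_gemm, autotune), the single tuning parameter (a square tile size), and the candidate values are all hypothetical choices made for this sketch.

```python
import random
import time


def generate_gemm(block):
    """Parameterized generator: emit and compile a tiled GEMM for one block size."""
    source = f"""
def gemm(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, {block}):
        for kk in range(0, n, {block}):
            for jj in range(0, n, {block}):
                for i in range(ii, min(ii + {block}, n)):
                    row_a, row_c = A[i], C[i]
                    for k in range(kk, min(kk + {block}, n)):
                        a, row_b = row_a[k], B[k]
                        for j in range(jj, min(jj + {block}, n)):
                            row_c[j] += a * row_b[j]
    return C
"""
    namespace = {}
    exec(source, namespace)  # compile the generated variant
    return namespace["gemm"]


def reference_gemm(A, B, n):
    """Straightforward matrix multiply used to check each generated variant."""
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def autotune(n=32, candidates=(4, 8, 16, 32), trials=3):
    """Empirical search: time every generated variant and keep the fastest."""
    rng = random.Random(0)
    A = [[rng.random() for _ in range(n)] for _ in range(n)]
    B = [[rng.random() for _ in range(n)] for _ in range(n)]
    expected = reference_gemm(A, B, n)
    timings = {}
    for block in candidates:
        kernel = generate_gemm(block)
        result = kernel(A, B, n)
        if any(abs(result[i][j] - expected[i][j]) > 1e-9
               for i in range(n) for j in range(n)):
            raise ValueError(f"variant with block={block} is incorrect")
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            kernel(A, B, n)
            samples.append(time.perf_counter() - start)
        timings[block] = min(samples)  # best-of-N reduces timing noise
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    best, timings = autotune()
    print("fastest block size on this machine:", best)
```

The winning block size depends on the machine the search runs on, which is the point of the approach: the generator defines a space of candidate kernels, and measurement rather than a model picks among them.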

T0 (Torrent-0) was the first single-chip vector
microprocessor. T0 was designed for multimedia,
human-interface, neural network, and other digital
signal processing tasks. T0 includes a
MIPS-II-compatible 32-bit integer RISC core, a 1KB
instruction cache, a high-performance fixed-point
vector coprocessor, a 128-bit wide external memory
interface, and a byte-serial host interface. T0
formed the basis of the SPERT-II workstation
accelerator.