Computer Architecture Seminar Abstracts

Spring 2006

Glenn Henry
Centaur Technology

How to make a highly secure x86 processor

Abstract:

The talk will cover two topics. The first is the design-and-build
methodology and tools that allow Centaur to design a very small 2-GHz,
Pentium 4-compatible processor with only 30 designers. The second is a
description of Centaur's embedded high-performance security features,
such as AES encryption. The physical design of these security components
will serve as examples for exploring the overall design-and-build
methodology.

Biography:

Glenn Henry is the founder and president of Centaur Technology. Throughout his career, he has played an integral role in the development of the computer industry in the U.S.

Prior to founding Centaur in April 1995, Henry served as a consultant to MIPS Technology (SGI) for one year. From 1988 to 1994 he was Chief Technology Officer and Senior Vice President of the Product Group at Dell Computer Corporation. In that position, he was responsible for all product development activities and, at various times, also responsible for product marketing, manufacturing, procurement, information systems and technical support.

Before his tenure at Dell, Henry served 21 years with IBM. He was the instigator, lead architect, and development manager responsible for the IBM System/32, System/38 (forerunner of the AS/400), and RT PC (forerunner of Power systems). In 1985, he was appointed an IBM Fellow.

Jean-Yves Bouguet
Intel

PIRO: Benchmarking a Personal Image Retrieval System

Abstract:

It is now common to have accumulated tens of thousands of personal pictures, and efficient access to that many pictures requires a robust image retrieval system. This application is of high interest to processor architects: it is highly compute-intensive, and it could motivate end users to upgrade their personal computers to the next generation of processors.

A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries used by many content-based image retrieval systems. For example, a personal image database contains many pictures of people, but of a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places, such as home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, among many others, affect how a personal image retrieval system should be benchmarked, and such benchmarks need to differ from existing ones based on, for example, art images or medical images.

The attributes of the data set do not change the list of components needed to benchmark such systems: data sets, query tasks, ground truth, and evaluation measures.

This talk proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
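
As a concrete illustration of the "evaluation measures" component, the sketch below computes two standard retrieval metrics, precision at k and average precision, against a ground-truth set of relevant images. The function names and toy data are illustrative and are not part of the PIRO system.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved images present in the ground-truth set."""
    return sum(1 for img in retrieved[:k] if img in relevant) / k

def average_precision(retrieved, relevant):
    """Mean of precision at each rank where a relevant image appears."""
    hits, total = 0, 0.0
    for rank, img in enumerate(retrieved, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Averaging `average_precision` over a set of benchmark queries gives the familiar mean average precision (MAP) summary score.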

Biography:

Jean-Yves Bouguet has been a Senior Researcher at Intel's Microprocessor
Research Labs since 1999. He received his diplome d'ingenieur from
the Ecole Superieure d'Ingenieurs en Electrotechnique et
Electronique (ESIEE, Paris) in 1994 and the M.S. and Ph.D. degrees
in Electrical Engineering from the California Institute of
Technology (Caltech) in 1994 and 1999, respectively.
His main research interests are computer vision and computer graphics.

During his thesis work, he developed and patented a simple and
inexpensive method for scanning objects using shadows.
Subsequently, he developed modeling techniques
that combine 3D geometry capture and scene reflectance acquisition
for realistic rendering of real and synthetic scenes with complex
shape and surface characteristics, for which he also holds a patent.
Jean-Yves has received a number of awards, including
the J. Walker von Brimer award for "extraordinary accomplishments
in the field of 3D photography" in 1999.
Recently, his research focus has moved to applying computational vision
techniques to image and video mining, with a special emphasis
on search and retrieval in personal image and video collections.

Matthew Arnold
IBM T.J. Watson Research Center

The Future of Virtual Machine Performance

Abstract:

Users of virtual machines care most about two aspects of performance: startup and throughput. In this talk, I will give a brief overview of the techniques commercial VMs use to improve these aspects of performance, and discuss the challenges that still remain. I will then present two new, nontraditional approaches for making progress in these areas.

1) Improving startup performance using a cross-run profile repository (OOPSLA'05). Despite the important role that profiling plays in achieving high performance, current virtual machines discard a program's profile data at the end of execution. Our work presents a fully automated architecture for exploiting cross-run profile data in virtual machines. This work addresses a number of challenges that previously limited the practicality of such an approach.
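
A minimal sketch of the cross-run idea, assuming nothing about the actual OOPSLA'05 architecture: per-run method-invocation counts are merged into a persistent repository, and the hottest methods can then be compiled eagerly at the next startup. All names and the file format here are hypothetical.

```python
import json
import os
from collections import Counter

REPO_PATH = "profile_repo.json"  # hypothetical on-disk repository location

def load_repo(path=REPO_PATH):
    """Load the accumulated cross-run profile, or start empty."""
    if os.path.exists(path):
        with open(path) as f:
            return Counter(json.load(f))
    return Counter()

def save_repo(repo, path=REPO_PATH):
    """Persist the repository so the next run can exploit it."""
    with open(path, "w") as f:
        json.dump(dict(repo), f)

def merge_run(repo, run_profile):
    """Fold one run's method-invocation counts into the repository."""
    repo.update(run_profile)
    return repo

def hot_methods(repo, top_n=10):
    """Methods a VM might choose to compile eagerly at the next startup."""
    return [method for method, _ in repo.most_common(top_n)]
```

The real system must additionally cope with profile staleness, program changes between runs, and repository overhead, which are among the challenges the paper addresses.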

2) Throughput performance: "Online Performance Auditing" (PLDI'06). This work describes an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime. This work encourages a shift in the way optimizations are developed and tuned for online systems, and may allow much of the work in offline empirical optimization search to be applied automatically at runtime.
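
The auditing idea can be sketched as follows: measure the baseline and the optimized variant on the same workload at runtime, and fall back to the baseline if the optimization turns out to be an anomaly. This is an illustrative simplification of the PLDI'06 framework, with a pluggable cost function so the decision logic can be exercised deterministically.

```python
import time

def measure(fn, workload, trials=3):
    """Best-of-N wall-clock cost of running `fn` over the workload."""
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        for item in workload:
            fn(item)
        best = min(best, time.perf_counter() - start)
    return best

def audit(baseline, optimized, workload, cost=measure):
    """Keep the optimized variant only if it actually wins at runtime;
    otherwise fall back to the baseline, correcting the anomaly."""
    if cost(optimized, workload) < cost(baseline, workload):
        return optimized
    return baseline
```

The production framework performs this comparison online, on the live program, rather than on a replayed workload as sketched here.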

All of this work is implemented and evaluated using IBM's J9 production Java Virtual Machine.

Biography:

Matthew Arnold received his Ph.D. from Rutgers University in 2002 and is now a Research Staff Member at the IBM T.J. Watson Research Center in Hawthorne, NY. For his thesis work he developed low-overhead profiling techniques and showed how they can be used to drive feedback-directed optimization in a virtual machine; this work is currently used in IBM's production JVM. He has worked with both the Jikes Research Virtual Machine and IBM's production JVM, and continues to use both in his research. His current interests include virtual machine performance, low-overhead profiling, and dynamic analysis of software.

Rodric Rabbah
MIT CSAIL

Toward Introspective and Adaptive System Architectures

Abstract:

The performance gap between processor and memory has widened continuously over
the last decade. As emerging multicore architectures are packing even more
computational power onto a single chip, the memory bottleneck is becoming a
central obstacle to achieving scalability. Such architectures generally magnify
long memory access latencies and require locality-aware and latency-hiding
techniques to prevent the memory system from becoming a severe performance
bottleneck.

This talk will describe a simple and effective methodology for mitigating the
memory bottleneck. The strategy leverages speculative and predicated execution,
and is readily applicable to commercial processors available today. In this
work, the compiler uses cache-miss profiling to focus on a relatively small
set of delinquent program references that suffer expensive cache misses. The
compiler then automatically embeds new instructions into the host program to
orchestrate runtime data management. The new instructions execute as part of
the same instruction stream as their host, but effectively run ahead to carry
out various optimizations that improve the overall performance. This talk will
focus on data prefetching as one such optimization. In an implementation for
the Itanium Processor Family, the optimization led to 30% faster execution,
with an average 45% reduction in memory stalls. A significant aspect of this
work is its ability to dynamically adapt to runtime information and dynamic
behavior. For example, the compiler-embedded instructions self-nullify when
they are likely to increase the burden on the memory system. The ability to
dynamically change execution behavior marks a significant step toward
autonomous, introspective, and adaptive applications.
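
The profile-driven selection step can be illustrated with a small sketch: given a cache-miss profile mapping load sites to miss counts, choose the smallest set of references that covers most of the misses. The coverage threshold and data are hypothetical; the actual compiler operates on low-level hardware profile data for the Itanium.

```python
from collections import Counter

def delinquent_refs(miss_profile, coverage=0.9):
    """Smallest prefix of references (by descending miss count) covering
    `coverage` of all cache misses; `miss_profile` maps a load site
    (e.g. a load PC) to its observed miss count."""
    total = sum(miss_profile.values())
    chosen, covered = [], 0
    for ref, misses in Counter(miss_profile).most_common():
        if covered >= coverage * total:
            break
        chosen.append(ref)
        covered += misses
    return chosen
```

The motivation for this step is that miss counts are typically highly skewed, so a handful of delinquent loads account for most stalls and are the only ones worth instrumenting.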

This talk will also describe a transparent, lightweight, and online profiling
scheme that identifies long latency memory references. The technique is part
of a general-purpose dynamic instrumentation and code manipulation framework.
The combination affords the possibility of performing memory-centric
optimizations dynamically, as the application executes. The technique does not
require modifications to the program source code, and works on general-purpose
programs, including legacy and third-party binaries. This transparency is especially
important since these applications must run efficiently on emerging
architectures for which they were not originally designed.

Biography:

Rodric Rabbah is involved in several projects as a Research Scientist at MIT.
He is a leading contributor to StreamIt, a domain specific language and
compiler for stream programming. He also leads the development of Reptile, an
explicitly parallel compiler for tiled architectures. Currently, he is
developing metrics to systematically categorize applications based on their
runtime characteristics. This work culminates in VersaBench, a new benchmark
suite intended to aid architects in the design of future microprocessors. Since
1999, he has led the development of the Trimaran VLIW processor simulator.
Trimaran is an open-source compilation and simulation infrastructure for EPIC
and VLIW research, and is used at more than thirty universities worldwide.

Mattan Erez
Stanford University

The Merrimac Streaming Supercomputer

Abstract:

Advances in VLSI technology have made the raw ingredients for
computation plentiful. Large numbers of fast functional units, ample
memory, and abundant bandwidth can be provided efficiently in terms of
chip area, cost, and energy; however, high-performance computers
realize only a small fraction of VLSI's potential. In this talk I will
describe the Merrimac streaming supercomputer, which is being
developed with an integrated view of the applications, software
system, compiler, and architecture. I will show how this approach
leads to an order of magnitude gain in performance per unit cost, unit
power, and unit floor-space for scientific applications compared to
common scientific computers designed around clusters of conventional
CPUs. The talk will cover Merrimac's stream architecture, mapping
scientific applications to effectively run on the stream architecture,
and system issues in the Merrimac supercomputer.

The stream architecture is designed to take advantage of the
properties of modern semiconductor technology --- very high bandwidth
over short distances and very high transistor counts, but limited
global on-chip and off-chip bandwidths --- and match them with the
characteristics of scientific codes --- large amounts of parallelism
and access locality. Organizing the computation into streams and
exploiting the resulting locality using a register hierarchy enables a
stream architecture to reduce the memory bandwidth required by
representative computations by an order of magnitude or more. Hence a
processing node with a fixed memory bandwidth (which is expensive) can
support an order of magnitude more arithmetic units (which are
inexpensive). Because each node has much greater performance (128
double-precision GFLOPs in our current design) than a conventional
microprocessor, a streaming supercomputer can achieve a given level of
performance with fewer nodes, reducing costs, simplifying system
management, and increasing reliability.
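
The bandwidth argument can be made concrete with a back-of-the-envelope calculation. The numbers below are hypothetical and only the ratio matters: with a fixed memory bandwidth, raising the arithmetic intensity (flops per byte of off-chip traffic) tenfold lets a node keep ten times as many arithmetic units busy.

```python
def sustainable_gflops(mem_bw_gb_per_s, flops_per_byte):
    """Arithmetic rate a fixed off-chip bandwidth can keep busy."""
    return mem_bw_gb_per_s * flops_per_byte

# All numbers are hypothetical; only the ratio matters.
bandwidth = 20.0                                   # GB/s, fixed and expensive
cache_based = sustainable_gflops(bandwidth, 0.5)   # low reuse: 0.5 flop/byte
streamed = sustainable_gflops(bandwidth, 5.0)      # 10x less off-chip traffic
```

This is exactly the trade the abstract describes: memory bandwidth is the expensive resource, so reducing traffic through the register hierarchy multiplies the inexpensive arithmetic units a node can support.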

Biography:

Mattan Erez received a B.Sc. in Electrical Engineering and a B.A. in
Physics from the Technion, Israel Institute of Technology in 1999. He
subsequently received his M.S. in Electrical Engineering from Stanford
University in 2002. His previous work experience includes army service
at a technical research branch of the Israel Defense Forces, and work
as a computer architect on the Israeli Processor Architecture Research
team at Intel Corporation. As a Ph.D. candidate at Stanford
University he participated in the Smart Memories project and is
currently leading the Merrimac Stanford Streaming Supercomputer
project, where his main areas of interest are architecture and its
interaction with the compilation system and the programmer.

Edward Suh
Pufco, Inc.

AEGIS: Architectural EnGine for Information Security

Abstract:

The Internet is expanding into the physical world, connecting billions
of devices. In this expanded network, two contradictory trends are
appearing. On the one hand, the cost of security breaches is increasing
as we place more responsibilities on the devices that surround us. On
the other hand, computing elements are becoming small, disseminated,
unsupervised, and physically exposed. Unfortunately, existing computing
systems do not address physical threats, presenting a significant
vulnerability in future embedded systems.

We have built a tamper-resistant platform using a single-chip secure
processor called AEGIS. Our platform protects applications from physical
attacks as well as software attacks. This enables several applications
such as secure sensor networks, certified execution, and copy protection
of media and software. This talk will describe the architecture of the
AEGIS secure processor and its key primitives, namely, physical random
functions, memory encryption and integrity verification.

Physical Unclonable Functions (PUFs) are a tamper-resistant way of
establishing shared secrets with a physical device. They rely on the
inevitable manufacturing variations between devices to produce an
identity for each device. This identity is arguably unclonable.
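
A toy model of the idea, not Pufco's actual circuit: per-device manufacturing variation is modeled as a seeded vector of delay offsets, and a challenge selects whether each stage's delay adds or subtracts, with the sign of the accumulated difference giving one response bit, as an arbiter would decide. Complementary challenges flip the bit, and the same device always answers the same way.

```python
import random

def make_device(seed, stages=64):
    """Fixed per-device delay variations, set at 'manufacture' (here, a seed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(stages)]

def puf_response(device, challenge):
    """One response bit: each challenge bit chooses the sign of a stage's
    delay; the sign of the total is the bit the arbiter outputs."""
    acc = sum(d if (challenge >> i) & 1 else -d for i, d in enumerate(device))
    return 1 if acc > 0 else 0
```

A real PUF derives its delays from uncontrollable silicon variation rather than a seed, which is what makes the identity hard to clone even for the manufacturer.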

Memory encryption and integrity verification protect content stored in
external memory, and are essential to build a secure computing system
that is powerful enough to run applications requiring large memory. The
talk will discuss memory encryption and integrity verification schemes
that are secure, yet efficient and practical.
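
One standard way to verify external-memory integrity with only a small amount of trusted on-chip state is a hash (Merkle) tree, which is in the spirit of the schemes discussed here; the sketch below assumes a power-of-two number of memory blocks and keeps only the root on-chip.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Binary hash tree over memory blocks (power-of-two count assumed);
    only the root, tree[-1][0], must be stored in trusted on-chip state."""
    level = [h(b) for b in blocks]
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree

def verify_block(blocks, tree, index):
    """Recompute the path from one block up to the root and compare."""
    node = h(blocks[index])
    i = index
    for level in tree[:-1]:
        sibling = level[i ^ 1]
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == tree[-1][0]
```

Each verification touches only a logarithmic number of hashes, which is why such schemes can be made efficient enough for applications with large memories.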

We have fabricated and tested Physical Unclonable Function chips in TSMC
0.18u technology, and implemented the AEGIS processor on an FPGA.

Biography:

Edward Suh recently received a Ph.D. in Electrical
Engineering and Computer Science from the Massachusetts Institute of
Technology (MIT). Currently, he is leading an effort to develop secure
embedded processors at Pufco Inc. He has worked in the areas of high
performance memory systems, embedded processors, and secure hardware
architecture, and has co-authored over a dozen papers in these areas.
His current research focuses on secure computing systems, in particular,
secure processors and their applications.

Arvind
MIT

Is Hardware Innovation Over?

Abstract:

Does the spread of multicore architectures mean the demise of Application-
Specific Integrated Circuits (ASICs)? Power-constrained handheld devices may
be one of the most important economic drivers for the semiconductor industry
in the coming decades. Will future cell phone functionality be delivered
primarily through multicore processors? Or will it be through reconfigurable
FPGAs or a system composed of heterogeneous blocks? We will describe how it is
possible to synthesize, quickly and efficiently, large and complex SoCs from
a library of microarchitectural IP blocks, including embedded PowerPC models,
DSPs, and a variety of specialized hardware blocks (radios, MPEG-4 decoders, ...).
Our project will provide, among other things, PowerPC "gateware" for others
to use, and will shed light on how IP blocks should be written to be easily
modifiable and reusable.

Biography:

Arvind is the Johnson Professor of Computer Science and Engineering at MIT
where he has been since 1979. In 1992, his group, in collaboration with
Motorola, built the Monsoon dataflow machines and their associated software.
A dozen of these machines were built and installed at Los Alamos National
Laboratory and at several universities before Monsoon was retired to the
Computer History Museum in California.

In 2000, Arvind took a two-year leave of absence to start Sandburst, a fabless
semiconductor company to produce a chip set for 10G-bit Ethernet routers.
In 2003, Arvind co-founded Bluespec Inc., an EDA company to produce a set of
tools for high-level synthesis. He currently serves on the board of both
Sandburst and Bluespec.

In 2001, Dr. R. S. Nikhil and Arvind published the book "Implicit Parallel
Programming in pH". Arvind's current research interests are the synthesis and
verification of large digital systems described using guarded atomic actions,
and memory models and cache-coherence protocols for parallel architectures
and languages.

Dileep Bhandarkar
Intel

Multi-Core Microprocessor Chips: Motivation & Challenges

Abstract:

Advances in semiconductor process technology allow hundreds of millions of
transistors to be integrated on a single chip. Intel's 90 nm Montecito chip,
featuring dual cores and a large cache, became the first billion-transistor
chip in 2005. The nanotechnology that continues to drive Moore's Law doubles
transistor density every two years. Multi-core chips will become common not
only in high-end servers but also in desktop and mobile PCs.

Multi-core processors present several challenges related to on-chip system
architecture, power management, reliability, and software scaling. This talk
will touch on some of these challenges and discuss some possible solutions.

Biography:

Dr. Bhandarkar is an IEEE Fellow and a Distinguished Alumnus of the Indian
Institute of Technology, Bombay, where he received his B.Tech. in Electrical
Engineering in 1970. He also holds an M.S. and a Ph.D. in Electrical Engineering
from Carnegie Mellon University, and has done graduate work in Business
Administration at the University of Dallas. He is currently Director of the
Enterprise Architecture Lab, working on processors and chipsets. He has been
with Intel since 1995 and has managed system architecture and performance
analysis activities.

Prior to joining Intel, he spent almost 18 years at Digital
Equipment Corporation, where he managed processor and system architecture,
and performance analysis work related to the VAX, Prism, MIPS, and Alpha
architectures. He also worked at Texas Instruments for four years in their
research labs in a variety of areas including magnetic bubble memories,
charge coupled devices, fault tolerant memories, and computer architecture.
Dr. Bhandarkar holds 15 U.S. Patents and has published more than 30 technical
papers in various journals and conference proceedings. He is also the author
of the book "Alpha Architecture and Implementations".