For the last five years, Google has been designing Tensor Processing Units (TPUs); we have announced four of them. TPUs are domain-specific architectures that perform both neural-network training and inference. Based on our experience with cloud and neural-network workloads, this presentation will discuss how training differs from inference, how Google does codesign, how we use TPUs in production applications, and our concerns for the future of chip building.

There will be a Q&A discussion featuring the above speaker.

9:50am-10:15am

Session 7: AI Accelerators for Edge

Neural networks are a critical component of most deep-learning systems and are used for tasks ranging from natural language processing to object recognition. Whereas data centers handle most neural-network training, inference is moving out of data centers and into edge devices, reducing latency and eliminating the requirement for cloud services. Edge systems incorporating AI include automobiles, security cameras, IoT devices, and smartphones. This session, led by The Linley Group principal analyst Bob Wheeler, will examine the IP cores available to simplify the design of edge SoCs that incorporate deep learning.

The trend of deep learning migrating from the data center to the edge of the network is driving the need for high-performance, low-power machine-learning solutions. These systems are being deployed in applications that require enhanced AI capabilities; they include ADAS systems in vehicles, home and industrial robotics, and all manner of consumer electronics. This presentation will discuss some of the AI-acceleration challenges facing these types of systems and also present the company's innovative data-flow-based solutions.

Artificial Intelligence (AI) applications are moving toward the edge rather than relying on cloud services. AI's tremendous computational requirements, along with power constraints of edge devices, call for a specialized processor architecture. This presentation will discuss a dedicated low-power processor family for deep learning at the edge. It will cover a self-contained and specialized AI architecture scalable for a broad range of end markets and how it combines with CEVA's Deep Neural Network (CDNN) software framework.

Power dissipation and performance trade-offs for machine learning at the edge of the computational cloud present an opportunity for new computing architectures. Current acceleration approaches, such as GPU SIMD and systolic-array architectures, face critical challenges. This presentation examines the requirements of machine-learning processing through the dual lenses of power efficiency and static immutability to highlight the key architectural changes required for high-performance, power-efficient machine learning at the edge.

There will be Q&A and a panel discussion featuring the above speakers.

11:45am-1:00pm

LUNCH - Sponsored by Rambus

1:00pm-2:40pm

Session 8: Automotive Subsystem Design

Fully autonomous vehicles will require multiple processors to convert sensor data into accurate 3D environmental models, calculate a proper course of action, and safely control the vehicle under all conditions that a human driver can handle today. This session, moderated by Linley Group senior analyst Mike Demler, will discuss state-of-the-art automotive AI processors, CPUs that meet stringent ISO 26262 ASIL D requirements, and DSP cores that enable object detection and classification using lidar and radar sensors.

The natural tendency when developing processors for an emerging, fast-moving market like autonomous vehicles is to increase flexibility by maximizing programmable processing elements and centralizing resources. However, production-ready automotive hardware must operate in an electrically noisy environment with constrained power and thermal envelopes over extended temperature ranges for 15-20 years. This presentation will explore the balance between programmable and algorithm-specific hardware for AI-centric workloads as well as concepts for scalable and upgradeable hardware in automotive applications.

This presentation will cover object detection and classification using 77GHz automotive radar and lidar for advanced driver-assistance systems (ADAS), including a review of the relevant signal processing, typical requirements, and design parameters. We will then provide an overview of the ARC HS47D processor, highlighting the core's instruction-set and microarchitecture features for DSP and control operations, and show an example of an efficient radar signal-processing subsystem built around the HS47D processor.

There will be Q&A and a panel discussion featuring the above speakers.

2:40pm-3:00pm

BREAK - Sponsored by Arm

3:00pm-4:00pm

Session 9: Next-Generation Memory Subsystems

As processor performance continues to rise, memory bandwidth is increasingly a bottleneck for application throughput. Memory bandwidth is of particular concern for AI workloads, which have enormous compute requirements for which caches offer little relief. This session, moderated by The Linley Group principal analyst Linley Gwennap, discusses trends in high-performance memory subsystems, including power reduction, new types of memory, compute-in-memory architecture, and memory compression.

AI and other leading-edge applications achieve higher performance and power efficiency by combining specialized silicon with well-designed memory systems. These systems must address challenging power and physical-design constraints. This presentation will discuss the crucial role that memory systems play and highlight some of the performance and power-efficiency tradeoffs that architects face when designing memory systems for these applications.

The requirement for real-time information intelligence is quickly pushing power consumption, memory bandwidth, compute density, and storage capacity to their theoretical limits. Advances in semiconductor processes, computing parallelism, and software frameworks help, but computing architectures and memory/storage paradigms must change for AI innovation to proceed unabated. This presentation will discuss intermediate and long-term changes in traditional memory and storage interfaces, advances in new memory media, and proposals for new compute-in-memory architectures to optimize data flow within the enterprise server.