Exploring the Scalability and Performance of Networks-on-Chip with Deflection Routing in 3D Many-core Architecture

Weldezion, Awet Yemane

KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.

2016 (English)Doctoral thesis, comprehensive summary (Other academic)

Abstract [en]

Three-Dimensional (3D) integration of circuits based on die and wafer stacking using through-silicon-via is a critical technology in enabling "more-than-Moore", i.e. functional integration of devices beyond pure scaling ("more Moore"). In particular, the scaling from multi-core to many-core architecture is an excellent candidate for such integration. 3D systems design follows is a challenging and a complex design process involving integration of heterogeneous technologies. It is also expensive to prototype because the 3D industrial ecosystem is not yet complete and ready for low-cost mass production. Networks-on-Chip (NoCs) efficiently facilitates the communication of massively integrated cores on 3D many-core architecture. In this thesis scalability and performance issues of NoCs are explored in terms of architecture, organization and functionality of many-core systems.

First, we evaluate on-chip network performance in massively integrated many-core architecture when network size grows. We propose link and channel models to analyze the network traffic and hence the performance. We develop a NoC simulation framework to evaluate the performance of a deflection routing network as the architecture scales up to 1000 cores. We propose and perform comparative analysis of 3D processor-memory model configurations in scalable many-core architectures.

Second, we investigate how the deflection routing NoCs can be designed to maximize the benefit of the fast TSVs through clock pumping techniques. We propose multi-rate models for inter-layer communication. We quantify the performance benefit through cycle-accurate simulations for various configurations of 3D architectures.

Finally, the complexity of massively integrated many-core architecture by itself brings a multitude of design challenges such as high-cost of prototyping, increasing complexity of the technology, irregularity of the communication network, and lack of reliable simulation models. We formulate a zero-load average distance model that accurately predicts the performance of deflection routing networks in the absence of data flow by capturing the average distance of a packet with spatial and temporal probability distributions of traffic.

The thesis research goals are to explore the design space of vertical integration for many-core applications, and to provide solutions to 3D technology challenges through architectural innovations. We believe the research findings presented in the thesis work contribute in addressing few of the many challenges to the field of combined research in many-core architectural design and 3D integration technology.

Pamunuwa, Dinesh

Abstract [en]

We study a static model for 2-D and 3-D networks that accurately represents the average distance travelled by packets under deflection routing, which is a specific form of adaptive routing. The model captures static properties of the network topology and the spatial distribution of traffic, but does not take into account traffic loading and congestion. Even though this static model cannot accurately predict packet latency under high load, we contend that it is a perfect predictor of deflection routing networks’ relative performance under any load condition below saturation, and thus always correctly predicts the optimum network configuration. This is verified through cycle-accurate simulations of congested and uncongested networks with fully adaptive, deflection routing for regular traffic patterns such as uniform random, localised, bursty, and others, as well as irregular patterns in both regular and irregular networks. As the networks with minimal average distance perform best even under high traffic load, the average distance model establishes a robust relation between a static network property, average distance, and network performance under load, providing new insight into network behaviour and an opportunity to identify the optimal network configuration without time-consuming simulations.

Abstract [en]

This paper describes a powerful simulation platform that enables accurate simulations of numerous network configurations under realistic traffic patterns to predict the performance and power needs of a 3-D integrated system early in the design flow. The simulation platform can model virtually any sized 2-D or 3-D network configuration, providing low-cost and fast tradeoff evaluations of various systems architectures. The network simulator uses scalable RTL-level models that can be used for accurate power and timing analyses. We demonstrate the capability of our simulation model by analyzing the performance of various network topologies under spatio-temporal traffic patterns to show how the network topology can be adjusted to meet the performance requirements of a design before it is manufactured. The simulation results can be used to optimize the placement of cores and communication buses early in the flow. By using the model, standard applications such as mobile application processor, femto-cell base-stations on-chip and wide-IO TSV memory stacking can be simulated.

Abstract [en]

In this paper, we explore the cost of clock pumping techniques implemented for scalable 3-D Integrated Systems in the complexity of interconnect, circuit, and architecture level changes. Their effect in terms of area and power for comparable performance is estimated. Our results show that by using 50% of the number of TSVs, we achieve the same performance as standard implementation with insignificant area and power overhead from the overall system cost. The proposed pumping technique can be used as one of the components in 3-D systems design for several applications that require logic-on-logic or memory-on-logic stacking.

Abstract [en]

A general expression for the average distance for meshes of any dimension and radix, including unequal radices in different dimensions, valid for any traffic pattern under zero-load condition is formulated rigorously to allow its calculation without network-level simulations. The average distance expression is solved analytically for uniform random traffic and for a set of local random traffic patterns. Hot spot traffic patterns are also considered and the formula is empirically validated by cycle true simulations for uniform random, local, and hot spot traffic. Moreover, a methodology to attain closed-form solutions for other traffic patterns is detailed. Furthermore, the model is applied to guide design decisions. Specifically, we show that the model can predict the optimal 3-D topology for uniform and local traffic patterns. It can also predict the optimal placement of hot spots in the network. The fidelity of the approach in suggesting the correct design choices even for loaded and congested networks is surprising. For those cases we studied empirically it is 100%.

Abstract [en]

Several forms of processor memory organizations have been in use to optimally access off-chip memory systems mainly the Hard disk drives (HDD). Recent trends show that the solid state drives - (SSD) such as flash memories replacing HDDs and multi-processor memory system realized in a single 3-D structure with network-on-chip (NOC) architecture as a communication medium. This paper discusses high level memory organization and architectural modeling and simulation based on 3-D NOC. A comparative analysis among several models including Dance-hall, Sandwich, Terminal, Per-layer and mixed architectures is done. Simulations in cycle accurate 3-D NOC VHDL model are done to evaluate the performance each architecture in uniform and local traffic patterns.

Abstract [en]

Design Constraints imposed by global interconnect delays as well as limitations in integration of disparate technologies make 3-D chip stacks an enticing technology solution for massively integrated electronic systems. The scarcity of vertical interconnects however imposes special constraints on the design of the communication architecture. This article examines the performance and scalability of different communication topologiesfor 3-D Network-on-Chips (NoC) using Through-Silicon-Was (TSV) for inter-die connectivity. Cycle accurate RTL-level simulations are conducted for two communication schemes based on a 7-port switch and a centrally arbitrated vertical bus using different traffic patterns. The scalability of the 3-D NoC is examined under both communication architectures and compared to 2-D NoC structures in terms of throughput and latency in order to quantify the variation of network performance with the number of nodes and derive key design guidelines.

Abstract [en]

Through silicon vias (TSVs) are the backbone of 3D integration technology connecting vertically stacked ICs. Parallel TSVs in the form of bundles are used for vertical signaling.In this paper, we present the ways of maximizing the total bandwidth of a TSV bundle placed in a fixed area by varying the density and the geometries. The ways of optimizing the total bandwidth using analytical methods fora bundle of TSVs placed in a structure with a fixed area and length are examined. The result shows that for uniformly distributed TSVs, maximum bandwidth by proportionalplacement of fewer number of TSV in the bundle can be achieved.

Weldezion, Awet Yemane

KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.

Ebrahimi, Masoumeh

KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland.

Daneshtalab, Masoud

KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.

Tehnhunen, Hannu

KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland.

2015 (English)Conference paper, Published paper (Refereed)

Abstract [en]

Beside different core sizes in many-core Systems-on-Chip, the costand reliability issues of TSVs move 3D NoCs toward heterogonousdesigns. Such heterogeneity introduces design complexity and newchallenges for obtaining a high performance, low power, low area,and a reliable design. By taking all these factors into account, wepropose a design as a combination of Q-Learning and deflectionrouting in a heterogeneous 3D NoCs. This design enables therouting algorithm to dynamically adjust itself to the underlyingtraffic condition and topology arrangement at run time. Thereby,the network can reach its optimal performance and minimum powerconsumption shortly after a reconfiguration either because of anoccurred fault in the network or a traffic change.