ABOUT THE AUTHORS

Luca Benini is Professor at the Department of Electrical Engineering and Computer Science (DEIS) of the University of Bologna. He also holds a visiting faculty position at the Ecole Polytechnique Federale de Lausanne. He has held positions at Stanford University and the Hewlett-Packard Laboratories. He received a Ph.D. degree in electrical engineering from Stanford University in 1997. Dr. Benini's research interests are in the design and computer-aided design of integrated circuits, architectures and systems, with special emphasis on low-power applications. He has published more than 270 papers in peer-reviewed international journals and conferences, three books and several book chapters. He has been Program Chair and Vice-Chair of the Design, Automation and Test in Europe (DATE) Conference. He has been a member of the technical program committee and organizing committee of a number of technical conferences, including the Design Automation Conference, the International Symposium on Low Power Design and the Symposium on Hardware-Software Codesign. He is Associate Editor of the IEEE Transactions on Computer-Aided Design of Circuits and Systems and the ACM Journal on Emerging Technologies in Computing Systems. He is a Senior Member of the IEEE.

Giovanni De Micheli (Ph.D., U.C. Berkeley, 1983) is Professor and Director of the Integrated Systems Centre at EPF Lausanne, Switzerland, and President of the Scientific Committee of CSEM, Neuchatel, Switzerland. Previously, he was Professor of Electrical Engineering at Stanford University. His research interests include several aspects of design technologies for integrated circuits and systems, with particular emphasis on systems and networks on chips, design technologies and low-power design. He is author of Synthesis and Optimization of Digital Circuits, McGraw-Hill, 1994, and co-author and/or co-editor of six other books and of over 300 technical articles.
He is, or has been, a member of the technical advisory board of several companies, including Magma Design Automation, Coware, IROC Technologies, Ambit Design Systems and STMicroelectronics. Dr. De Micheli is the recipient of the 2003 IEEE Emanuel Piore Award for contributions to computer-aided synthesis of digital systems. He is a Fellow of the ACM and the IEEE. He received the Golden Jubilee Medal for outstanding contributions to the IEEE CAS Society in 2000. He received the 1987 D. Pederson Award for the best paper in the IEEE Transactions on CAD/ICAS, two Best Paper Awards at the Design Automation Conference, in 1983 and in 1993, and a Best Paper Award at the DATE Conference in 2005. He was President of the IEEE CAS Society (2003) and President Elect of the IEEE Council on EDA (2005-2006). He was Editor in Chief of the IEEE Transactions on CAD/ICAS in 1997-2001, Program Chair (1996-1997) and General Chair (2000) of the Design Automation Conference (DAC), Program Chair (1988) and General Chair (1989) of the International Conference on Computer Design (ICCD), and Program Chair of pHealth (2006) and the VLSI System on Chip Conference (2006).

LIST OF CONTRIBUTORS

Davide Bertozzi
Engineering Department, University of Ferrara, Ferrara, Italy

Israel Cidon
Technion - Israel Institute of Technology, Technion City, Haifa, Israel

Kees Goossens
Philips Research Laboratories, Eindhoven, The Netherlands

Hoi-Jun Yoo, Kangmin Lee, Se-Joong Lee and Kwanho Kim
KAIST - Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Srinivasan Murali
Stanford University, Palo Alto, California, USA

CHAPTER 1
NETWORKS ON CHIP

The reason for the growing interest in networks on chips (NoCs) can be explained by looking at the evolution of integrated circuit technology and at the ever-increasing requirements on electronic systems. The integrated microprocessor has been a landmark in the evolution of computing technology.
Although it took monstrous efforts to complete, it appears now as a simple object to us. Indeed, the microprocessor involved the connection of a computational engine to a layered memory system, and this was achieved using busses. In the last decade, the frontiers of integrated circuit design opened widely. On one side, complex application-specific integrated circuits (ASICs) were designed to address specific applications, for example mobile telephony. These systems required multiprocessing over heterogeneous functional units, thus requiring efficient on-chip communication. On another side, multiprocessing platforms were developed to address high-performance computation, such as image rendering. Examples are Sony's Emotion Engine [25] and IBM's Cell chip [26], where on-chip communication efficiency is key to the overall system performance. At the same time, the shrinking of processing technology into the deep sub-micron (DSM) domain exacerbated the imbalance between gate delays and wire delays on chip. Accurate physical design became the bottleneck for design closure, a word in jargon to indicate the ability to conclude a tape out successfully. Thus, the on-chip interconnect is now the dominant factor in determining performance. Architecting the interconnect at a higher abstraction level is a key factor for system design. We have to understand the introduction of NoCs in systems-on-chip (SoCs) design as a gradual process, namely as an evolution of bus interconnect technology. For example, there is not a strict distinction between multi-layer busses and crossbar NoCs. We have also to credit C. Seitz and W. Dally for stressing the need of network interconnects for high-performance multiprocessing, and for realizing the first prototypes of networked integrated multiprocessors [9].
But overall, NoC has become a broad topic of research and development in the new millennium, when designers were confronted with technological limitations, rising hardware design costs and increasingly higher system complexity.

FIGURE 1.1 Traffic pattern in a large-scale system. Limited parallelism is often a cause of congestion.

1.1 WHY ON-CHIP NETWORKING?

Systems on silicon have a complexity comparable to skyscrapers or aircraft carriers, when measured in terms of number of basic elements. Differently from other complex systems, they can be cloned in a straightforward way, but they have to be designed correctly, as repairs are nearly impossible. SoCs require design methodologies that have commonalities with other types of large-scale system design (Fig. 1.1). In particular, when looking at on-chip interconnect design methods, it is useful to compare the on-chip interconnect to the worldwide interconnect provided by the Internet. The latter is capable of taming the system complexity and of providing reliable service in the presence of local malfunctions. Thus, networking technology has been able to provide us with quality of service (QoS), despite the heterogeneity and variability of the Internet nodes and links. It is then obvious that networking technology can be instrumental in improving very-large-scale integration (VLSI) circuit/system design technology. On the other hand, the challenges in marrying network and VLSI technologies lie in leveraging the essential features of networking that are crucial to obtaining fast and reliable on-chip communication. Some novices think that on-chip networking equates to porting the Transmission Control Protocol/Internet Protocol (TCP/IP) to silicon or achieving an on-chip Internet.

FIGURE 1.2 Distributed systems communicate via a limited number of cables. Conversely, VLSI chips use up to 10 levels of wires for communicating.
This is not feasible, due to the high latency related to the complexity of TCP/IP. On-chip communication must be fast, and thus networking techniques must be simple and effective. Bandwidth, latency and energy consumption for communication must be traded off in the search for the best solution. On the bright side, VLSI chips have a wide availability of wires on many layers, which can be used to carry data and control information. Wide data busses realize the parallel transport of information. Moreover, data and control do not need to be transported by the same means, as in networked computers (Fig. 1.2). Local proximity of computational and storage units on chip makes transport extremely fast. Overall, the wire-oriented nature of VLSI chips makes on-chip networking both an opportunity and a challenge. In summary, the main motivation for using on-chip networking is to achieve performance using a system perspective of communication. This reason is corroborated by the fact that simple on-chip communication solutions do not scale up when the number of processing and storage arrays on chip increases. For example, on-chip busses can serve a limited number of units, and beyond that, performance degrades due to the bus parasitic capacitance and the complexity of arbitration. Indeed, it is the trend to larger-scale on-chip multiprocessing that demands on-chip networking solutions.

1.2 TECHNOLOGY TRENDS

In the current projections [37] of future silicon technologies, the operating frequency and transistor density will continue to grow, making energy dissipation and heat extraction a major concern. At the same time, on-chip supply voltages will continue to decrease, with adverse impact on signal integrity. The voltage reduction, even though beneficial, will not suffice to mitigate the energy consumption problem, where a major contribution is due to leakage.
Thus, SoCs will incorporate dynamic power management (DPM) techniques in various forms to satisfy energy consumption bounds [4]. Global wires, connecting different functional units, are likely to have propagation delays largely exceeding the clock period [18]. Whereas signal pipelining on interconnections will become common practice, correct design will require knowing the signal delay with reasonable accuracy. Indeed, a negative side effect of technology downsizing will be the spreading of physical parameters (e.g., variance of wire delay per unit length) and its relative importance as compared to the timing reference signals (e.g., clock period). The spreading of physical parameters will make it harder to achieve high-performing chips that safely meet all timing constraints. Worst-case timing methodologies, which require a clock period larger than the worst-case propagation delay, may underuse the potential of the technology, especially when the worst-case propagation delays are rare events. Moreover, it is likely that varying on-chip temperature profiles (due to varying loads and DPM) will increase the spread of wiring delays [2]. Thus, it will be mandatory to go beyond worst-case design methodology, and use fault-tolerant schemes that can recover from timing errors [11, 30, 35]. Most large SoCs are designed using different voltage islands [23], which are regions with specific voltages and operating frequencies, which in turn may depend on the workload and on dynamic voltage and frequency scaling. Synchronization among these islands may become extremely hard to achieve, due to timing skews and spreads. Global wires will span multiple clock domains, and synchronization failures in communicating between different clock domains will be rare but unavoidable events [12].
1.2.1 Signal Integrity

With forthcoming technologies, it will be harder to guarantee error-free information transfer (at the electrical level) on wires for several reasons:

- Reduced signal swings, with a corresponding reduction of voltage noise margins.
- Crosstalk is bound to increase, and the complexity of avoiding crosstalk by identifying all potential on-chip noise sources will make it unlikely to succeed fully.
- Electromagnetic interference (EMI) by external sources will become more of a threat because of the smaller voltage swings and smaller dynamic storage capacitances.
- The probability of occasional synchronization failures and/or metastability will rise. These erroneous conditions are possible during system operation because of transmission speed changes, local clock frequency changes, timing noise (jitter), etc.
- Soft errors due to collision of thermal neutrons (produced by the decay of cosmic ray showers) and/or alpha particles (emitted by impurities in the package). Soft errors can create spurious pulses, which can affect signals on chip and/or discharge dynamic storage capacitances.

Moreover, SoCs may be willfully operated in error-prone operating conditions because of the need to extend battery lifetime by lowering energy consumption via supply voltage over-reduction. Thus, specific run-time policies may trade off signal integrity for energy consumption reduction, thus exacerbating the problems due to the fabrication technology.

1.2.2 Reliability

System-level reliability is the probability that the system will operate correctly at time t, as a function of time. The expected value of the reliability function is the mean time to failure (MTTF). Increasing the MTTF well beyond the expected useful life of a product is an important design criterion. Highly reliable systems have been an object of study for many years.
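As a worked illustration of the MTTF definition (not taken from the text): if a component fails at a constant rate lambda, its reliability function is R(t) = exp(-lambda * t) and the MTTF, the expected value of R, is the integral of R(t) over all time, which equals 1/lambda. The failure-rate value below is made up for the example; the sketch simply checks the definition numerically:

```python
import math

def mttf(reliability, t_max, steps=200_000):
    """Approximate MTTF = integral of R(t) dt over [0, t_max] (trapezoidal rule)."""
    dt = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    total += sum(reliability(i * dt) for i in range(1, steps))
    return total * dt

# Hypothetical constant failure rate: 1e-5 failures per hour, so R(t) = exp(-lam * t).
lam = 1e-5
estimate = mttf(lambda t: math.exp(-lam * t), t_max=20 / lam)
print(estimate)  # close to 1/lam = 100000 hours
```

The design criterion stated above then reads: choose components and error-handling mechanisms so that 1/lambda (or its equivalent for non-constant failure rates) far exceeds the product's useful life.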
Beyond traditional applications, such as aircraft control, defense applications and reliable computing, there are many new fields requiring highly reliable SoCs, ranging from medical applications to automotive control and, more generally, to embedded systems that are critical for human operation and life. The increased demand for highly reliable SoCs is counterbalanced by the increased failure rates of devices and interconnects. Due to technology downscaling, failures in the interconnect due to electromigration are more likely to happen (Fig. 1.3). Similarly, device failure due to dielectric breakdown is more likely because of higher electric fields and carrier speeds (Fig. 1.4). Temperature cycles on chip induce mechanical stress, which has counter-productive effects [28]. For these reasons, SoCs need to be designed with specific resilience toward hard (i.e., permanent) and soft (i.e., transient) malfunctions. System-level solutions for hard errors involve redundancy, and thus require the on-line connection of a stand-by unit and disconnection of the faulty unit. Solutions for soft errors include design techniques for error containment, and error detection and correction via encoding. Moreover, when soft errors induce timing errors, systems based on double-latch clocking can be used for detection and correction. NoCs can provide resilient solutions toward hard errors (by supporting seamless connection/disconnection of units) and soft errors (by layered error correction).

FIGURE 1.3 Failure on a wire due to electromigration.

FIGURE 1.4 Failure on a transistor due to oxide breakdown.

1.2.3 Non-determinism in SoC Modeling and Design

As SoC complexity scales, it will be more difficult, if not impossible, to capture their functionality with fully deterministic models of operation. In other words, system models may have multiple implementations.
Property abstraction, which is key to managing complexity in modeling and design, will hide implementation details, and designers will have to relinquish control of such details. Whereas abstract modeling and automated synthesis enable complex system design, such an approach increases the variability of the physical and electrical parameters. In summary, to ensure correct and safe realizations, the system architecture and design style have to be resilient against errors generated by various sources, including:

- process technology (parameter spreading, defect density, failure rates);
- environment (temperature variation, EMI, radiation);
- operation mode (very-low-voltage operation);
- design style (abstraction and synthesis from non-deterministic models).

1.2.4 Variability, Design Methodologies and NoCs

Dealing with variability is an important matter affecting many aspects of SoC design. We consider here a few aspects related to on-chip communication design. The first important issue deals with malfunction containment. Traditionally, malfunctions have been avoided by putting stringent rules on physical design and by applying stringent tests on signal integrity before tape out. Rules are such that variations of process parameters can be tolerated, and integrity analysis can detect potential problems such as crosstalk. This approach is conservative in nature, and leads to perfecting the physical layout of circuits. On the other hand, the downscaling of technologies has unveiled many potential problems, and as a result the physical design tools have grown in complexity, cost and time to achieve design closure. At some point, correct-by-construction design at the physical level will no longer be possible. Similarly, the increasingly larger number of connections on chip will make signal integrity analysis unlikely to detect all potential crosstalk errors.
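One way such residual errors can be contained is the error detection and correction via encoding mentioned in Section 1.2.2: redundant check bits sent with the data let the receiver locate and repair a corrupted bit. A minimal, purely illustrative sketch, using the textbook Hamming(7,4) code rather than any scheme prescribed here, might look like this:

```python
def hamming74_encode(d):
    """Encode 4 data bits (0/1 list) into a 7-bit Hamming codeword: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit; 0 means no error
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                     # inject a single-bit "soft error" on the link
assert hamming74_decode(code) == word
```

The cost is three extra wires (or cycles) per four data bits; real on-chip schemes trade such redundancy against the expected error rate of the link.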
Future trends will soften requirements at the physical and electrical level, and require higher-level mechanisms for error correction. Thus, electrical errors will be considered inevitable. Nevertheless, their effect can be contained by techniques that correct them at the logic and functional levels. In other words, the error detection/correction paradigm applied to networking will become a standard tool in on-chip communication design. Timing errors are an important side effect of variability. Timing errors can originate from a wide variety of causes, including but not limited to: incorrect wiring delay estimates, overaggressive clocking, crosstalk and soft (radiation-induced) errors. Timing errors can be detected by double latches, gated by different clocking signals, and by comparing the latched data. When the data differ, it means that most likely the signal settled after the first latch was gated, that is, that a timing error was on the verge of being propagated. (Unfortunately, errors can happen also in the latches themselves.) Asynchronous design methodologies can make the circuit resilient to delay variations. For example, speed-independent and delay-insensitive circuit families can operate correctly in the presence of delay variations in gates and interconnects. Unfortunately, design complexity often makes the application of an integral asynchronous design methodology impractical. A viable compromise is the use of globally asynchronous locally