
IEEE Transactions on Computers (December 2004) PDF

148 Pages·2004·10.782 MB·English

Preview IEEE Transactions on Computers (December 2004)

IEEE TRANSACTIONS ON COMPUTERS
A publication of the IEEE Computer Society
DECEMBER 2004, VOLUME 53, NUMBER 12, ITCOB4 (ISSN 0018-9340)

Editor’s Note
V.K. Prasanna ..... 1505

PAPERS

Computer Organizations and Architectures Specification
A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems
M. Meribout and M. Motomura ..... 1508

Digital Devices and Modeling Performance Evaluations
A Gaussian Noise Generator for Hardware-Based Simulations
D. Lee, W. Luk, J.D. Villasenor, P.Y.K. Cheung ..... 1523

Digital Devices, Computer Components, and Interconnection Networks
Lower Bounds on the Loading of Multiple Bus Networks for Binary Tree Algorithms
H.P. Dharmasena and R. Vaidyanathan ..... 1535
Designing WDM Optical Interconnects with Full Connectivity by Using Limited Wavelength Conversion
Y. Yang and J. Wang ..... 1547

Performance, Fault Tolerance, Reliability, Security, and Testability
Susceptibility of Commodity Systems and Software to Memory Soft Errors
A. Messer, P. Bernadat, G. Fu, D. Chen, Z. Dimitrijevic, D. Lie, D.D. Mannaru, A. Riska, and D. Milojicic ..... 1557
Static Test Compaction for Full-Scan Circuits Based on Combinational Test Sets and Nonscan Input Sequences and a Lower Bound on the Number of Tests
I. Pomeranz and S.M. Reddy ..... 1569
Diagnosability of t-Connected Networks and Product Networks under the Comparison Diagnosis Model
C.-P. Chang, P.-L. Lai, J.J.-M. Tan, and L.-H. Hsu ..... 1582

Real-Time Scheduling
Task Synchronization in Reservation-Based Real-Time Systems
G. Lipari, G. Lamastra, and L. Abeni ..... 1591

Routing Algorithms and Switching Schemes
CoPTUA: Consistent Policy Table Update Algorithm for TCAM without Locking
Z. Wang, H. Che, M. Kumar, and S.K. Das ..... 1602

Routing and Broadcasting Algorithms
Enhanced Interval Trees for Dynamic IP Router-Tables
H. Lu and S. Sahni ..... 1615

BRIEF CONTRIBUTIONS

Topology Control of Ad Hoc Wireless Networks for Energy Efficiency
M.X. Cheng, M. Cardei, J. Sun, X. Cheng, L. Wang, Y. Xu, and D.-Z. Du ..... 1629

Annual Index ..... 1636

http://www.computer.org [email protected]

The IEEE Computer Society is an association of people with professional interest in the field of computers.
All members of the IEEE are eligible for membership in the Computer Society, as are members of certain professional societies and other computer professionals. Computer Society members will receive this Transactions upon payment of the annual Society membership fee ($42 for IEEE members, $99 for all others) plus an annual subscription fee (paper only: $41; electronic only: $33; combination: $53). For additional membership and subscription information, visit our Web site at http://computer.org/subscribe, send email to [email protected], or write to IEEE Computer Society, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314 USA. Individual subscription copies of Transactions are for personal use only.

IEEE TRANSACTIONS ON COMPUTERS

EDITOR-IN-CHIEF
VIKTOR K. PRASANNA
Department of EE-Systems, EEB-200
University of Southern California
Los Angeles, CA 90089-2562
+1 213.740.4483; +1 213.740.4418 (FAX)

ASSOCIATE EDITOR-IN-CHIEF
FABRIZIO LOMBARDI
Department of Electrical and Computer Engineering
Northeastern University
Boston, MA 02115
+1 617.373.4159; +1 617.373.8970 (FAX)

Editorial Board
JOSÉ N. AMARAL, University of Alberta
MIKHAIL ATALLAH, Purdue University
NADER BAGHERZADEH, University of California, Irvine
JEAN-CLAUDE BAJARD, Université Montpellier II
SANJOY BARUAH, Univ. of North Carolina-Chapel Hill
JÜERGEN BECKER, Universität Karlsruhe
LAXMI N. BHUYAN, University of California, Riverside
BELLA BOSE, Oregon State University
TODD BRUN, University of Southern California
NEIL BURGESS, Cardiff University
CHITA R. DAS, Pennsylvania State University
FRANK DEHNE, Griffith University
MICHEL DUBOIS, University of Southern California
ANTONIO M. GONZALEZ, Universitat Politecnica de Catalunya
S.S. IYENGAR, Louisiana State University
MICHITAKA KAMEYAMA, Tohoku Univ.
ÇETIN K. KOÇ, Oregon State University
SANDIP KUNDU, Intel Corporation
SHARAM LATIFI, University of Nevada-Las Vegas
RAN LIBESKIND-HADAS, Harvey Mudd College
JIEN-CHUNG LO, University of Rhode Island
WILLIAM MANGIONE-SMITH, UCLA
PANKAJ MEHRA, Hewlett-Packard
CECILIA METRA, DEIS, Universita' di Bologna
CSABA ANDRAS MORITZ, UMASS/ECE Amherst
VOJIN G. OKLOBDZIJA, Integration Corp.
DHANANJAY S. PHATAK, UMBC
DHIRAJ PRADHAN, University of Bristol
ARNOLD ROSENBERG, University of Massachusetts
KAREM A. SAKALLAH, Univ. of Michigan
MAJID SARRAFZADEH, UCLA
MIKE SCHULTE, Univ. of Wisconsin-Madison
ASSAF SCHUSTER, Technion, Israel Institute of Technology
LOREN SCHWIEBERT, Wayne State University
DONATELLA SCIUTO, Politecnico di Milano
MUKESH SINGHAL, University of Kentucky
ANAND TRIPATHI, University of Minnesota
SHAMBHU J. UPADHYAYA, State Univ. of New York Buffalo
LONNIE R. WELCH, Ohio University
WANG YI, Uppsala University

MANUSCRIPT SUBMISSIONS / STATUS INQUIRIES: For information on submitting a manuscript or on a paper awaiting publication, please contact: Transactions Assistant TC, IEEE Computer Society, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314 USA; EMAIL: [email protected], PHONE: +1 714.821.8380; FAX: +1 714.821.4010

IEEE COMPUTER SOCIETY

Officers
CARL K. CHANG, President
GERALD L. ENGEL, President-Elect
STEPHEN L. DIAMOND, Past President
MICHAEL R. WILLIAMS, VP, Publications
CHRISTINA SCHOBER, VP, Conferences & Tutorials
MURALI VARANASI, VP, Educational Activities
LOWELL G. JOHNSON, First VP, Electronic Products and Services
RICHARD A. KEMMERER, Second VP, Chapter Activities
JAMES W. MOORE, VP, Standards Activities
YERVANT ZORIAN, VP, Technical Activities
OSCAR N. GARCIA, Secretary
RANGACHAR KASTURI, Treasurer
GENE H. HOFFNAGLE, 2003-2004 IEEE Division V Director
JAMES D. ISAAK, 2003-2004 IEEE Division VII Director
STEVE DIAMOND, 2004 IEEE Division V Director-Elect
DAVID HENNAGE, Executive Director

Publications Board
Vice President: Michael R. Williams

Members-at-Large
MIKE BLAHA
LINDA SHAFER
ANGELA BURGESS (ex officio)
ANAND TRIPATHI
JON ROKNE
Magazine Operations Chair: BILL SCHILIT
Transactions Operations Chair: STEVEN TANIMOTO
Press Operations Chair: ROGER FUJII
IEEE PAB Liaison: MICHAEL R. WILLIAMS

Magazines Editors-in-Chief
Annals of the History of Computing: DAVID A. GRIER
Computing in Science & Engineering: FRANCIS SULLIVAN
Computer: DORIS CARVER
Computer Graphics & Applications: JOHN DILL
Design & Test: RAJESH GUPTA
Distributed Systems Online: JEAN BACON
Intelligent Systems: NIGEL SHADBOLT
Internet Computing: ROBERT FILMAN
IT Professional: FRANK FERRANTE
Micro: PRADIP BOSE
Multimedia: FOROUZAN GOLSHANI
Pervasive Computing: M. SATYANARAYANAN
Security & Privacy: GEORGE CYBENKO
Software: WARREN HARRISON
IEEE CS Press: MICHAEL WILLIAMS

Transactions Editors-in-Chief
Computational Biology & Bioinformatics: DAN GUSFIELD
Computers: VIKTOR PRASANNA
Dependable and Secure Computing: RAVISHANKAR K. IYER
Information Technology in Biomedicine: NIILO SARANUMMI
Knowledge & Data Engineering: PHILIP S. YU
Mobile Computing: TOM LAPORTA
Multimedia: TSUHAN CHEN
NanoBioscience: CARMELINA RUGGIERO
Networking: ELLEN ZEGURA
Parallel & Distributed Systems: PEN YEW
Pattern Analysis & Machine Intelligence: RAMA CHELLAPPA
Software Engineering: JOHN KNIGHT
Very Large Scale Integration: N. RANGANATHAN
Visualization & Computer Graphics: DAVID EBERT

Executive Staff
DAVID HENNAGE, Executive Director
ANNE MARIE KELLY, Assoc. Executive Director
ANGELA BURGESS, Publisher
JOHN C. KEATON, Manager, Research and Planning
VIOLET S. DOAN, Director of Administration
ROBERT CARE, Director, Information Technology & Services

Transactions Department
ALICIA L. STICKLEY, Production Manager
SUZANNE WERNER, Peer Review Supervisor
KATHY SANTAMARIA, Production Editor
YU-TZU TSAI, STEVE WAREHAM, Electronic Media Assistants
JOYCE ARNOLD, Transactions Assistant

IEEE TRANSACTIONS ON COMPUTERS is published monthly by the IEEE Computer Society. IEEE Corporate Office: Three Park Avenue, 17th Floor, New York, NY 10016-5997 USA. Responsibility for the content rests upon the authors and not upon the IEEE or the IEEE Computer Society. IEEE Computer Society Publications Office: 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314 USA. IEEE Computer Society Headquarters: 1730 Massachusetts Ave. NW, Washington, DC 20036-1992 USA. Back issues: IEEE members $20.00, nonmembers $110.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders). Complete price information available on request. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. For all other copying, reprint, or republication permission, write to: Copyrights and Permissions Department, IEEE Publications Administration, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331. Copyright © 2004 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY, and at additional mailing offices.
Postmaster: Send address changes to IEEE TRANSACTIONS ON COMPUTERS, IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA. GST Registration No. 125634188. Canada Post Publications Mail Agreement Number 40013885. Return undeliverable Canadian addresses to: 4960-2 Walker Road, Windsor, ON N9A6J3. Printed in USA.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 12, DECEMBER 2004 1505

Editor’s Note
Viktor K. Prasanna

On behalf of the IEEE Computer Society and the Editorial Board of this journal, I wish to thank the following people, who have retired from the board, for their service: Pascale Charpin, Mark Crovella, Kemal Ebciouglu, Matthew Farrens, Laxmi P. Gewali, Anwar Hasan, David Kaeli, Paolo Montuschi, Irith Pomeranz, Karsten Schwan, H.J. Siegel, Anand Sivasubramaniam, Gurindar Sohi, Per Stenstrom, Ioannis Tollis, Peter Varman, Uzi Vishkin, and Scott Wills. It has been an honor and a pleasure working with them. I would like to take this opportunity to thank them for the selfless service they have given these transactions during all these years. It is only with the work of dedicated volunteers such as these that we have been able to continue offering the scientific community the high quality papers, which remain the trademark of the IEEE Transactions on Computers.

At the same time, I am pleased to welcome Jean-Claude Bajard, Todd Brun, Frank Dehne, Michel Dubois, Antonio M. González, Cecilia Metra, and Assaf Schuster, who are now joining the Editorial Board. Their biographical sketches highlight their accomplishments and areas of expertise. They are all internationally recognized in their fields and the journal is fortunate indeed to have these outstanding researchers as members of the Editorial Board.

Dr. Bajard will help us handle papers in several important areas, including number systems and computer arithmetic.

Dr. Brun will help us handle papers in several important areas, including quantum computation and quantum information theory (which include quantum cryptography, quantum error correction, quantum communications, entanglement quantification, and other topics), and simulation of quantum systems.

Dr. Dehne will help us handle papers in several important areas, including algorithms and data structures, practical implementations and related issues, parallel algorithms, data warehousing, bioinformatics, and computational geometry.

Dr. Dubois will help us handle papers in several important areas, including computer architecture and parallel processing, with a focus on multiprocessor architecture, performance and algorithms.

Dr. González will help us handle papers in several important areas, including processor microarchitecture, and code generation and optimization.

Dr. Metra will help us handle papers in several important areas, including testing methods and tools, fault tolerance, reliability, security, and testability.

Dr. Schuster will help us handle papers in several important areas, including large-scale distributed systems, distributed data mining and model checking, and security aspects of distributed systems.

Viktor K. Prasanna
Editor-in-Chief

Jean Claude Bajard received the MSc degree (1990) and the PhD degree (1993) in computer science from the École Normale Supérieure de Lyon and Claude Bernard University, France. He taught mathematics in high school from 1979 to 1990, then he served as an assistant professor in computer science at the École Normale Supérieure de Lyon from 1990 to 1993, on the team of Jean-Michel Muller. In 1993, he joined the University of Provence, Marseille, France, as associate professor, where he earned, in 1998, the “Habilitation à Diriger des Recherche.” Since 1999, he has been a full professor in computer science at the Institute of Technology (IUT) of the University of Montpellier II.
He has been doing his research with the CNRS Laboratory, LIRMM UMR 5506, where he was the head of the “fundamental computer science and applications” department from 2000 to 2003. His research interests are in computer arithmetic, in particular, number representations, elementary functions, multiprecision computing, modular arithmetic, finite fields, operators for cryptography, VLSI algorithms. His main international collaborations are with Milos Ercegovac (UCLA), Graham Jullien (ATIPS), and Peter Kornerup (Odense). He has served on the program committees of the IEEE Symposia on Computer Arithmetic and served as coprogram chair for this symposium in 2003. He is also serving as a coeditor, with Michael Schulte, of a special issue of the IEEE Transactions on Computers (to appear in 2005). He is a member of the IEEE Computer Society.

For information on obtaining reprints of this article, please send e-mail to: [email protected].
0018-9340/04/$20.00 © 2004 IEEE. Published by the IEEE Computer Society.

Todd Brun received the AB degree in physics from Harvard University in 1989 and the MS and PhD degrees in physics in 1991 and 1994 from the California Institute of Technology. From 1994 to 2000, he held postdoctoral positions at Queen Mary and Westfield College, London, at the Institute for Theoretical Physics, Santa Barbara, California, and at Carnegie Mellon University, Pittsburgh. From 2000 to 2003, he was a member of the School of Natural Sciences at the Institute for Advanced Study in Princeton. Since 2003, he has been a faculty member in the Department of Electrical Engineering at the University of Southern California, where he is an assistant professor. His research is in quantum computation and quantum information processing.

Frank Dehne received the MCS degree (Dipl. Inform.) from the Technical University of Aachen, Germany, in 1983 and the PhD degree (Dr. Rer. Nat.) from the University of Würzburg, Germany, in 1986. In 1986, he joined the School of Computer Science at Carleton University in Ottawa, Canada, as an assistant professor. He was appointed an associate professor and professor of computer science in 1990 and 1997, respectively. From 2000 to 2003, he served as the director of the School of Computer Science. In 2004, he was appointed a professor of information technology at Griffith University in Brisbane, Australia. His current research interests are in the general area of algorithms and practical implementations, in particular parallel computing, coarse-grained parallel algorithms, computational geometry, parallel data warehousing and OLAP, and parallel bioinformatics. He is particularly interested in the interrelationship between the theoretical analysis of algorithms and the performance observed when these algorithms are implemented. He is a senior member of the IEEE, vice-chair of the IEEE Technical Committee on Parallel Processing, and a member of the Steering Committee of the ACM Symposium on Parallel Algorithms and Architectures. He is a founding co-investigator of HPCVL (www.hpcvl.org), a large regional parallel computing center with more than $40M funding from government and industry. He is also an editor of the journals Information Processing Letters, Parallel Algorithms and Applications, and International Journal of Data Warehousing and Mining, as well as a cofounder of the Workshop on Algorithms and Data Structures (www.wads.org) and the International Workshop on Parameterized and Exact Computation (www.iwpec.org). He was awarded two Carleton University Research Achievement Awards in 1992 and 1998, respectively, each awarding him a one year release from all teaching duties for the purpose of pursuing his research interests.

Michel Dubois received the PhD degree from Purdue University, the MS degree from the University of Minnesota, and an engineering degree from the Faculte Polytechnique de Mons in Belgium, all in electrical engineering. He is a professor in the Department of Electrical Engineering at the University of Southern California (USC). Before joining USC in 1984, he was a research engineer at the Central Research Laboratory of Thomson-CSF in Orsay, France. His main interests are computer architecture and parallel processing, with a focus on multiprocessor architecture, performance, and algorithms. He has published more than 120 papers in technical journals and leading conferences on these topics. He is a member of the ACM and a fellow of the IEEE.

Antonio M. González received the MS and PhD degrees from the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. He has been a faculty member of the Computer Architecture Department at UPC since 1986 and he is currently a professor at this department. He leads the Intel-UPC Barcelona Research Center, whose research focuses on new microarchitecture paradigms and code generation techniques for future microprocessors. His research has focused on computer architecture, compilers, and parallel processing, with a special emphasis on processor microarchitecture and code generation. He has published more than 180 papers and filed seven patents in the areas of power-aware microarchitectures, clustered microarchitectures, speculative multithreaded processors, data value and data dependence speculation and reuse, cache architectures, register file architecture, modulo scheduling, code analysis and optimization, mapping parallel algorithms to multicomputers, prolog-oriented architectures, instruction fetching mechanisms, and digital image processing. He is an associate editor of the IEEE Transactions on Parallel and Distributed Systems, ACM Transactions on Architecture and Code Optimization, and Journal of Embedded Computing.
He has served on more than 50 program committees for international symposia in the field of computer architecture, including ISCA, MICRO, HPCA, PACT, ICS, ICCD, ISPASS, CASES and IPDPS. He was program (co)chair for ICS 2003, ISPASS 2003, and MICRO 2004, among other symposia.

Cecilia Metra received the degree (summa cum laude) in electronic engineering and the PhD degree in electronic engineering and computer science from the University of Bologna, Italy. Since 2000, she has been an assistant professor in electronics at the University of Bologna, Italy, and she qualified as an associate professor in electronics in 2003. From 1998 to 2001, she was a visiting scholar at the University of Washington, Seattle, while, in 2002, she was a visiting faculty consultant for Intel, Santa Clara, California. She has served as general cochair, program cochair, topic chair, and technical program committee member of several international conferences, symposia, and workshops, as well as a guest coeditor of special issues of several international journals and magazines. Her research interests are in the field of design and test of digital systems, reliable and error resilient systems, fault tolerance, online testing, fault modeling, and concurrent diagnosis.

Assaf Schuster received the BA, MA, and PhD degrees in mathematics and computer science from the Hebrew University of Jerusalem, the latter one in 1991. He is an associate professor in the Computer Science Department at the Technion-Israel Institute of Technology. His research interests and fields of publication include memory hierarchies and consistency models, distributed shared memory, parallel and distributed computing, scalable model checking, large-scale Grid systems, locality in large-scale systems, scalable distributed data mining, and privacy preserving in large-scale systems.
He established and serves as the head of the Distributed Systems Laboratory at the Technion (http://dsl.cs.technion.ac.il), serves as an associate editor of the Journal of Parallel and Distributed Computing, is leading several large efforts in his area, collaborates with European and American projects, has served as a consultant to several leading hi-tech companies such as IBM and Hewlett-Packard, and is listed on the advisory board of several startups in his fields of expertise.

A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems
Mahmoud Meribout and Masato Motomura

Abstract: The increase in complexity of programmable hardware platforms results in the need to develop efficient high-level synthesis tools since that allows more efficient exploration of the design space while predicting the effects of technology specific tools on the design space. Much of the previous work, however, neglects the delay of interconnects (e.g. multiplexers) which can heavily influence the overall performance of the design. In addition, in the case of dynamic reconfigurable logic circuits, unless an appropriate design methodology is followed, an unnecessarily large number of configurable logic blocks may end up being used for communication between contexts, rather than for implementing function units. The aim of this paper is to present a new technique to perform interconnect-sensitive synthesis, targeting dynamic reconfigurable circuits. Further, the proposed technique exploits multiple hardware contexts to achieve efficient designs. Experimental results on several benchmarks, which have been done on our DRL LSI circuit [10], [12], demonstrate that, by jointly optimizing the interconnect, communication, and function unit cost, we can achieve higher quality designs than is possible with such previous techniques as Force-Directed Scheduling.

Index Terms: Dynamic reconfigurable logic, scheduling, allocation, partitioning, communication cost.

M. Meribout is with the Information Engineering Department, College of Engineering, SQU University, Sultanate of Oman. E-mail: [email protected].
M. Motomura is with the System ULSI Research Laboratory, Silicon Systems Research Laboratories, NEC Corp., Tokyo, Japan.
Manuscript received 8 Jan. 2002; revised 29 Apr. 2003; accepted 27 Feb. 2004.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 115644.

1 INTRODUCTION

Field Programmable Gate Arrays (FPGAs) are the most common devices for reconfigurable computing. However, dynamic reconfiguration has emerged as an attractive technique for minimizing the reconfiguration time [1]. An example of dynamic reconfiguration is the multicontext architecture, which may store in its internal memory a set of different configurations or contexts. High-Level Synthesis (HLS) is becoming the methodology of choice for shortening the design time of Dynamic Reconfigurable Logic (DRL) circuits by allowing the user to start from a behavioral specification. However, most of the previous HLS techniques targeting DRL circuits do not consider the effect of interconnect (e.g., multiplexers), which may heavily affect the overall performance of the design. Table 1, which shows the area and delay for different Function Units (FUs) and multiplexers of our target DRL circuit [10], [12], indicates that neglecting interconnection costs may induce several synthesis iterations and yield implementations with poor quality.

In this paper, we propose a new design flow methodology that takes into consideration most of the architectural features of DRL circuits. The same approach can be easily extended to FPGAs. The next section will describe the architectural model of our target DRL circuit [10], [12]. The new, dedicated HLS technique is then proposed. The last section addresses some experimental results.

2 ARCHITECTURAL MODEL AND PROBLEM DEFINITION

2.1 Basic Idea

Traditionally, SRAM-based FPGAs are reprogrammed once during the circuit start-up cycle. In contrast, dynamically programmable gate arrays allow for “on-the-fly” reconfiguration of the chip. In [10] and [12], a new LSI architecture of DRL circuits was proposed. The architecture comprises eight identical contexts. Each context can store the configuration of 768 Configurable Logic Blocks (CLBs) that are embedded in a configurable interconnect structure and surrounded by configurable I/O blocks (Fig. 1a). When a new configuration of the CLB is needed, it is downloaded from the appropriate context in just a few nanoseconds. Another feature of our DRL circuit is the set of eight registers at the outputs of the CLBs; these registers are accessible by, and facilitate communication between, different contexts (Fig. 1b). Compared to traditional FPGAs, the DRL architecture requires fewer hardware resources to implement the same number of gates. Only the gates needed for implementing the actual state of the design are mapped into CLBs, whereas the configurations of the other remaining gates are stored in different contexts and mapped only when needed. On the other hand, the main disadvantages of DRL circuits are the difficulty in finding a unified architecture, suitable for all types of applications (e.g., centralized versus distributed controllers for switching between contexts, number of contexts, and number of CLBs), and the high hardware and development costs. However, we believe that the added flexibility of the DRL architecture outweighs these disadvantages and makes it an attractive choice for implementing next generation hardware platforms.
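
As an illustration of the multicontext idea in Section 2.1, the following minimal Python sketch models a device that stores eight configurations of 768 CLBs each but maps only one of them at a time. The class and method names (`MultiContextDevice`, `switch_to`, `mapped_clbs`) are hypothetical, each CLB configuration is reduced to an opaque string, and the shared registers are modeled as a single bank of eight, which is a simplification of the registers at the CLB outputs rather than a description of the actual chip.

```python
from dataclasses import dataclass, field

NUM_CONTEXTS = 8        # from the text: eight identical contexts
CLBS_PER_CONTEXT = 768  # from the text: 768 CLBs per context
NUM_SHARED_REGS = 8     # simplification: one bank of eight shared registers


@dataclass
class MultiContextDevice:
    """Toy model (hypothetical names): stored contexts plus one active context."""
    contexts: list = field(default_factory=lambda: [
        ["idle"] * CLBS_PER_CONTEXT for _ in range(NUM_CONTEXTS)])
    shared_regs: list = field(default_factory=lambda: [0] * NUM_SHARED_REGS)
    active: int = 0

    def switch_to(self, ctx: int) -> None:
        """Select which stored configuration is mapped onto the CLB array."""
        if not 0 <= ctx < NUM_CONTEXTS:
            raise ValueError(f"context {ctx} out of range")
        self.active = ctx

    def mapped_clbs(self) -> list:
        """Only the active context occupies the physical CLBs."""
        return self.contexts[self.active]


dev = MultiContextDevice()
dev.shared_regs[0] = 42      # value produced while context 0 is active
dev.switch_to(3)             # the register bank survives the context switch
value_seen_in_context_3 = dev.shared_regs[0]

physical_clbs = len(dev.mapped_clbs())          # CLBs mapped at any one time
logical_clbs = NUM_CONTEXTS * CLBS_PER_CONTEXT  # CLB configurations stored
```

In this toy model the chip exposes 768 mapped CLBs while holding 8 × 768 = 6144 stored CLB configurations, which is the sense in which the text says fewer hardware resources are needed for the same number of gates.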

[Table 1: Area and Delay Figures for Adders, Multipliers, and Multiplexers for our DRL LSI Circuit [10], [12]]

2.2 Hardware Model

1. DRL hardware model of contexts: The DRL model can be decomposed into two parts (Fig. 2): a finite set of datapaths $\{DP_1, DP_2, \ldots, DP_m\}$ and a corresponding set of memory elements, $\{ME_1, ME_2, \ldots, ME_m\}$, which can be used for communication between the logic blocks. Each datapath, $DP_i$, and its associated memory are referred to as context $i$. A signal created in some context $i$ can be stored and then read back by any other context $j$. Additionally, each datapath $DP_i$ consists of a two-dimensional array of logic blocks $\{(LB_x(i), LB_y(i))\}$. Here, two logic blocks $(LB_x(i), LB_y(i))$ and $(LB_{x'}(j), LB_{y'}(j))$ belonging to two different contexts $i$ and $j$ are said to be superposed if they satisfy the condition $x = x'$ and $y = y'$.

2. Hierarchical representation of the datapath: The datapath of each context can be modeled as having two hierarchical blocks (Fig. 1). The first type is the elementary block, which corresponds to a cluster of CLBs interconnected with high-speed local buses (Fig. 3). In our DRL model, a distributed context switch can be performed since each elementary block has its own context switch controller. These elementary blocks can be connected either with the Global Bus Switch (GBS) (Fig. 3), which provides chip-level interconnection along the rows and columns, or the Bus Connector (BC). Therefore, the elementary blocks connected with GBS have longer propagation delays. This is why we introduced a second hierarchy, namely, the macro block (Fig. 3). Based on the characteristics of some multimedia and networking benchmarks, we have set the numbers of CLBs in the elementary and macro blocks to 16 and 64, respectively.

[Fig. 1: Datapath architectural model of our DRL circuit.]
[Fig. 2: Hardware model of our DRL circuit.]

2.3 Cost Models and Challenges for a Dedicated HLS Tool

A high-level synthesis tool must consider both the area and communication costs of candidate implementations. The area cost must account for the number of required FUs as well as for any necessary multiplexers. This is particularly important for DRL circuits in which, unlike Application Specific Integrated Circuits (ASICs), the area of a multiplexer is comparable to that of a function unit with the same bit width (Table 1). Traditional ASIC CAD tools, thus, tend to introduce a large number of multiplexers in order to maximize the sharing of FUs. Besides directly accounting for the area cost of multiplexers, we also account for their propagation delays; in our results, we assume that the CLB delay is $t_{CLB} = 2.3$ ns and the CLB to register delay is $t_{CLB \to reg} = 2.75$ ns. Another important difference between our approach and more traditional approaches is in the way that the number and type (e.g., adder, multiplier, etc.) of FUs is determined. Most traditional algorithms make an initial assumption on the number of one type of FU and then try to find the optimal number of other types of FUs. In our method, a minimal number of FUs and multiplexers of each type can be automatically found.

[Fig. 4: Effect of buffering cost for DRL circuits.]

The reconfigurability of the DRL architecture requires that a synthesis system also consider the cost of communicating between different contexts. Consider the example of Fig. 4a, which depicts a Control Data Flow Graph (CDFG) in which nodes A and B are assigned to context 1, nodes C and D to context 2, nodes E and F to context 3, and node G to context 4, and the contexts are executed in numerical order. This schedule requires three LBs to store the signals produced in some context for later use by a nonadjacent (i.e., nonconsecutive) context. The schedule in Fig. 4b swaps the context assignments of nodes D and F and reduces the number of required LBs to just two.
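
The comparison between the two schedules can be sketched in Python as follows. The figure itself is not reproduced in this text, so the dependency edges below are hypothetical and were chosen only so that the counts agree with the ones quoted above; the function name `buffered_producers` and the rule of charging one LB per producer whose value is read by a nonadjacent context are likewise assumptions of this sketch, not the paper's stated cost function.

```python
def buffered_producers(edges, context_of):
    """Return producers whose value must be held for a nonadjacent context.

    Assumption of this sketch: one LB per such producer, regardless of how
    many later contexts read the value.
    """
    held = set()
    for src, dst in edges:
        gap = context_of[dst] - context_of[src]
        if gap < 0:
            raise ValueError(f"edge {src}->{dst} runs backwards in time")
        if gap > 1:  # consumer is neither in the same nor in the next context
            held.add(src)
    return held


# Hypothetical dependency edges; NOT taken from Fig. 4 of the paper.
EDGES = [("A", "C"), ("A", "F"), ("B", "D"), ("B", "E"),
         ("C", "E"), ("D", "G"), ("E", "G"), ("F", "G")]

# Context assignment described in the text for Fig. 4a.
schedule_a = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3, "G": 4}
# Fig. 4b: the same assignment with nodes D and F swapped.
schedule_b = {**schedule_a, "D": 3, "F": 2}

lbs_a = len(buffered_producers(EDGES, schedule_a))
lbs_b = len(buffered_producers(EDGES, schedule_b))
```

With these assumed edges, the first schedule has to hold the outputs of A, B, and D (three LBs), while the swapped schedule holds only those of B and F (two LBs), even though both schedules use the same four contexts.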
Thus, while a conventional synthesis system would view the two schedules in Fig. 4 as equivalent since both are completed in four time steps, the need to store (buffer) signals until they are no longer needed identifies the schedule in Fig. 4b as the better solution. For the DRL circuits, thus, an important component of the communication cost between contexts is the storage needed for signals produced by some context and used by later contexts. A second factor in the communication cost is the number of variables carried by a given input port of an FU. Specifically, an input port of an FU may carry a different variable in each control step. These variables can be either stored in different contexts of the same CLBs or connected to a multiplexer. The advantage of the first alternative is that the multiplexer delay is omitted. However, because of the limited number of contexts, the number of variables that can be stored in this way is limited. The other communication cost that must be considered, then, is the number of registers required to store these intermediate variables. In this paper, we consider the communication cost, $T_{CM}$, to be the number of LBs used for storing intermediate results between nonadjacent contexts or for storing intermediate variables in different contexts of the same CLB. Additionally, we introduce the communication to computation ratio, $R = T_{CM}/T_{LUT}$, where $T_{LUT}$ is the number of LUTs within a context.

Fig. 3. Example of grouping elementary blocks into macro blocks.

3 PREVIOUS WORK

In the last few years, several Computer-Aided Design (CAD) approaches that generate hardware for dynamic reconfigurable circuits have been described in the literature. These include JHDL [20], PamDC [21], SPYDER [22], and Transmogrifier [14]. All these CAD software tools use either Java or C++ as the design entry programming language and mainly focus on how to partition combinational circuits on adjacent communication DRL circuits without taking into consideration their various costs (as described in Section 2). In [15], a technique for handling state machines by assigning a context to each state was proposed. However, this technique is effective only if the number of states is close to the number of contexts and when the states are densely encoded. Thus, this algorithm may not be effective in the case of sequential circuits with many flip-flops (FFs). In [16], a force-directed scheduling (FDS) method, which considers the cost of buffering signals between nonadjacent time frames, was successfully used for assigning several control steps to one single context. This method proceeds by allocating each node of the Control Data Flow Graph (CDFG) [14] to a specific FU. However, the solution may not be optimal since the algorithm uses as a basic criterion the type of operation performed by each node (i.e., FU), but not the properties of its neighboring nodes (i.e., their types and interconnection topology). This may lead to a large number of multiplexers and registers. Additionally, the algorithm constrains the output of n chained nodes to be available after n clock cycles at the earliest. This leads to suboptimal performance, especially in the case of submicron technology, where the delay of CLBs is a relatively small portion of the system clock. Recently, work which considers the properties of neighboring nodes has been reported [2], [3]. However, it focuses more on template matching (similar to the graph covering problem) than on template generation. Regularity extraction was shown to be beneficial in reducing the area and increasing the performance for the PipeRench architecture [8]. The authors go on to show that the templates lead to a decrease in area and delay for the PipeRench architecture. However, the template generation was restricted to a single output template and a limited number of inputs. In addition, their study does not address
the architectural features of the DRL circuits. Trees and single output templates are used by Chowdhary et al. [18] to cover datapath circuits. Rao and Kurdahi [11] addressed template generation for system-level clustering using the well-known first fit approach to bin filling. More recently, Cadambi and Goldstein [9] proposed single-output template generation via a constructive, bottom-up approach. Both methods restrict the area and the number of pins for their templates. Our method attempts to find the best possible set of templates, regardless of area and number of inputs, though we can easily add pin and area restrictions to our algorithms. Additionally, we perform template generation and matching simultaneously. IMEC's Cathedral Project [6] used a different model of computation in their high-level synthesis stage: instead of a CDFG, they performed reductions on the signal flow graph of a DSP application. Their datapath was composed of Abstract Building Blocks (ABBs), or instructions, available from a given hardware library. This was achieved via manual clustering of necessary operations into more compact operations. Their results demonstrated an expected reduction of critical path length as well as interconnect. On the other hand, our template generation and matching algorithms are automated. Additionally, the templates are not required to match one of the components of the hardware library, but are constructed according to criteria which depend only on the nature of the input CDFG (Section 4.2). This flexibility allows the threads to contain larger numbers of FUs. Additionally, their proposal considers neither the case of several ports for multimemory access nor the way to share FUs when there is not enough area with the actual components of the library. Some other high-level synthesis algorithms that consider the effects of physical implementation were addressed in [7], [17]. Pangrle et al. in [7] use a compiler that has the ability to incorporate various low-level physical effects into the synthesis process. It uses an iterative scheme in which a behavioral synthesis tool incorporates interconnection delay based on accurate estimates of the final circuit. In [17], FU and interconnect cost models are used within a high-level synthesis flow. However, this approach seems to have been limited to bus-based architectures, with no results for multiplexer-based architectures, which are more suitable for DRL circuits. The difficulty arises from the fact that multiplexer-based architectures are much harder to handle due to the inherently more complex optimizations aimed at reducing the number and size of multiplexers (i.e., switching the inputs of commutative operations). In conclusion, unless an accurate estimation of the area and delay of templates is available, a lengthy synthesis procedure, caused by a large number of iterations between the high and low levels, is inevitable. In our case, the area and delay of both the FUs and multiplexers are considered throughout all the steps of the design flow.

Fig. 5. Overall architecture of the proposed high-level design flow.

4 OVERVIEW OF OUR APPROACH AND MOTIVATION EXAMPLES

This paper provides a new HLS technique for Dynamic Reconfigurable Circuits. The input to our tool is the Control Data Flow Graph (CDFG) of the design. In the CDFG, it is assumed that the front-end compiler assigns all the primary inputs and outputs of the design to different I/O ports of the DRL circuit in a way that ensures maximal concurrency of the design [10]. The CDFG is then processed by the backend part in order to generate the hardware application netlist file (see Fig. 5). The module library in this system provides a set of different kinds of parameterizable FUs characterized by their area and delay. The first task of the backend part is to partition the CDFG into independent clusters of connected nodes, namely, threads, which feature some properties, to be explained in Section 4.2. Scheduling, allocation, and partitioning tasks are performed on these threads at a coarse level (a set of connected FUs rather than a single FU). This leads to a reduction of HLS complexity, and further delivers high throughput and low area and communication costs to the system. In addition, the subgraphs (threads in our case) are not manually selected and are specific to DRL circuits. The other merit of our technique concerns the design flow used to process these subgraphs and to share their FUs (Section 7). Whereas other techniques use multiplexers for FU sharing, our solution is to implement similar patterns of a pair of threads onto one single context and to map the remaining dissimilar patterns in superposed logic blocks. This has the advantage of improving the performance of the system since multiplexer delays are omitted. In addition, the communication cost $T_{CM}$ is reduced since a large number of FUs are shared using fewer superposed registers.

Fig. 6. Flow-chart of the back-end compiler.

4.1 The Backend Compiler

The backend compiler is composed of a temporal partitioning stage and a spatial partitioning stage (Fig. 6). The temporal partitioning stage provides the highest performance while meeting the area constraint criteria. The threads obtained after the first partitioning are gradually merged during allocation, scheduling, and allocation-scheduling until the area constraint is met. During the spatial partitioning phase, the threads are mapped into different contexts while reducing the communication cost. In what follows, we illustrate our approach with the walk-through example of the Second Order Differential Equation (SODE) [4]:

$y'' + 5xy' + 3y = 0$   (1)

Its C-based description and DFG are shown in Fig. 7.

Fig. 7. Behavioral description and corresponding CDFG.

4.2 Basic Idea

A thread is the basic element for our HLS tool and is extracted according to the following five conditions:

Definition 1.

1.1. A thread is defined as a block of connected nodes, leading to the same output signal (Condition 1).

1.2. Two nodes of a thread cannot simultaneously have different primary inputs and be allocated to the same I/O port of the DRL circuit (Condition 2). That is, if two different primary inputs, $p_{i1}$ and $p_{i2}$, are assigned to the same I/O port of the DRL device, then the operations related to $p_{i1}$ and $p_{i2}$ cannot all be assigned to the same thread.

1.3. A thread can also be a state of a state machine, explicitly defined by the user, and whose nodes satisfy Conditions 1 and 2 (Condition 3). This typically arises when a specific segment of a system-level algorithm needs to be implemented as a state machine and where a jump from one state to another is done according to some clock-driven conditions (e.g., time stamp). As long as Conditions 1 and 2 are satisfied, the set of operations of each state can be considered as a thread.

1.4. A fork node, which corresponds to a loop condition, is assigned to an individual thread when all of its operations satisfy Conditions 1 and 2 (Condition 4).

1.5. We define $\Theta$ as the set containing the list of threads, $\Theta = \{\theta_k, k \in [1, N_\Theta]\}$, where $N_\Theta$ is the total number of threads. In addition, each thread, $\theta_k$, is composed of a set of operations denoted by $o_m(\theta_k)$: $\theta_k = \{o_m(\theta_k), m \in [1, N_{o,k}]\}$, where $N_{o,k}$ is the number of operations of thread $\theta_k$.

From these five conditions, we conclude that a thread is composed of a chain of connected nodes whose real-time delay can be calculated as the sum of the delays of the FUs belonging to the longest (i.e., critical) path. Additionally, threads can be fairly small, since memory or I/O accesses occur frequently in programs, or fairly large (as illustrated by the EWF and CAST-128 benchmarks in Section 9), since the DRL device contains enough I/O ports to group several I/O operations into the same thread. This has the advantage of reducing the time complexity of the high-level stage of the HLS algorithm by performing optimizations on threads rather than on single operations.

5 THE ALGORITHM

5.1 Algorithm of Partitioning a CDFG into Threads

Using the above five conditions, the thread generation algorithm iteratively clusters nodes into threads.
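As background for the partitioning, the thread delay defined in Section 4.2 (the sum of FU delays along the critical path) can be sketched as a longest-path computation over the thread's nodes. This is only an illustration under stated assumptions: the node names, the delay values, and the `SUCCESSORS` adjacency map are hypothetical and are not taken from Table 1 or from the paper's tool, and the thread is assumed to be acyclic.

```python
from functools import lru_cache

# Hypothetical thread: FU delays (ns) and dependencies are illustrative only.
FU_DELAY = {"mul1": 9.2, "mul2": 9.2, "add1": 2.3, "sub1": 2.3}
SUCCESSORS = {"mul1": ["add1"], "mul2": ["add1"], "add1": ["sub1"], "sub1": []}


def thread_delay(fu_delay, successors):
    """Sum of FU delays along the longest (critical) path of an acyclic thread."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Delay of this FU plus the slowest chain that follows it.
        tail = max((longest_from(s) for s in successors[node]), default=0.0)
        return fu_delay[node] + tail

    return max(longest_from(node) for node in fu_delay)


if __name__ == "__main__":
    print(f"thread delay: {thread_delay(FU_DELAY, SUCCESSORS):.1f} ns")
```

Memoizing `longest_from` keeps the computation linear in the number of nodes and edges, which matters when threads grow large.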
