Run-time Adaptation for Reconfigurable Embedded Processors

Lars Bauer
Jörg Henkel

Lars Bauer
Karlsruhe Institute of Technology
Haid-und-Neu-Str. 7
76131 Karlsruhe
Germany

Jörg Henkel
Karlsruhe Institute of Technology
Haid-und-Neu-Str. 7
76131 Karlsruhe
Germany

[email protected]

ISBN 978-1-4419-7411-2
e-ISBN 978-1-4419-7412-9
DOI 10.1007/978-1-4419-7412-9

Springer New York Dordrecht Heidelberg London

© Springer Science+Business Media, LLC 2011

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Contents

1 Introduction
  1.1 Application-Specific Instruction Set Processors
  1.2 Reconfigurable Processors
    1.2.1 Summary of Reconfigurable Processors
  1.3 Contribution of this Monograph
  1.4 Monograph Outline
2 Background and Related Work
  2.1 Extensible Processors
  2.2 Reconfigurable Processors
    2.2.1 Granularity of the Reconfigurable Fabric
    2.2.2 Using and Partitioning the Reconfigurable Area
    2.2.3 Coupling Accelerators and the Processor
    2.2.4 Reconfigurable Instruction Set Processors
  2.3 Summary of Related Work
3 Modular Special Instructions
  3.1 Problems of State-of-the-Art Monolithic Special Instructions
  3.2 Hierarchical Special Instruction Composition
  3.3 Example Special Instructions for the ITU-T H.264 Video Encoder Application
  3.4 Formal Representation and Combination of Modular Special Instructions
  3.5 Summary of Modular Special Instructions
4 The RISPP Run-Time System
  4.1 RISPP Architecture Overview
    4.1.1 Summary of the RISPP Architecture Overview
  4.2 Requirement Analysis and Overview
    4.2.1 Summary of the Requirement Analysis and Overview
  4.3 Online Monitoring and Special Instruction Forecasting
    4.3.1 Fine-Tuning the Forecast Values
    4.3.2 Evaluation of Forecast Fine-Tuning
    4.3.3 Hardware Implementation for Fine-Tuning the Forecast Values
    4.3.4 Summary of the Online Monitoring and SI Forecasting
  4.4 Molecule Selection
    4.4.1 Problem Description for Molecule Selection
    4.4.2 Parameter Identification for the Profit Function
    4.4.3 Heuristic Solution for the Molecule Selection
    4.4.4 Evaluation and Results for the Molecule Selection
    4.4.5 Summary of the Molecule Selection
  4.5 Reconfiguration-Sequence Scheduling
    4.5.1 Problem Description for Reconfiguration-Sequence Scheduling
    4.5.2 Determining the Molecule Reconfiguration Sequence
    4.5.3 Evaluation and Results for the Reconfiguration-Sequence Scheduling
    4.5.4 Summary of the Reconfiguration-Sequence Scheduling
  4.6 Atom Replacement
    4.6.1 Motivation and Problem Description of State-of-the-Art Replacement Policies
    4.6.2 The MinDeg Replacement Policy
    4.6.3 Evaluation and Results
    4.6.4 Summary of the Atom Replacement
  4.7 Summary of the RISPP Run-Time System
5 RISPP Architecture Details
  5.1 Special Instructions as Interface Between Hardware and Software
  5.2 Executing Special Instructions Using the Core Instruction Set Architecture
  5.3 Data Memory Access for Special Instructions
  5.4 Atom Infrastructure
    5.4.1 Atom Containers and Bus Connectors
    5.4.2 Load/Store- and Address Generation Units
    5.4.3 Summary of the Atom Infrastructure
  5.5 RISPP Prototype Implementation and Results
  5.6 Summary of the RISPP Architecture Details
6 Benchmarks and Comparisons
  6.1 Benchmarking the RISPP Approach for Different Architectural Parameters
  6.2 Comparing Different Architectures
    6.2.1 Assumptions and Similarities
    6.2.2 Dissimilarities
    6.2.3 Fairness of Comparison
    6.2.4 Summary of Comparing Different Architectures
  6.3 Comparing RISPP with Application-Specific Instruction Set Processors
  6.4 Comparing RISPP with Reconfigurable Processors
  6.5 Summary of Benchmarks and Comparisons
7 Conclusion and Outlook
  7.1 Summary
  7.2 Future Work
Appendix A: RISPP Simulation
Appendix B: RISPP Prototype
Bibliography
Index

Abbreviations

AC      Atom container: a part of the reconfigurable fabric that can be dynamically reconfigured to contain an atom, i.e. an elementary data path
AGU     Address generation unit
ALU     Arithmetic logic unit
ASF     Avoid software first: a reconfiguration-sequence scheduling algorithm, as presented in Sect. 4.5
ASIC    Application-specific integrated circuit
ASIP    Application-specific instruction set processor
BC      Bus connector: connecting an → AC to the atom infrastructure
BRAM    Block → RAM: an on-chip memory block that is available on Virtex → FPGAs
cISA    core Instruction Set Architecture: the part of the instruction set that is implemented using the (nonreconfigurable) core pipeline; can be used to implement → SIs as well, as presented in Sect. 5.2
CLB     Configurable logic block: part of an → FPGA, contains multiple → LUTs
CPU     Central processing unit
DCT     Discrete cosine transformation: a computational kernel that is used in the H.264 video encoder
EEPROM  Electrically erasable programmable read only memory
FB      Forecast block: indicated by an → FI, containing a set of → SIs with a → FV per SI
FI      Forecast instruction: a special → HI that indicates an → FB
FIFO    First-in first-out buffer
FPGA    Field programmable gate array: a reconfigurable device that is composed as an array of → CLBs, → BRAMs, and further components
FPS     Frames per second
FSFR    First select, first reconfigure: a reconfiguration-sequence scheduling algorithm, as presented in Sect. 4.5
FSL     Fast simplex link: a special communication mechanism for a MicroBlaze processor
FSM     Finite state machine
FV      Forecast value: the expected number of → SI executions for the next computational block, part of the information in an → FB
GPP     General purpose processor
GPR     General purpose register file
GUI     Graphical user interface
HEF     Highest efficiency first: a reconfiguration-sequence scheduling algorithm, as presented in Sect. 4.5
HI      Helper instruction: an assembly instruction that is dedicated to system support (e.g., an → FI); not part of the → cISA and not an → SI
HT      Hadamard transformation: a computational kernel that is used in the H.264 video encoder
IP      Intellectual property
ISA     Instruction set architecture
ISS     Instruction set simulator
KB      Kilo byte (also KByte): 1,024 bytes
LSU     Load/store unit
LUT     Look-up table: smallest element in an → FPGA, part of a → CLB; configurable as logic or memory
MB      Mega byte (also MByte): 1,024 → KB
MinDeg  Minimum degradation: an atom replacement algorithm, as presented in Sect. 4.6
MUX     Multiplexer
NOP     No operation: an assembly instruction that does not perform any visible calculation, memory access, or register manipulation
NP      Nondeterministic polynomial: a complexity class that contains all decision problems that can be solved by a nondeterministic Turing machine in polynomial time
OS      Operating system
PCB     Printed circuit board
PRM     Partially reconfigurable module
PSM     Programmable switching matrix