
Heap Data Allocation To Scratch-Pad Memory In Embedded Systems Angel Dominguez Doctor of ... PDF

324 Pages·2007·3.04 MB·English

Preview Heap Data Allocation To Scratch-Pad Memory In Embedded Systems Angel Dominguez Doctor of ...

ABSTRACT

Title of dissertation: Heap Data Allocation To Scratch-Pad Memory In Embedded Systems
Angel Dominguez, Doctor of Philosophy, 2007
Dissertation directed by: Professor Rajeev K. Barua, Department of Electrical and Computer Engineering

This thesis presents the first-ever compile-time method for allocating a portion of a program's dynamic data to scratch-pad memory. A scratch-pad is a fast, directly addressed, compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees vs cache and by its significantly lower overheads in access time, energy consumption, area and overall runtime. Dynamic data refers to all objects allocated at run-time in a program, as opposed to static data objects, which are allocated at compile-time. Existing compiler methods for allocating data to scratch-pad are able to place only code, global and stack data (static data) in scratch-pad memory; heap and recursive-function objects (dynamic data) are allocated entirely in DRAM, resulting in poor performance for these dynamic data types. Runtime methods based on software caching can place data in scratch-pad, but because of their high overheads from software address translation, they have not been successful, especially for dynamic data.

In this thesis we present a dynamic yet compiler-directed allocation method for dynamic data that, for the first time, (i) is able to place a portion of the dynamic data in scratch-pad; (ii) has no software-caching tags; (iii) requires no run-time per-access extra address translation; and (iv) is able to move dynamic data back and forth between scratch-pad and DRAM to better track the program's locality characteristics. With our method, code, global, stack and heap variables can share the same scratch-pad.
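As a rough illustration of properties (i) to (iii), the C sketch below serves one allocation site from a fixed-size bin and falls back to ordinary DRAM `malloc` when the bin is full. It is not the dissertation's algorithm: the names `spm_bin`, `spm_site_alloc`, `spm_site_free`, `spm_reset` and the 256-byte bin size are all hypothetical, and a static array stands in for the scratch-pad so the code runs on a host machine. On real hardware the bin would presumably be placed in SRAM by the linker. Because callers receive plain pointers, every access is a direct load or store with no tag check and no address translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bin size; a real compiler would choose it per allocation site. */
#define SPM_BIN_BYTES 256u

/* Stand-in for a scratch-pad region (assumption: host simulation only). */
static _Alignas(max_align_t) unsigned char spm_bin[SPM_BIN_BYTES];
static size_t spm_used = 0;

/* Returns nonzero if p points into the simulated scratch-pad bin. */
int in_spm(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)spm_bin;
    return a >= base && a < base + SPM_BIN_BYTES;
}

/* Allocate from the bin while space remains, otherwise from DRAM. */
void *spm_site_alloc(size_t n) {
    size_t align = _Alignof(max_align_t);
    if (n <= SPM_BIN_BYTES) {
        size_t rounded = (n + align - 1) / align * align;
        if (rounded <= SPM_BIN_BYTES - spm_used) {
            void *p = &spm_bin[spm_used];
            spm_used += rounded;
            return p;              /* plain pointer: no per-access translation */
        }
    }
    return malloc(n);              /* overflow objects live in DRAM */
}

/* Bin objects are released all at once by spm_reset; DRAM objects use free. */
void spm_site_free(void *p) {
    if (p != NULL && !in_spm(p)) {
        free(p);
    }
}

/* Empty the bin, e.g. when the program leaves the region that used it. */
void spm_reset(void) {
    spm_used = 0;
}

/* Small self-check: the first object fits in the bin, the second does not. */
int spm_demo(void) {
    spm_reset();
    void *a = spm_site_alloc(200);
    void *b = spm_site_alloc(200);
    assert(in_spm(a));
    assert(b != NULL && !in_spm(b));
    spm_site_free(b);
    spm_reset();
    return 0;
}
```

The sketch deliberately omits property (iv), moving objects between scratch-pad and DRAM, since doing that safely requires the compiler analysis the thesis describes.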
When compared to placing all dynamic data variables in DRAM and only static data in scratch-pad, our results show that our method reduces the average runtime of our benchmarks by 22.3%, and the average power consumption by 26.7%, for the same size of scratch-pad fixed at 5% of total data size. Significant savings in runtime and energy across a large number of benchmarks were also observed when compared against cache memory organizations, showing our method's success under constrained SRAM sizes when dealing with dynamic data. Lastly, our method is able to minimize the profile dependence issues which plague all similar allocation methods through careful analysis of static and dynamic profile information.

Heap Data Allocation To Scratch-Pad Memory In Embedded Systems
by Angel Dominguez

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2007

Advisory Committee:
Professor Rajeev K. Barua, Chair/Advisor
Professor Manoj Franklin
Professor Shuvra S. Bhattacharyya
Professor Peter Petrov
Professor Chau-Wen Tseng

© Copyright by Angel Dominguez 2007

Dedication

For my parents, Angel and Belinda. Thank you for all your help, love and patience throughout my career, without which I would never have made it!

Acknowledgements

There are a few people I would like to thank who were instrumental to the completion of my doctoral dissertation and academic career. First and foremost among them are my parents, Angel and Belinda Dominguez, two of the most loving and hard-working people I have ever known. I can honestly say that I have been truly blessed by having the tremendous luck to be born to two such wonderful and caring people. Both my father and my mother have been completely supportive of my academic pursuits, even through trying and difficult times.
Without them, I would not be the man I am today, and I can never truly thank them enough for all they have done for our family.

There have been several teachers throughout my life that have instilled in me a love of learning and a thirst for new knowledge that have driven me to complete a doctoral degree. My first introduction to adult endeavors in science was given to me by James Thompson, my science teacher at Archbishop Curley Notre Dame high school for several years. He inspired hundreds of future scientists with his unique balance of wisdom and humor while making science entertaining and interesting to young people. Donald Yeung filled a similar role in my graduate studies and I was lucky enough to have him as my academic and research advisor for my Master's degree. He is another truly dedicated individual with unquestionable morals and possesses a perfectionist approach to research that I truly admire.

Lastly, this dissertation would not have been possible without my doctoral research advisor, Rajeev Barua, an intelligent man with a great deal of enthusiasm for new research directions. His proficiency in writing and in discussing problems, along with his people skills, has greatly helped my development in those areas.

Finally I would like to thank all my friends and family scattered across the world. Without you, there would be no joy in sharing what I have learned or accomplished in my life. I appreciate your patience and efforts while I have been busy with work and academics, and hope to reconnect with as many of you as possible while meeting new friends wherever I go. Leilei in particular has been with me for the entirety of my dissertation and has been loving and supportive for the many years I have spent in graduate school. As I complete this chapter of my life, she is beginning her own graduate studies and I wish her the best of luck as she embarks on her new career.

Contents

1 Introduction
  1.1 Organization of thesis
2 Embedded systems and software development
  2.1 Embedded systems
  2.2 Intel StrongARM microprocessor
  2.3 Embedded software development
  2.4 C language compilers
  2.5 Heap data allocation
  2.6 Recursive functions
3 Previous work on SPM allocation
  3.1 Overview of related research
  3.2 Static SPM allocation methods
  3.3 Dynamic SPM allocation techniques
  3.4 Existing methods for dynamic program data
  3.5 Heap-to-stack conversion techniques
  3.6 Memory hierarchy research
  3.7 Dynamic memory manager research
  3.8 Other related methods
4 Dynamic allocation of static program data
  4.1 Overview for static program allocation
  4.2 The Dynamic Program Region Graph
  4.3 Allocation method for code, stack and global objects
  4.4 Algorithm modifications
  4.5 Layout and code generation
  4.6 Summary of results
5 Dynamic program data
  5.1 Understanding dynamic data in software
  5.2 Obstacles to optimizing software with dynamic data
  5.3 Creating the DPRG with dynamic data
6 Compiler allocation of dynamic data
  6.1 Overview of SPM allocation for dynamic data
  6.2 Preparing the DPRG for allocation
  6.3 Calculating heap bin allocation sizes
  6.4 Overview of the iterative portion
  6.5 Transfer minimizations
  6.6 Heap safety transformations
  6.7 Memory layout technique for address assignment
  6.8 Feedback driven transformations
  6.9 Termination of iterative steps
  6.10 Code generation for optimized binaries
7 Robust dynamic data handling
  7.1 General optimizations
  7.2 Recursive function stack handling
  7.3 Compile-time unknown-size heap objects
  7.4 Profile sensitivity
8 Methodology
  8.1 Target hardware platform
  8.2 Software platform requirements
  8.3 Compiler implementation
  8.4 Simulation platform
  8.5 Benchmark overview
  8.6 Benchmark classes
  8.7 Benchmark suite
9 Results
  9.1 Dynamic heap allocation results
    9.1.1 Runtime and energy gain
    9.1.2 Transfer method comparison
    9.1.3 Reduction in heap DRAM accesses
    9.1.4 Effect of DRAM latency
    9.1.5 Effect of varying SPM size
  9.2 Unknown-size heap allocation
  9.3 Recursive function allocation
  9.4 Comparison with caches
  9.5 Profile sensitivity
    9.5.1 Profile input variation
  9.6 Code allocation
10 Conclusion
A Additional results
  A.1 Detailed recursive stack allocation results
  A.2 Cache comparison results
  A.3 Profile sensitivity results

List of Figures

1.1 Example heap allocation using our method
2.1 Diagram of a typical desktop computer.
2.2 Diagram of a typical embedded computer.
2.3 Memory types common to embedded platforms
2.4 Comparison between popular embedded memory types.
2.5 Diagram of the Intel StrongARM embedded CPU.
2.6 Compilation of an application from source files.
2.7 Compiler view of program memory
2.8 Sample memory layout for an embedded application
2.9 Heap manager example
2.10 Stack growth of a recursive function.
4.1 DPRG created for a sample program.
4.2 Algorithm for dynamic allocation of static program data.
4.3 DPRG enhanced with code regions.
5.1 Memory map for a typical ARM application.
5.2 Example of a recursive data structure.
5.3 Example program fragment
5.4 DPRG showing a heap allocation site.
5.5 DPRG for a sample function with heap data.
6.1 Algorithm for dynamic allocation of heap data.
6.2 Calculating heap bin sizes for allocation.
6.3 Allocation scenario for an example program
7.1 DPRG of a recursive function.
7.2 Binary tree showing access frequency.
7.3 Sample program containing unknown-size heap allocation
7.4 Example function containing unknown-size heap allocation
7.5 DPRG for a region with access frequencies
8.1 GCC compiler flow for an application.
8.2 Main stages of our allocation algorithm.
8.3 Benchmark suite information - Part 1
8.4 Benchmark suite information - Part 2
8.5 Benchmark suite information - Part 3
9.1 Normalized runtime of our method for the default scenario.

Description:
dynamic data structures such as linked lists, trees and graphs in programs. Many compiler techniques for heap analysis group all heap objects allocated at a single site into a single heap "variable". Additional techniques such as shape analysis have aimed to identify logical heap structures, su
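The idea of treating every object from one allocation site as a single heap "variable" can be pictured with a small run-time profile. The C sketch below is only an illustration under stated assumptions: the site numbering, the `MAX_SITES` cap, and the names `heap_sites` and `profiled_malloc` are hypothetical, and a real compiler would identify sites statically rather than through a wrapper function. Each call passes the identifier of the call site, and the wrapper accumulates object count and total bytes per site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap on the number of distinct allocation sites tracked. */
enum { MAX_SITES = 8 };

/* One record per allocation site: all of its objects form one "variable". */
struct site_stats {
    size_t objects;   /* how many objects the site has allocated */
    size_t bytes;     /* total bytes requested by the site */
};

struct site_stats heap_sites[MAX_SITES];

/* malloc wrapper that charges the allocation to the given site. */
void *profiled_malloc(int site, size_t n) {
    void *p = malloc(n);
    if (p != NULL && site >= 0 && site < MAX_SITES) {
        heap_sites[site].objects += 1;
        heap_sites[site].bytes += n;
    }
    return p;
}

/* Linked-list node used by the demo; all nodes come from site 0. */
struct node {
    struct node *next;
    int value;
};

/* Builds a list of `count` nodes, every one charged to site 0. */
struct node *build_list(int count) {
    struct node *head = NULL;
    for (int i = 0; i < count; i++) {
        struct node *n = profiled_malloc(0, sizeof *n);
        if (n == NULL) {
            break;
        }
        n->value = i;
        n->next = head;
        head = n;
    }
    return head;
}
```

After `build_list(3)` succeeds, `heap_sites[0].objects` is 3 regardless of how the nodes are later linked, which is the sense in which a list built at one site is handled as one unit.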