
Techniques for Accelerating Microprocessor Simulation PDF

168 Pages·2012·1.54 MB·English
by Ramkumar Srinivasan

Preview Techniques for Accelerating Microprocessor Simulation

TECHNIQUES FOR ACCELERATING MICROPROCESSOR SIMULATION

BY

RAMKUMAR SRINIVASAN, B.E.

A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science in Electrical Engineering

New Mexico State University
Las Cruces, New Mexico
May 2004

"Techniques For Accelerating Microprocessor Simulation," a thesis prepared by Ramkumar Srinivasan in partial fulfillment of the requirements for the degree, Master of Science in Electrical Engineering, has been approved and accepted by the following:

Linda Lacey
Dean of the Graduate School

Jeanine Cook
Chair of the Examining Committee

Date

Committee in charge:
Dr. Jeanine Cook, Chair
Dr. Phillip L. De Leon
Dr. Richard L. Oliver

DEDICATION

To my family: mother, father, uncle, aunt, sister and brother-in-law

ACKNOWLEDGEMENTS

First, I would like to thank my parents for their love and for giving me access to a good education throughout my life. I wish to thank my teachers at my undergraduate school for teaching me to enjoy engineering.

At NMSU, I remain indebted to my advisor, Dr. Jeanine Cook, for her support and help. Taking her graduate computer architecture class made me love the subject and choose it for my thesis. Apart from technical help, Dr. Cook has been a great person to turn to during times of difficulty and confusion. I express my gratitude to Dr. Phillip De Leon for helping me with DSP algorithms over the course of my research. His exceptional teaching always left me spellbound in his classes, and many techniques used in my thesis were inspired by his digital speech processing class. I remain grateful to Dr. Martha Remmenga for taking time off her busy schedule on numerous occasions to help resolve questions about the Chi-Square similarity metric.

I thank my good friend Wiplove for pointing out the holes in my outrageous research ideas and for keeping me in good spirits throughout my graduate studies.
I also thank Artie, Cheryl and Kaye for their great company and for frequently making fun of my thesis writing.

VITA

1978 Born in India
1996-2000 B.E., Bangalore University, India
2000-2002 Software Engineer, Wipro Technologies, India
2002-2004 Research/Teaching Assistant, Department of Electrical and Computer Engineering, New Mexico State University

Publications

Ramkumar Srinivasan, Jeanine Cook, "Fast, Accurate Micro-Architecture Simulation," in Proceedings of the Applications for a Changing World, ITEA Modelling & Simulation Workshop, 8-11 December 2003, Las Cruces, New Mexico.

Sharath Ramanathan, Ramkumar Srinivasan, Jeanine Cook, "Intrinsic Data Locality of Modern Scientific Workloads," in Proceedings of the IEEE Sixth Annual Workshop on Workload Characterization, 27 October 2003, Austin, Texas.

Field of Study
Major Field: Electrical Engineering (Computer Engineering)

ABSTRACT

TECHNIQUES FOR ACCELERATING MICROPROCESSOR SIMULATION
BY
RAMKUMAR SRINIVASAN, B.E.
Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico, 2004
Dr. Jeanine Cook, Chair

Detailed, cycle-accurate simulation has become the most common platform for the exploration and performance analysis of micro-architectural innovations. However, simulation-based micro-architecture research is often hindered by the slow speed of simulators. For instance, the 45 benchmark-input combinations of the SPEC2000 benchmark suite execute in excess of 7.2 trillion instructions and take about five months of simulation time on contemporary, high-end desktops. To reduce the time incurred by detailed simulation, we introduce two new techniques that accurately speed up micro-architectural simulation. First, we propose a method that enables several networked machines to collaborate on the simulation of a single benchmark. The simulation accuracy of this method is found to be orders of magnitude better than that of existing simulation speedup techniques.
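The abstract does not spell out how the work is divided among the networked machines. As a rough illustration only, the Python sketch below assumes one simple scheme: the dynamic instruction stream is cut into contiguous chunks, and each worker functionally fast-forwards from program start to its chunk before simulating that chunk in detail. Under that assumption the slowest (last) worker bounds the wall-clock time, so the speedup can approach, but never exceed, the ratio of fast-forward speed to detailed-simulation speed. The names `partition`, `estimated_speedup`, `ff_rate` and `detail_rate` are hypothetical, warm-up and communication costs are ignored, and this is not necessarily the algorithm of Chapter 3.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One worker's share of the dynamic instruction stream (hypothetical model)."""
    worker: int
    start: int   # index of the first instruction simulated in detail
    length: int  # number of instructions simulated in detail


def partition(total_insts: int, workers: int) -> list[Chunk]:
    """Split [0, total_insts) into contiguous chunks, one per worker.

    The first `extra` workers get one additional instruction, so chunk
    sizes differ by at most one.
    """
    base, extra = divmod(total_insts, workers)
    chunks, start = [], 0
    for w in range(workers):
        length = base + (1 if w < extra else 0)
        chunks.append(Chunk(w, start, length))
        start += length
    return chunks


def estimated_speedup(total_insts: int, workers: int,
                      ff_rate: float, detail_rate: float) -> float:
    """Sequential detailed time divided by the slowest worker's time.

    Assumption: each worker fast-forwards at `ff_rate` instructions/s up to
    its chunk start, then simulates its chunk at `detail_rate` instructions/s.
    Warm-up and communication overheads are ignored.
    """
    sequential = total_insts / detail_rate
    slowest = max(c.start / ff_rate + c.length / detail_rate
                  for c in partition(total_insts, workers))
    return sequential / slowest
```

With an illustrative (not measured) 10:1 ratio of fast-forward to detailed speed, `estimated_speedup(1_000_000, 4, 10.0, 1.0)` is about 3.08, and adding workers pushes the estimate toward, but never past, 10, because every worker still has to fast-forward through everything before its chunk.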
Next, we identify the regions of unique behavior in the execution of a benchmark. Simulating only these regions in detail reduces the simulation time for SPEC2000 from five months to five days.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Mathematical Model Driven Simulators
1.2 Trace Driven Simulators
1.3 Execution Driven Simulators
1.4 The SimpleScalar Simulator
1.5 Benchmarks
1.6 Benchmark Subsetting
1.7 Using Smaller Input Files
2 BACKGROUND
2.1 Statistical Sampling
2.2 Benchmark Input Reduction
2.3 Benchmark Selection
2.4 SimPoints
2.5 Statistical Simulation
2.6 Parallel Simulation
3 PARALLEL SIMULATION
3.1 Introduction
3.2 Basic Principle and Algorithm
3.2.1 Implementation
3.2.2 Theoretical Bound on Speedup
3.3 Results
3.3.1 Simulation Accuracy
3.3.2 Simulation Speedup
3.4 Conclusions
3.5 Future Work
4 INTELLIGENT SAMPLING
4.1 Introduction
4.1.1 Non-Representative Samples
4.1.2 Identifying Representative Samples
4.2 Distribution Analysis
4.2.1 Chi-Square Distance
4.2.2 Non-Uniform Bins
4.2.3 Trends in IPC Behavior
4.3 Algorithm
4.3.1 Choosing N_b and T
4.3.2 Choosing W_l
4.4 Results
4.4.1 Effect of the Number of Regions Selected for Simulation
4.4.2 Identified Regions
4.4.3 Comparison to SimPoints
4.5 Conclusions and Future Work
5 CONCLUSION
APPENDIX A. Parallel simulator code
A.1 Modifications made to sim-outorder.c
A.2 pss_common.h
A.3 Execution script generator
A.4 Statistics comparison between the parallel and sequential simulators
APPENDIX B. Intelligent sampling code
B.1 To determine C and χ² for all benchmarks
B.2 cluster.m
B.3 To generate sample points
B.4 cluster_for_all_100.m
B.5 To compute the effectiveness of the points
B.6 To compute the decision boundaries
B.7 make_lloyd.m
B.8 To compute the Edge Similarity Metric
B.9 ga.m
B.10 To determine the best parameters for each benchmark
REFERENCES
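The abstract's step of finding "regions of unique behavior" relies, per Chapter 4 of the outline, on a Chi-Square distance between IPC distributions, but this preview does not give the exact formulation or selection algorithm. The Python sketch below is a generic stand-in under stated assumptions: the IPC trace is split into fixed-size windows, each window becomes a normalized histogram over caller-supplied bin edges (which may be non-uniform), and a greedy leader-clustering pass keeps a window only if its Chi-Square distance, the sum of (p_i - q_i)^2 / (p_i + q_i), from every window kept so far exceeds a threshold. The parameters `window`, `edges` and `threshold` are hypothetical and are not claimed to match the thesis's W_l, N_b or T.

```python
import bisect


def histogram(values, edges):
    """Normalized histogram of `values` over increasing bin `edges`.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # Clamp so a value equal to the last edge lands in the last bin.
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def chi_square(p, q):
    """Chi-square distance between two normalized histograms (0 = identical).

    Bin pairs that are empty in both histograms are skipped.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def select_unique_windows(ipc, window, edges, threshold):
    """Return start indices of windows whose IPC histogram differs from every
    previously kept window by more than `threshold` (greedy leader clustering).

    A trailing partial window shorter than `window` is ignored.
    """
    kept_starts, kept_hists = [], []
    for start in range(0, len(ipc) - window + 1, window):
        h = histogram(ipc[start:start + window], edges)
        # all() over an empty list is True, so the first window is always kept.
        if all(chi_square(h, k) > threshold for k in kept_hists):
            kept_starts.append(start)
            kept_hists.append(h)
    return kept_starts
```

For a toy trace `[1.0]*4 + [2.0]*4 + [1.0]*4 + [3.0]*4` with `window=4`, `edges=[0.5, 1.5, 2.5, 3.5]` and `threshold=0.5`, the function returns `[0, 4, 12]`: the third window repeats the behavior of the first and is skipped. With this definition the distance between normalized histograms lies between 0 and 2; some texts include an extra factor of 1/2.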

Description:
To my family: mother, father, uncle, aunt, sister and brother-in-law ... 4.8 Centroid, Chi-Square distance for various N_b and T for the facerec benchmark ... sheets. Using multiple suites remedies this problem to an extent. ... putation of a solution for a complex set of equations that can only be