TECHNIQUES FOR ACCELERATING
MICROPROCESSOR SIMULATION
BY
RAMKUMAR SRINIVASAN, B.E.
A thesis submitted to the Graduate School
in partial fulfillment of the requirements
for the degree
Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico
May 2004
“Techniques For Accelerating Microprocessor Simulation,” a thesis prepared by
Ramkumar Srinivasan in partial fulfillment of the requirements for the degree,
Master of Science in Electrical Engineering, has been approved and accepted by
the following:
Linda Lacey
Dean of the Graduate School
Jeanine Cook
Chair of the Examining Committee
Date
Committee in charge:
Dr. Jeanine Cook, Chair
Dr. Phillip L. De Leon
Dr. Richard L. Oliver
DEDICATION
To my family: mother, father, uncle, aunt, sister and brother-in-law
ACKNOWLEDGEMENTS
First, I would like to thank my parents for their love and for giving me
access to a good education throughout my life. I wish to thank my teachers at my
undergraduate school for teaching me to enjoy engineering. At NMSU, I remain
indebted to my advisor, Dr. Jeanine Cook, for her support and help. Taking
her graduate computer architecture class made me love the subject and want to work
on it for my thesis. Apart from technical help, Dr. Cook has been a great person
to turn to in times of difficulty and confusion. I express my gratitude to
Dr. Phillip De Leon for helping me with DSP algorithms over the course of my
research. His exceptional teaching has always left me spellbound in his classes.
Many techniques used in my thesis were inspired by his digital speech processing
class. I remain grateful to Dr. Martha Remmenga for taking time off from her busy
schedule on numerous occasions to help resolve questions about the Chi-Square
similarity metric.
I thank my good friend Wiplove for pointing out the holes in my outrageous
research ideas and for keeping me in good spirits throughout my graduate
studies. I also thank Artie, Cheryl and Kaye for their great company and for
frequently making fun of my thesis writing.
VITA
1978 Born in India
1996-2000 B.E., Bangalore University, India
2000-2002 Software Engineer, Wipro Technologies, India
2002-2004 Research/Teaching Assistant
Department of Electrical and Computer Engineering
New Mexico State University
Publications
Ramkumar Srinivasan, Jeanine Cook, “Fast, Accurate Micro-Architecture Simulation,”
in Proceedings of the Applications for a Changing World, ITEA Modelling
& Simulation Workshop, 8-11 December 2003 at Las Cruces, New Mexico.
Sharath Ramanathan, Ramkumar Srinivasan, Jeanine Cook, “Intrinsic Data Locality
of Modern Scientific Workloads,” in Proceedings of the IEEE Sixth Annual
Workshop on Workload Characterization, October 27, 2003 at Austin, Texas.
Field Of Study
Major Field: Electrical Engineering (Computer Engineering)
ABSTRACT
TECHNIQUES FOR ACCELERATING
MICROPROCESSOR SIMULATION
BY
RAMKUMAR SRINIVASAN, B.E.
Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico, 2004
Dr. Jeanine Cook, Chair
Detailed, cycle-accurate simulation has become the most common platform for
the exploration and performance analysis of micro-architectural innovations.
However, simulation-based micro-architecture research is often hindered by the
slow speed of simulators. For instance, the 45 benchmark-input combinations of the
SPEC2000 benchmark suite execute in excess of 7.2 trillion instructions and
take about five months of simulation time on contemporary, high-end desktops.
To reduce the time incurred by detailed simulation, we introduce two new
techniques that speed up micro-architectural simulation accurately. First, we
propose a method that enables several networked machines to collaborate on the
simulation of a benchmark. The simulation accuracy of this method is found to
be orders of magnitude better than that of existing simulation speedup techniques.
Next, we identify the regions of unique behavior in the execution of a benchmark.
Simulating in detail only these regions of unique behavior reduces the simulation
time for SPEC2000 from 5 months to 5 days.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Mathematical Model Driven Simulators
1.2 Trace Driven Simulators
1.3 Execution Driven Simulators
1.4 The SimpleScalar Simulator
1.5 Benchmarks
1.6 Benchmark Subsetting
1.7 Using Smaller Input Files
2 BACKGROUND
2.1 Statistical Sampling
2.2 Benchmark Input Reduction
2.3 Benchmark Selection
2.4 SimPoints
2.5 Statistical Simulation
2.6 Parallel Simulation
3 PARALLEL SIMULATION
3.1 Introduction
3.2 Basic Principle and Algorithm
3.2.1 Implementation
3.2.2 Theoretical Bound on Speedup
3.3 Results
3.3.1 Simulation Accuracy
3.3.2 Simulation Speedup
3.4 Conclusions
3.5 Future Work
4 INTELLIGENT SAMPLING
4.1 Introduction
4.1.1 Non-Representative Samples
4.1.2 Identifying Representative Samples
4.2 Distribution Analysis
4.2.1 Chi-Square Distance
4.2.2 Non-Uniform Bins
4.2.3 Trends in IPC Behavior
4.3 Algorithm
4.3.1 Choosing N_b and T
4.3.2 Choosing W_l
4.4 Results
4.4.1 Effect of the Number of Regions Selected for Simulation
4.4.2 Identified Regions
4.4.3 Comparison to SimPoints
4.5 Conclusions and Future Work
5 CONCLUSION
APPENDIX A. Parallel simulator code
A.1 Modifications made to sim-outorder.c
A.2 pss_common.h
A.3 Execution script generator
A.4 Statistics comparison between the parallel and sequential simulators
APPENDIX B. Intelligent sampling code
B.1 To determine C and χ² for all benchmarks
B.2 cluster.m
B.3 To generate sample points
B.4 cluster_for_all_100.m
B.5 To compute the effectiveness of the points
B.6 To compute the decision boundaries
B.7 make_lloyd.m
B.8 To compute the Edge Similarity Metric
B.9 ga.m
B.10 To determine the best parameters for each benchmark
REFERENCES