IMPROVING ACCURACY FOR SOFTWARE MULTIPLEXING OF
ON-CHIP PERFORMANCE COUNTERS
BY
Wiplove Mathur, B.E. Electronics
A thesis submitted to the Graduate School
in partial fulfillment of the requirements
for the degree
Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico
May 2004
“Improving Accuracy for Software Multiplexing of On-Chip Performance Counters,”
a thesis prepared by Wiplove Mathur in partial fulfillment of the requirements
for the degree, Master of Science in Electrical Engineering, has been approved
and accepted by the following:
Linda Lacey
Dean of the Graduate School
Jeanine Cook
Chair of the Examining Committee
Date
Committee in charge:
Dr. Jeanine Cook, Chair
Dr. Richard Oliver
Dr. Juris Reinfelds
DEDICATION
I dedicate this work to my parents.
ACKNOWLEDGEMENTS
First and foremost, I thank my advisor Dr. Jeanine Cook for guiding me
through the path to success; I also thank her for all the encouragement, resources
and support. I take this opportunity to express my gratitude to my committee
members, Dr. Juris Reinfelds and Dr. Richard Oliver, who took time off from
their tight schedule and provided me with their invaluable suggestions. I thank
Dr. Steven Stochaj for providing us with a stand-alone machine; my research
would not have been possible without it. I am also grateful to Dr. Eric Johnson
who guided me through the initial stages of MS and whose teaching is impeccable.
I express my heartfelt gratitude to my parents, who have made my dreams
come alive. Their immense love, care and support have been a perennial source of
encouragement and inspiration. I thank my mom, for her prayers and her emotional
and mental support; my dad, for endowing me with all the good things in
life. My friends play a very crucial role in what I am today. My special thanks to:
Komal for the immense amount of confidence she has in me, and I only hope I have
lived up to it, thanks for always being there for me; Ram, for helping me untangle
the technical bottlenecks in my research, for the refreshing games of ping-pong
and Gnibbles, and for being a great friend; Nutan for taking care of me and keeping
me on track; Smriti, Uma and Anjana for pampering me and extending their
moral support; Srujan for being a superb roommate. I truly treasure the friendship
of Prini, Tina, tpkings, Vinita, Preeti, Lakeri, Rajesh, Sulabh, Pradyot, Laya,
Swapna, Dimple, Nishant, Meghraj, Rahul, Swapnil, Tania, Omayra, Chandrika,
Sharath, Arun, Hrishi.
I thank God for his blessings, and for making this acknowledgement possible.
VITA
August 25, 1980 Born in Mumbai (Bombay), India
June 1997 High School, Maharashtra State Board of Secondary
and Higher Secondary Education
[Atomic Energy Junior College, Bombay, India]
June 2001 B.E. Electronics, Bombay University
[Veermata Jijabai Technological Institute (VJTI),
Bombay, India]
2001–2002 Teaching Assistant, Department of ECE, NMSU
2002–2003 Research Assistant, Department of ECE, NMSU
Professional And Honorary Societies
Institute of Electrical and Electronics Engineers (IEEE)
Publications
Wiplove Mathur, Jeanine Cook, “Toward Accurate Performance Evaluation using
Hardware Counters,” in Proceedings of the Applications for a Changing World,
ITEA Modelling & Simulation Workshop, 8-11 December 2003 at Las Cruces,
New Mexico.
Field Of Study
Major Field: Electrical Engineering (Computer Engineering)
ABSTRACT
IMPROVING ACCURACY FOR SOFTWARE MULTIPLEXING OF
ON-CHIP PERFORMANCE COUNTERS
BY
Wiplove Mathur, B.E. Electronics
Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico, 2004
Dr. Jeanine Cook, Chair
On-chip performance counters are gaining popularity as an analysis and
validation tool. Various drivers and interfaces have been developed to access
these counters. Most contemporary processors have between two and six physical
counters that can monitor an equal number of unique events simultaneously at
fixed sampling periods. Through multiplexing and estimation, an even greater
number of unique events can be monitored. When a program is sampled in multiplexed
mode using round-robin scheduling of an eventset, the number of events
that are physically counted during each sampling time slice is constrained by the
number of counters. During this time, the remaining events of the multiplexed
eventset are not monitored; rather, their counts are estimated. Our work quantifies
the estimation error of the event counts in multiplexed mode and finds
that as many as 42% of intervals are estimated with an error greater than
10%. We propose new estimation algorithms that result in an accuracy improvement
of up to 40%. Additionally, we combine successive counts in order to reduce
the estimation errors even further. Moreover, we propose a method that enables
cycle-synchronized collection of all countable events that will aid in correlation
studies and proper validation of simulation results.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Motivation
1.2 Performance Evaluation Methodologies
1.2.1 Non-hardware: Mathematical Modelling
1.2.2 Non-hardware: Simulation
1.2.3 Hardware: Direct Measurement
1.2.4 Hardware: Performance Monitoring Counters (PMC)
1.3 Road Map
2 BACKGROUND
2.1 Simulators
2.1.1 SimpleScalar
2.1.2 Simics
2.1.2.1 System Visibility
2.1.2.2 Tracing and Debugging
2.2 PMC Interfacing Tools
2.2.1 PAPI
2.2.2 Perfmon
2.2.3 VTune Performance Analyzer
2.2.4 DCPI
2.3 Summary
3 PROBLEM STATEMENT AND RELATED WORK
3.1 Motivation for Interval Performance Data
3.2 Data Collection
3.2.1 Performance Counters: Non-multiplexed Mode
3.2.2 Performance Counters: Multiplexed Mode
3.3 Measurements and Variables
3.4 Comparison Methodology
3.5 Base Algorithm
3.6 Inaccuracies in the Base Algorithm and its Consequences
3.7 Related Work
4 SOLUTION: METHODOLOGY AND ALGORITHMS
4.1 Workloads and Events
4.2 Experimental Methodology
4.2.1 Scaling
4.2.2 Matching
4.2.3 Evaluation Technique
4.3 Algorithms
4.3.1 Trapezoid-Area Method (TAM)
4.3.2 Divided-Interval Rectangular-Area Method (DIRA)
4.3.3 Positional Mean-Error (PME)
4.3.4 Multiple Linear Regression Model (MLR)
4.4 Combining the Counts
4.5 Synchronized Counts
4.6 Sensitivity Study
5 RESULTS AND ANALYSIS