
IMPROVING ACCURACY FOR SOFTWARE MULTIPLEXING OF ON PDF

220 Pages·2004·1.48 MB·English


IMPROVING ACCURACY FOR SOFTWARE MULTIPLEXING OF ON-CHIP PERFORMANCE COUNTERS

BY Wiplove Mathur, B.E. Electronics

A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science in Electrical Engineering

New Mexico State University
Las Cruces, New Mexico
May 2004

“Improving Accuracy for Software Multiplexing of On-Chip Performance Counters,” a thesis prepared by Wiplove Mathur in partial fulfillment of the requirements for the degree, Master of Science in Electrical Engineering, has been approved and accepted by the following:

Linda Lacey, Dean of the Graduate School
Jeanine Cook, Chair of the Examining Committee
Date

Committee in charge:
Dr. Jeanine Cook, Chair
Dr. Richard Oliver
Dr. Juris Reinfelds

DEDICATION

I dedicate this work to my parents.

ACKNOWLEDGEMENTS

First and foremost, I thank my advisor Dr. Jeanine Cook for guiding me through the path to success; I also thank her for all the encouragement, resources and support. I take this opportunity to express my gratitude to my committee members, Dr. Juris Reinfelds and Dr. Richard Oliver, who took time off from their tight schedule and provided me with their invaluable suggestions. I thank Dr. Steven Stochaj for providing us with a stand alone machine; my research would not have been possible without it. I am also grateful to Dr. Eric Johnson who guided me through the initial stages of MS and whose teaching is impeccable.

I express my heartfelt gratitude to my parents, who have made my dreams come alive. Their immense love, care and support has been a perennial source of encouragement and inspiration. I thank my mom, for her prayers and her emotional and mental support; my dad, for endowing me with all the good things in life.

My friends play a very crucial role in what I am today.
My special thanks to: Komal for the immense amount of confidence she has in me, and I only hope I have lived up to it, thanks for always being there for me; Ram, for helping me untangle the technical bottlenecks in my research, for the refreshing games of ping-pong and Gnibbles, and for being a great friend; Nutan for taking care of me and keeping me on track; Smriti, Uma and Anjana for pampering me and extending their moral support; Srujan for being a superb roommate. I truly treasure the friendship of Prini, Tina, tpkings, Vinita, Preeti, Lakeri, Rajesh, Sulabh, Pradyot, Laya, Swapna, Dimple, Nishant, Meghraj, Rahul, Swapnil, Tania, Omayra, Chandrika, Sharath, Arun, Hrishi.

I thank God for his blessings, and for making this acknowledgement possible.

VITA

August 25, 1980: Born in Mumbai (Bombay), India
June 1997: High School, Maharashtra State Board of Secondary and Higher Secondary Education [Atomic Energy Junior College, Bombay, India]
June 2001: B.E. Electronics, Bombay University [Veermata Jijabai Technological Institute (VJTI), Bombay, India]
2001–2002: Teaching Assistant, Department of ECE, NMSU
2002–2003: Research Assistant, Department of ECE, NMSU

Professional And Honorary Societies
Institute of Electrical and Electronics Engineers (IEEE)

Publications
Wiplove Mathur, Jeanine Cook, “Toward Accurate Performance Evaluation using Hardware Counters,” in Proceedings of the Applications for a Changing World, ITEA Modelling & Simulation Workshop, 8-11 December 2003 at Las Cruces, New Mexico.

Field Of Study
Major Field: Electrical Engineering (Computer Engineering)

ABSTRACT

IMPROVING ACCURACY FOR SOFTWARE MULTIPLEXING OF ON-CHIP PERFORMANCE COUNTERS

BY Wiplove Mathur, B.E. Electronics

Master of Science in Electrical Engineering
New Mexico State University
Las Cruces, New Mexico, 2004
Dr. Jeanine Cook, Chair

On-chip performance counters are gaining popularity as an analysis and validation tool. Various drivers and interfaces have been developed to access these counters.
Most contemporary processors have between two and six physical counters that can monitor an equal number of unique events simultaneously at fixed sampling periods. Through multiplexing and estimation, an even greater number of unique events can be monitored. When a program is sampled in multiplexed mode using round-robin scheduling of an eventset, the number of events that are physically counted during each sampling time slice is constrained by the number of counters. During this time, the remaining events of the multiplexed eventset are not monitored; rather, their counts are estimated. Our work quantifies the estimation error of the event-counts in the multiplexed mode, which indicates that as many as 42% of intervals are estimated with error greater than 10%. We propose new estimation algorithms that result in an accuracy improvement of up to 40%. Additionally, we combine successive counts in order to reduce the estimation errors even further. Moreover, we propose a method that enables cycle-synchronized collection of all countable events that will aid in correlation studies and proper validation of simulation results.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
  1.1 Motivation
  1.2 Performance Evaluation Methodologies
    1.2.1 Non-hardware: Mathematical Modelling
    1.2.2 Non-hardware: Simulation
    1.2.3 Hardware: Direct Measurement
    1.2.4 Hardware: Performance Monitoring Counters (PMC)
  1.3 Road Map
2 BACKGROUND
  2.1 Simulators
    2.1.1 SimpleScalar
    2.1.2 Simics
      2.1.2.1 System Visibility
      2.1.2.2 Tracing and Debugging
  2.2 PMC Interfacing Tools
    2.2.1 PAPI
    2.2.2 Perfmon
    2.2.3 VTune Performance Analyzer
    2.2.4 DCPI
  2.3 Summary
3 PROBLEM STATEMENT AND RELATED WORK
  3.1 Motivation for Interval Performance Data
  3.2 Data Collection
    3.2.1 Performance Counters: Non-multiplexed Mode
    3.2.2 Performance Counters: Multiplexed Mode
  3.3 Measurements and Variables
  3.4 Comparison Methodology
  3.5 Base Algorithm
  3.6 Inaccuracies in the Base Algorithm and its Consequences
  3.7 Related Work
4 SOLUTION: METHODOLOGY AND ALGORITHMS
  4.1 Workloads and Events
  4.2 Experimental Methodology
    4.2.1 Scaling
    4.2.2 Matching
    4.2.3 Evaluation Technique
  4.3 Algorithms
    4.3.1 Trapezoid-Area Method (TAM)
    4.3.2 Divided-Interval Rectangular-Area Method (DIRA)
    4.3.3 Positional Mean-Error (PME)
    4.3.4 Multiple Linear Regression Model (MLR)
  4.4 Combining the Counts
  4.5 Synchronized Counts
  4.6 Sensitivity Study
5 RESULTS AND ANALYSIS
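The round-robin multiplexing described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's base algorithm or any of its proposed estimators; it assumes a simple linear scaling rule (observed count times total slices divided by slices observed), and the names `multiplex_estimate`, `relative_error`, and the sample counts are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence


def multiplex_estimate(
    true_counts: Dict[str, Sequence[int]], num_counters: int
) -> Dict[str, float]:
    """Estimate per-event totals under round-robin multiplexing.

    true_counts maps each event name to its real count in every time slice.
    The real counts are known here only because this is a simulation.
    """
    events = list(true_counts)
    num_slices = len(next(iter(true_counts.values())))
    # Split the eventset into groups that fit the physical counters.
    groups = [
        events[i:i + num_counters] for i in range(0, len(events), num_counters)
    ]

    observed = {e: 0 for e in events}
    slices_seen = {e: 0 for e in events}
    for t in range(num_slices):
        # Only one group is physically counted during slice t.
        for e in groups[t % len(groups)]:
            observed[e] += true_counts[e][t]
            slices_seen[e] += 1

    # Assumed estimator: scale linearly, as if the unobserved slices
    # behaved like the observed ones.
    return {
        e: observed[e] * num_slices / slices_seen[e] if slices_seen[e] else 0.0
        for e in events
    }


def relative_error(estimate: float, actual: float) -> float:
    """Absolute error of an estimate as a fraction of the actual count."""
    return abs(estimate - actual) / actual


# Hypothetical data: four events, two physical counters, four time slices.
counts = {
    "A": [10, 10, 10, 10],  # steady event
    "B": [0, 20, 0, 20],    # bursty event, active only in odd slices
    "C": [5, 5, 5, 5],
    "D": [8, 0, 8, 0],
}
estimates = multiplex_estimate(counts, num_counters=2)
errors = {e: relative_error(estimates[e], sum(counts[e])) for e in counts}
```

With this data, the steady events A and C are estimated exactly, while the bursty events B and D are missed entirely because their activity falls in slices when their group is not scheduled. This is the kind of estimation error the abstract sets out to quantify.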

Description:
1.2.4 Hardware: Performance Monitoring Counters (PMC). CYCLES_DIV_BUSY (Pentium-III): number of cycles during which the divider is busy.
