ABSTRACT

Dynamic Data Race Detection for Structured Parallelism

by

Raghavan Raman

With the advent of multicore processors and an increased emphasis on parallel computing, parallel programming has become a fundamental requirement for achieving available performance. Parallel programming is inherently hard because, to reason about the correctness of a parallel program, programmers have to consider large numbers of interleavings of statements in different threads in the program. Though structured parallelism imposes some restrictions on the programmer, it is an attractive approach because it provides useful guarantees such as deadlock-freedom. However, data races remain a challenging source of bugs in parallel programs. Data races may occur in only a few of the possible schedules of a parallel program, thereby making them extremely hard to detect, reproduce, and correct. In the past, dynamic data race detection algorithms have suffered from at least one of the following limitations: some algorithms have a worst-case linear space and time overhead [1], some algorithms are dependent on a specific scheduling technique [2], some algorithms generate false positives and false negatives [3, 4], some have no empirical evaluation as yet [2], and some require sequential execution of the parallel program [5, 6].

In this thesis, we introduce dynamic data race detection algorithms for structured parallel programs that overcome past limitations. We present a race detection algorithm called ESP-bags that requires the input program to be executed sequentially and another algorithm called SPD3 that can execute the program in parallel. While the ESP-bags algorithm addresses all the above-mentioned limitations except sequential execution, the SPD3 algorithm addresses the issue of sequential execution by scaling well across highly parallel shared-memory multiprocessors.
Our algorithms incur constant space overhead per memory location and time overhead that is independent of the number of processors on which the programs execute. Our race detection algorithms support a rich set of parallel constructs (including async, finish, isolated, and future) that are found in languages such as HJ, X10, and Cilk. Our algorithms for async, finish, and future are precise and sound for a given input. In the presence of isolated, our algorithms are precise but not sound. Our experiments show that our algorithms (for async, finish, and isolated) perform well in practice, incurring an average slowdown of under 3× over the original execution time on a suite of 15 benchmarks. SPD3 is the first practical dynamic race detection algorithm for async-finish parallel programs that can execute the input program in parallel and use constant space per memory location. This takes us closer to our goal of building dynamic data race detectors that can be “always-on” when developing parallel applications.

Acknowledgments

I would like to express my deepest gratitude to my advisor, Prof. Vivek Sarkar, for his support and guidance throughout this dissertation. He has been a constant source of inspiration for me in many ways. His enthusiasm in solving problems is infectious. He gave me the freedom to express myself in my work all along while also guiding me in the right direction. He has always been ready to help in both academic and personal issues. I would like to thank him specifically for his support during the troubled times in my PhD. This dissertation would not have been possible without him. I could not have asked for a better advisor.

My sincere thanks to Prof. John Mellor-Crummey for agreeing to be on my thesis committee and for his feedback and suggestions at various stages during my dissertation. His suggestions to improve parts of my dissertation were very useful.
I would also like to thank him for collaborating on extending the parallel data race detector for futures. I enjoyed all the technical discussions that I have had with him. The effort he puts in to get everything right is truly amazing. I have learned a lot from him.

I would like to extend my sincere thanks to Prof. Lin Zhong for agreeing to be on my thesis committee and for his enthusiasm in learning about my work. His insights and feedback were very useful in improving my thesis.

I express my sincere gratitude to Prof. Martin Vechev (ETH) and Prof. Eran Yahav (Technion) with whom I have had the pleasure of collaborating for the past three years. A major part of this dissertation was a result of my collaboration with them. I learned a lot from them right from my visit to IBM Research in 2009. I have enjoyed every bit of my interactions with them. It has always been fun working with them. Many thanks to Prof. Vivek Sarkar for introducing me to them.

I am grateful to Jisheng Zhao for all the collaborative work over the past few years. It was a pleasure working with him on all the publications we co-authored. A special thanks to him for implementing the static optimizations that are used in this work.

I would like to thank my fellow graduate students David Peixotto and Rajkishore Barik for helping me during my early days at Rice. Being my office mate, Dave was forced to listen to everything I had to say and also to answer all my questions. A special thanks to him for putting up with me all along. I would also like to thank Prof. Keith Cooper whose frequent visits to our office came as a relief on stressful days. Dave and I enjoyed listening to many fun stories that he had.

My hearty thanks to every member (past and present) of the Habanero group for all the interactions that I have had with them. I have enjoyed every bit being part of the Habanero group.
I would like to thank everyone who patiently listened and gave me feedback on some of my talks that were repeated multiple times.

A special thanks to my beloved wife, Vanitha Vijayaraghavan, for supporting me throughout and for putting up with my long night hours at work. I can never forget those innumerable visits to Starbucks well past midnight while writing my thesis. Thanks to her for regularly staying up all night to keep me company while I wrote my thesis, and also for all the yummy food served right at my work table. Her love, support, and encouragement were an important reason that I was able to complete my PhD sooner. Now that I am done, I hope to repay her manyfold for all of this.

My sincere thanks to my parents who have been very supportive and encouraging all along during my graduate studies. They have been patiently and eagerly waiting for me to complete my PhD. My deepest gratitude to them without whom this work would not have been possible. I would also like to thank my in-laws, uncles, and aunts for trusting and encouraging me and also for patiently waiting for me to cross the line.

Many thanks to my friends Jeyarama Ananta, Kaushik Kumar Ram, and Ramkumar Krishnamurthy for making my stay at Rice very enjoyable and for being there during my troubled times. I can't imagine my state had they not been around during those times. I would also like to thank everyone who has helped me directly or indirectly during my graduate study at Rice.

Finally, I thank the Almighty for giving me the confidence to pursue my dreams and the strength to complete my journey.

Contents

Abstract . . . ii
List of Illustrations . . . xi
List of Tables . . . xv
1 Introduction . . . 1
  1.1 Thesis Statement . . . 4
  1.2 Research Contributions . . . 4
  1.3 Thesis Organization . . . 5
2 Background . . . 7
  2.1 Parallel Programming Models . . . 7
    2.1.1 Three Approaches to Parallel Programming . . . 8
    2.1.2 Structured and Unstructured Models . . . 9
  2.2 Habanero Java: A Structured Parallel Programming Model . . . 9
    2.2.1 Parallel Constructs . . . 10
    2.2.2 Comparison with Cilk . . . 14
  2.3 Data Races . . . 17
    2.3.1 Data Race Detection . . . 18
    2.3.2 Guarantees of Data Race Detection Algorithms . . . 21
    2.3.3 Data Race Detection for Structured Parallelism . . . 22
    2.3.4 Past Work on Data Race Detection . . . 23
3 Sequential Data Race Detection for HJ . . . 26
  3.1 Background: SP-bags for Fully-Strict Computations . . . 27
  3.2 ESP-bags: A Generalization of SP-bags for Terminally-Strict Computations . . . 32
    3.2.1 ESP-bags for Async-Finish . . . 33
    3.2.2 Extending ESP-bags for Isolated Blocks . . . 42
  3.3 ESP-bags for Labeled-Finish . . . 48
  3.4 ESP-bags for Single-Get Futures . . . 51
    3.4.1 A Maximal Generalization of ESP-bags . . . 55
  3.5 Summary . . . 56
4 SPD3: A Parallel Algorithm for Detecting Data Races in HJ . . . 58
  4.1 SPD3 for Async-Finish . . . 59
    4.1.1 Dynamic Program Structure Tree . . . 59
    4.1.2 Shadow Memory . . . 70
    4.1.3 SPD3 Algorithm . . . 72
    4.1.4 Soundness and Precision . . . 76
    4.1.5 Space Overhead . . . 76
    4.1.6 Time Overhead . . . 77
  4.2 Extending SPD3 for Isolated Blocks . . . 78
    4.2.1 Shadow Memory with Isolated . . . 79
    4.2.2 Extended SPD3 Algorithm . . . 80
    4.2.3 Soundness and Precision . . . 86
    4.2.4 Space Overhead . . . 86
    4.2.5 Time Overhead . . . 87
  4.3 Extending SPD3 for Futures . . . 87
    4.3.1 Dynamic Program Structure Tree with Futures . . . 88
    4.3.2 Enhancing the DPST with Futures . . . 95
    4.3.3 Shadow Memory with Futures . . . 101
    4.3.4 Extended SPD3 Algorithm . . . 103
    4.3.5 Soundness and Precision . . . 107
    4.3.6 Space Overhead . . . 107
    4.3.7 Time Overhead . . . 108
5 Correctness Proofs . . . 109
  5.1 ESP-bags for Async-Finish . . . 109
  5.2 SPD3 for Async-Finish . . . 114
6 Implementation . . . 125
  6.1 Design of Data Race Detectors . . . 125
  6.2 Disjoint-set in ESP-bags . . . 128
  6.3 DPST in SPD3 . . . 129
  6.4 Relaxing the Atomicity Requirement in SPD3 . . . 131
7 Static Optimizations for Data Race Detection . . . 135
  7.1 Main Task Check Elimination in Sequential Code Regions . . . 136
  7.2 Read-only Check Elimination in Parallel Code Regions . . . 138
  7.3 Escape Analysis . . . 140
  7.4 Loop Invariant Check Motion . . . 140
  7.5 Read/Write Check Elimination . . . 142
8 Experimental Results . . . 144
  8.1 Experimental Setup . . . 144
  8.2 Data Races Observed . . . 146
  8.3 Evaluation of ESP-bags for Async-Finish-Isolated . . . 147
    8.3.1 Performance of ESP-bags . . . 147
    8.3.2 Performance of Static Optimizations . . . 149
    8.3.3 Comparison with Serialized Execution . . . 153
  8.4 Evaluation of SPD3 for Async-Finish-Isolated . . . 155
    8.4.1 Performance of SPD3 . . . 155
    8.4.2 Comparison of SPD3 with ESP-bags . . . 163
    8.4.3 Comparison of SPD3 with Eraser and FastTrack . . . 165
9 Related Work . . . 173
  9.1 Dynamic Data Race Detection . . . 173
    9.1.1 Lockset based Algorithms . . . 174
    9.1.2 Happens-Before based Algorithms . . . 175
  9.2 Static Analysis for Data Race Detection and Avoidance . . . 183
    9.2.1 Static Analysis to Improve Dynamic Race Detectors . . . 185
  9.3 Data Race Detection in Hardware . . . 186
  9.4 Determinism Checking . . . 188
10 Conclusions and Future Work . . . 191
Bibliography . . . 195
A Anomaly in Crypt with Optimizations . . . 212