Static Task Partitioning Techniques for Parallel Applications on Heterogeneous Processors

by

Aravind Vasudevan

Submitted to the School of Computer Science and Statistics in fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science)

School of Computer Science and Statistics
UNIVERSITY OF DUBLIN, TRINITY COLLEGE

July 2016

Declaration

I declare that this thesis has not been submitted as an exercise for a degree at this or any other university and it is entirely my own work.

I agree to deposit this thesis in the University’s open access institutional repository or allow the library to do so on my behalf, subject to Irish Copyright Legislation and Trinity College Library conditions of use and acknowledgement.

................................................................................
Aravind Vasudevan
Dated: July 12, 2016

Everything in life has a form and function, follows a mathematical rule. Laws of motion existed even before Sir Isaac Newton “discovered” them. It was just a matter of realization of the existence of such equations. Maybe, there exists a mother-of-all equation which governs the working of all forms and functions. When looking at a specific system, some parameters zero out and the equation contracts, giving rise to smaller simpler equations which dictate the principles that govern entirety. Maybe, this is what they call God.

To my mother, and best well-wisher, Vijayalakshmi
To my father, and best advisor, Vasudevan
To my wife, and best friend, Rukmani

Acknowledgements

My heartfelt thanks are due to Dr. David Gregg, the IRC foundation and Trinity for giving me this prestigious opportunity of pursuing my passion to do research. I like to think of David as an “advisor” rather than a “supervisor” and it has been an absolute pleasure working with him. He has always had an open door policy and was always available to theory-craft crazy ideas that more often than not ended up scrubbed off the board.
However, there were moments of brilliance, mainly because of David, which have resulted in this thesis. Thanks to his patience and understanding, my journey as a PhD student was smoother than it had any right to be! [1]

Many thanks are due to two collaborators of mine who made me feel at home when I came to Dublin. Mark Purcell from IBM, who were kind enough to partly fund my research, helped me settle into the research setting here in Ireland. As soon as I started my PhD, he helped me hit the ground running by letting me co-author a paper he was working on at the time. Many thanks are in order to Dr. Avinash Malik who was in our lab for the first two years of my PhD. He was the one who pointed me in the direction of static scheduling algorithms and helped me focus on an interesting problem to work on.

I would also like to thank a few people who helped me get to Dublin: Prof. Venkateshwaran Nagarajan, who is a very dear mentor of mine from my younger research days at WAran Research FoundaTion (WARFT). He took me under his wing and taught me a lot more than research. To you sir, I will be eternally grateful; Dr. Ivo Ihrke for helping me get to Dublin and teaching a wonderful class on Computational Photography that is still bright as day in my head; Dr. Ananthan for instilling a sense of research in me; and many, many other people who were guiding beacons in my life.

[1] Acknowledgeception: I’d like to thank Andrew for coming up with this beaut of a line.

I cannot forget to mention my lab mates [2]: Andrew, Martin, Mircea, Servesh and Shixiong for making the long winding nights of burning the midnight oil seem a lot more palatable. I will miss the countless hours of debates and arguments about: optimum window positioning for optimum room temperature; placing smaller convex polygons on top of each other to cover a slightly bigger convex polygon; countless Olympiad questions from Bulgaria; and last but definitely not least, the never ending competitions to see who can eat the most burritos.
I owe my deepest gratitude to all my friends in Dublin for making life so much more enjoyable in this wet, damp country.

I remember during my formative years as a young researcher at the WARFT, I would come back home at 2 AM every day after a long and arduous day of research and arguments with fellow research trainees. My mother would wake up, even though I would sneakily enter the house, and serve me food and stay up with me until I went to bed. I cannot thank research enough for bringing me and my mom a lot closer. My father has been a real life role model who I get to meet and live with every day. He has taught me more things than I can remember, but most importantly he taught me how to be humane. He taught through practice and showed me it is more important than most other things in the world. He drilled into me the idea of being level headed and centred. He taught me, come high or come low, to stay with my feet firmly on the ground. My brother was the one who introduced me to computers. I remember sitting next to him, while he did his thing and just watching him and thinking to myself that I wanted to be like that one day. He taught me to persevere through hard times. I will never forget the quote that he once told me: “You are a wise amongst the fool. You need to be a fool amongst the wise, to learn the world better”. Without my family’s undying love and inspiration, I would not even be close to half the researcher I am today. Real life superheroes are hard to come by, but I am glad all of you did!

[2] No seriously, they will kill me if I do.

Last, but by no means the least, I would love to say thanks to my dear and loving wife, Rukmani Sridharan. Without her unwavering support, I definitely would not have been able to pull through my PhD.
I cannot thank her enough for the many nights of unending debates on evolution and its fickle mind, the countless discussions on startup ideas, our undying passion for good food, her undying passion to eat healthy at the same time. Thank you for being more concerned about me than I am. More than anything, thank you for being there. I cannot wait for the days when she goes through her final PhD stages, when I can torture her equally and more!

Somebody once said “It’s the smaller things that count in life”, or was it “The devil is in the detail”? I can’t remember which. In any case, I would like to end my acknowledgements on a lighter note by acknowledging the smaller things in life that helped me truck through my PhD: Starcraft for providing countless hours of relief transporting me to another dimension, when things didn’t work so well in research; Liverpool for being an unending source of excitement and inspiration in my life. Following Liverpool through the years has been the best roller coaster I have ever been on. Bill Shankly once said “Football isn’t a matter of life and death of course, I am very disappointed with that attitude. I can assure you it is much, much more important than that” and I am inclined to agree with the shankman; Slash for keeping me company in the wee hours of the night with his private concerts just for my ears through this magical portal called YouTube; and finally Burritos for being so darned tasty!

Static Task Partitioning Techniques for Parallel Applications on Heterogeneous Processors

by

Aravind Vasudevan

Submitted to the School of Computer Science and Statistics on July 12, 2016, in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science)

Abstract

A critical factor contributing to the efficiency of execution of parallel applications on parallel computing resources is the method chosen to map and schedule the tasks of the parallel application.
This problem, often referred to as the DAG scheduling problem, of statically mapping and scheduling a weighted directed acyclic graph (DAG) to a set of heterogeneous processors to minimize the completion time (makespan) has been extensively studied. It remains intractable even with severe assumptions applied to the task and machine models.

This thesis tackles two challenges faced by scheduling algorithms when mapping and scheduling tasks onto heterogeneous processors: (1) the execution time of a task on heterogeneous processors is not well represented in the conventional application DAGs; and (2) the critical path is poorly defined in the presence of communication and such heterogeneity.

We address the first challenge by adopting a better representation model for the application DAGs and the resource graphs that enables processors to be selectively faster for certain kinds of tasks. We propose, design and evaluate a simulated annealing based task mapping algorithm that exploits this representation and maps applications with task and data level parallelism onto a set of heterogeneous processors. The novelty of this algorithm lies in guiding the random search using one of the systemic parameters called temperature. We observe significant improvements in the quality of the solutions for real world benchmarks when compared against other well established task mapping algorithms. As an additional contribution, we show that similar meta-heuristic techniques are effective for partitioning road networks for distributed simulation.

In the context of the second, related challenge of finding critical paths on a set of heterogeneous processors, existing solutions to calculate the critical path use mean values of computation and communication. In the presence of heterogeneity and communication costs these methods of calculating the critical path are rendered ineffective.
We resolve this issue by postulating that the critical path is inherently defined by its mapping, and formulate a polynomial time algorithm, Critical Earliest Finish Time (CEFT), to find the true critical path for a given application DAG and parallel computing resources. This algorithm runs in O(P²e), where P is the number of different classes of processors and e is the total number of edges in the application graph. Based on our experiments, we show that the critical path lengths produced by our algorithm are always at least as long as the ones produced by CPOP for the conventional workloads. Our experiments also show that when we leverage the better representation model from our earlier work, the paths found by our algorithm are shorter than CPOP’s paths in 83.99% of the experiments.

We also extend our critical path finding algorithm into a DAG scheduling algorithm (CEFT-CPOP) by using the path found by our algorithm (with its corresponding partial assignment) in conjunction with the critical path on a processor (CPOP) algorithm, with a running cost of O(p²e), where p is the number of processors. We compare the efficacy of our algorithm mainly against CPOP through the use of makespan related comparison metrics such as schedule length ratio (SLR), speedup and slack. We find that our algorithm outperforms CPOP even as a scheduling algorithm, in nearly all aspects.

Thesis Supervisor: David Gregg
Title: Associate Professor
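To make the O(P²e) bound quoted in the abstract concrete, the following is a minimal sketch of a dynamic program over (task, processor class) pairs. It is not the CEFT algorithm itself, whose details are not given in this front matter; the recurrence, the function name `mapping_aware_path_length`, and the `exec_time` and `comm_time` inputs are all assumptions made for illustration. The only point it shows is that considering every pair of classes on every edge costs O(P²e).

```python
from collections import defaultdict, deque

def mapping_aware_path_length(tasks, edges, exec_time, comm_time, classes):
    """Hypothetical illustration, not the thesis's CEFT.

    tasks:      list of task ids
    edges:      list of (u, v) dependencies forming a DAG
    exec_time:  exec_time[t][p] = run time of task t on processor class p
    comm_time:  comm_time(u, v, q, p) = cost of edge u->v when u runs on
                class q and v runs on class p
    classes:    list of processor classes
    """
    succs, preds = defaultdict(list), defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indeg[v] += 1

    # Topological order (Kahn's algorithm).
    order, ready = [], deque(t for t in tasks if indeg[t] == 0)
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # finish[t, p]: earliest finish of t on class p, assuming every
    # predecessor picks the class that delivers its data soonest.
    finish = {}
    for t in order:
        for p in classes:                      # P choices for t
            arrival = 0.0
            for u in preds[t]:                 # each edge visited once per p
                best = min(finish[u, q] + comm_time(u, t, q, p)
                           for q in classes)   # P choices for u
                arrival = max(arrival, best)
            finish[t, p] = arrival + exec_time[t][p]

    exits = [t for t in tasks if not succs[t]]
    return max(min(finish[t, p] for p in classes) for t in exits)

# Toy chain a -> b on two hypothetical classes.
exec_time = {"a": {"cpu": 2, "gpu": 5}, "b": {"cpu": 6, "gpu": 1}}
comm = lambda u, v, q, p: 0 if q == p else 3
length = mapping_aware_path_length(["a", "b"], [("a", "b")],
                                   exec_time, comm, ["cpu", "gpu"])
```

In the toy chain, the best mapping runs `a` on `cpu` and `b` on `gpu`, paying the cross-class communication cost, which a mean-value estimate of execution and communication times would not capture.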