
© 2010 ABHINAV BHATELE PDF

187 Pages·2010·5.41 MB·English


© 2010 ABHINAV BHATELE

AUTOMATING TOPOLOGY AWARE MAPPING FOR SUPERCOMPUTERS

BY ABHINAV BHATELE

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2010

Urbana, Illinois

Doctoral Committee:
Professor Laxmikant V. Kale, Chair
Professor William D. Gropp
Professor David A. Padua
Matthew H. Reilly, Ph.D.

Abstract

Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap, and communication on the network is becoming the bottleneck for scaling of parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention.

Most parallel applications have a certain communication topology. Mapping the tasks of a parallel application, based on their communication graph, to the physical processors of a machine can potentially lead to performance improvements. Mapping the communication graph of an application onto the interconnect topology of a machine while trying to localize communication is the research problem under consideration.

The farther different messages travel on the network, the greater the chance of resource sharing between messages. This can create contention on the networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this fact, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention.
Placement of communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology aware placement for applications such as NAMD and OpenAtom are used to motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications to relieve application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms. The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum dilation metric.

The main focus of this dissertation is on developing topology aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with the capability to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques, which will be imperative for machines of the future.

If one remembers Him, all efforts are successful.
He, who is the master of the Ganas and has the face of a handsome elephant (Lord Ganesh), the source of wisdom and culmination of auspicious qualities, may He bless me || 1 ||

I bow to the noble Guru who has opened my eyes, blinded by the darkness of ignorance, using a collyrium stick of knowledge || 2 ||

To my Maa, Girija Bhatele.

Acknowledgments

The role played by my advisor, Prof. Kale, in shaping this dissertation and my career cannot be put in a few words. I will always be in his debt and admire him for the great person he is. My parents and family have made me the person I am today, and I cannot thank them enough for being supportive of whatever I do. I hope that my Maa will be proud when she reads this dissertation, and that it will be a tiny token of appreciation for all that she has done for me. My gurus (teachers at school, professors at IIT Kanpur and Illinois, Brni. Sucheta Chaitanya and Neeb Karori Baba) have been a source of guidance and encouragement, and I cannot thank them enough. I would also like to thank Eric Bohm, Sameer Kumar, Gagan Gupta and Filippo Gioachin, who worked with me on research related to my thesis. I am also thankful to my dissertation committee for valuable suggestions and comments.
Finally, I would like to thank my colleagues at the Parallel Programming Laboratory for their constant criticism and I apologize to them for occupying Prof. Kale’s precious time so often. However, if you want extra meeting time, walk with him to his car when he leaves for home in the evenings.

Grants

This dissertation used allocations on several supercomputers at various NSF and DOE centers. Without running time on these machines, this work would not be as relevant and convincing. This research was supported in part by NSF through TeraGrid [[1]] resources provided by NCSA and PSC through grants ASC050039N and MCA93S028. I wish to thank Fred Mintzer and Glenn Martyna from IBM for access and assistance in running on the Watson Blue Gene/L. This work also used running time on the Blue Gene/P at ANL, which is supported by DOE under contract DE-AC02-06CH11357. Time allocation on Jaguar at ORNL was also used, which is supported by the DOE under contract DE-AC05-00OR22725. Accounts on Jaguar were made available via the Performance Evaluation and Analysis Consortium End Station, a DOE INCITE project.

Software Credits

LaTeX and gnuplot have been used in preparing this dissertation and are central to the writing of this document. Several other software packages and libraries were used in the process of working on my dissertation and writing it. The script bargraph.pl developed by Derek Bruening (Co-Founder and Chief Architect, VMware, Inc.) has been used for creating bar graphs. ParaView, developed as a collaboration by Kitware, Inc., Sandia, LANL and others, was used for visualizing maps for OpenAtom. OmniGraffle and Adobe Illustrator (from the Creative Suite) were used for creating the mapping diagrams in this dissertation, which made the text much more understandable. Triangle [[2]], developed by Prof. Jonathan Shewchuk, was used for triangulation of meshes generated as inputs for this research.
Finally, Graphviz [[3]], developed by AT&T Research Labs was used for visualizing regular and irregular graphs and their mappings and as a library in Chapter 10.

Table of Contents

List of Tables
List of Figures
List of Algorithms
List of Abbreviations
List of Symbols
1 Introduction
  1.1 Motivation
  1.2 Research Goals
  1.3 Thesis Organization
2 Related Work
  2.1 Recent Work
  2.2 Contributions of This Thesis
3 Existing Topologies
  3.1 Fat Tree and Clos Networks
  3.2 Mesh and Torus Networks
  3.3 Other Topologies
4 Understanding Network Congestion
  4.1 WOCON: No Contention Benchmark
  4.2 WICON: Random Contention Benchmark
  4.3 Controlled Contention Experiments
    4.3.1 Benchmark Stressing a Given Link
    4.3.2 Benchmark Using Equidistant Pairs
5 Hop-bytes as an Evaluation Metric
  5.1 Experiment 1
  5.2 Experiment 2

Description:
3.3 A 4-dimensional hypercube and a Kautz graph ... has a total of 20,480 nodes. IBM Blue ... An n-dimensional hypercube is also called an n-cube.