Zheng Cui
Candidate

Computer Science
Department

This dissertation is approved, and it is acceptable in quality and form for publication:

Approved by the Dissertation Committee:

Patrick G. Bridges, Chairperson
Dorian Arnold
Jedidiah R. Crandall
Peter A. Dinda
Nasir Ghani

Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

by

Zheng Cui

B.S., Computer Science, Zhengzhou University, 2003
M.S., Computer Science, University of New Mexico, 2007

DISSERTATION

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy
Computer Science

The University of New Mexico
Albuquerque, New Mexico

July, 2013

© 2013, Zheng Cui

Dedication

I dedicate this dissertation to my family; past, present, and future.

Acknowledgments

I would like to thank my advisers, Professor Patrick G. Bridges and Professor Peter A. Dinda, for their guidance and support.

Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

by

Zheng Cui

B.S., Computer Science, Zhengzhou University, 2003
M.S., Computer Science, University of New Mexico, 2007
Ph.D., Computer Science, University of New Mexico, 2013

Abstract

Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I identify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features.
I propose three novel optimizations in response: optimistic timer-free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 HPC in Cloud Computing Systems
    1.1.1 Cloud Computing
    1.1.2 High Performance Computing
    1.1.3 HPC in Clouds
  1.2 Virtual Networking
  1.3 Virtual Overlay Networking
    1.3.1 Overview
    1.3.2 Overlay Performance Challenges
  1.4 Contributions
  1.5 Dissertation Outline

2 Related Work
  2.1 Virtualization for Scientific Computing
    2.1.1 Overview
    2.1.2 HPC with Virtualization
    2.1.3 Palacios VMM
  2.2 Network Virtualization
    2.2.1 Overview
    2.2.2 Xen Networking
    2.2.3 VMware ESX Networking
    2.2.4 Hyper-V Networking
    2.2.5 Summary
  2.3 Overlay Networks
    2.3.1 Overview
    2.3.2 Connections with Virtual Networking
    2.3.3 VNET Model and VNET/U
  2.4 Virtual Networking Optimization
    2.4.1 Overview
    2.4.2 Virtual NIC Optimization
    2.4.3 Virtual Interrupt Optimization
    2.4.4 InfiniBand Virtualization
  2.5 Summary

3 Analysis
  3.1 User-level Overlay Context Switch Overhead
  3.2 VMM-level Homogeneous and Heterogeneous Overlay
    3.2.1 Delayed Virtual Interrupts
    3.2.2 Excessive Virtual Interrupts
    3.2.3 High Resolution Timer Noise
  3.3 Semantic Gap on High-end Interconnects
    3.3.1 Use minimal interconnect features
    3.3.2 Translate to advanced interconnect features
  3.4 Summary

4 Optimizations
  4.1 Overview
  4.2 VMM-level Virtual Overlay
  4.3 Optimistic Interrupts
    4.3.1 Early Virtual Interrupt (EVI) delivery
    4.3.2 End of Coalescing (EoC) notification
    4.3.3 EVI/EoC Interaction
  4.4 Zero-copy Cut-through Data Forwarding
Description: