This paper originally appeared at HPCA 2007.

Concurrent Direct Network Access for Virtual Machine Monitors

Paul Willmann†  Jeffrey Shafer†  David Carr†  Aravind Menon‡  Scott Rixner†  Alan L. Cox†  Willy Zwaenepoel‡
†Rice University, Houston, TX  {willmann,shafer,dcarr,rixner,alc}@rice.edu
‡EPFL, Lausanne, Switzerland  {aravind.menon,willy.zwaenepoel}@epfl.ch

This work was supported in part by the Texas Advanced Technology Program under Grant No. 003604-0078-2003, by the National Science Foundation under Grant No. CCF-0546140, by a grant from the Swiss National Science Foundation, and by gifts from Advanced Micro Devices, Hewlett-Packard, and Xilinx.

Abstract

This paper presents hardware and software mechanisms to enable concurrent direct network access (CDNA) by operating systems running within a virtual machine monitor. In a conventional virtual machine monitor, each operating system running within a virtual machine must access the network through a software-virtualized network interface. These virtual network interfaces are multiplexed in software onto a physical network interface, incurring significant performance overheads. The CDNA architecture improves networking efficiency and performance by dividing the tasks of traffic multiplexing, interrupt delivery, and memory protection between hardware and software in a novel way. The virtual machine monitor delivers interrupts and provides protection between virtual machines, while the network interface performs multiplexing of the network data. In effect, the CDNA architecture provides the abstraction that each virtual machine is connected directly to its own network interface. Through the use of CDNA, many of the bottlenecks imposed by software multiplexing can be eliminated without sacrificing protection, producing substantial efficiency improvements.

1 Introduction

In many organizations, the economics of supporting a growing number of Internet-based services has created a demand for server consolidation. Consequently, there has been a resurgence of interest in machine virtualization [1, 2, 4, 7, 9, 10, 11, 19, 22]. A virtual machine monitor (VMM) enables multiple virtual machines, each encapsulating one or more services, to share the same physical machine safely and fairly. In principle, general-purpose operating systems, such as Unix and Windows, offer the same capability for multiple services to share the same physical machine. However, VMMs provide additional advantages. For example, VMMs allow services implemented in different or customized environments, including different operating systems, to share the same physical machine.

Modern VMMs for commodity hardware, such as VMware [1, 7] and Xen [4], virtualize processor, memory, and I/O devices in software. This enables these VMMs to support a variety of hardware. In an attempt to decrease the software overhead of virtualization, both AMD and Intel are introducing hardware support for virtualization [2, 10]. Specifically, their hardware support for processor virtualization is currently available, and their hardware support for memory virtualization is imminent. As these hardware mechanisms mature, they should reduce the overhead of virtualization, improving the efficiency of VMMs.

Despite the renewed interest in system virtualization, there is still no clear solution to improve the efficiency of I/O virtualization. To support networking, a VMM must present each virtual machine with a virtual network interface that is multiplexed in software onto a physical network interface card (NIC). The overhead of this software-based network virtualization severely limits network performance [12, 13, 19]. For example, a Linux kernel running within a virtual machine on Xen is only able to achieve about 30% of the network throughput that the same kernel can achieve running directly on the physical machine.

This paper proposes and evaluates concurrent direct network access (CDNA), a new I/O virtualization technique combining both software and hardware components that significantly reduces the overhead of network virtualization in VMMs.
The CDNA network virtualization architecture provides virtual machines running on a VMM safe direct access to the network interface. With CDNA, each virtual machine is allocated a unique context on the network interface and communicates directly with the network interface through that context. In this manner, the virtual machines that run on the VMM operate as if each has access to its own dedicated network interface.

Using CDNA, a single virtual machine running Linux can transmit at a rate of 1867 Mb/s with 51% idle time and receive at a rate of 1874 Mb/s with 41% idle time. In contrast, at 97% CPU utilization, Xen is only able to achieve 1602 Mb/s for transmit and 1112 Mb/s for receive. Furthermore, with 24 virtual machines, CDNA can still transmit and receive at a rate of over 1860 Mb/s, but with no idle time. In contrast, Xen is only able to transmit at a rate of 891 Mb/s and receive at a rate of 558 Mb/s with 24 virtual machines.

The CDNA network virtualization architecture achieves this dramatic increase in network efficiency by dividing the tasks of traffic multiplexing, interrupt delivery, and memory protection among hardware and software in a novel way. Traffic multiplexing is performed directly on the network interface, whereas interrupt delivery and memory protection are performed by the VMM with support from the network interface. This division of tasks into hardware and software components simplifies the overall software architecture, minimizes the hardware additions to the network interface, and addresses the network performance bottlenecks of Xen.

The remainder of this paper proceeds as follows. The next section discusses networking in the Xen VMM in more detail. Section 3 describes how CDNA manages traffic multiplexing, interrupt delivery, and memory protection in software and hardware to provide concurrent access to the NIC. Section 4 then describes the custom hardware NIC that facilitates concurrent direct network access on a single device. Section 5 presents the experimental methodology and results. Finally, Section 6 discusses related work and Section 7 concludes the paper.

2 Networking in Xen

2.1 Hypervisor and Driver Domain Operation

A VMM allows multiple guest operating systems, each running in a virtual machine, to share a single physical machine safely and fairly. It provides isolation between these guest operating systems and manages their access to hardware resources. Xen is an open source VMM that supports paravirtualization, which requires modifications to the guest operating system [4]. By modifying the guest operating systems to interact with the VMM, the complexity of the VMM can be reduced and overall system performance improved.

Xen performs three key functions in order to provide virtual machine environments. First, Xen allocates the physical resources of the machine to the guest operating systems and isolates them from each other. Second, Xen receives all interrupts in the system and passes them on to the guest operating systems, as appropriate. Finally, all I/O operations go through Xen in order to ensure fair and non-overlapping access to I/O devices by the guests.

[Figure 1. Xen virtual machine environment: guest domains with front-end drivers, the driver domain with back-end drivers, an Ethernet bridge, and the physical NIC driver, and the hypervisor performing interrupt dispatch and control.]

Figure 1 shows the organization of the Xen VMM. Xen consists of two elements: the hypervisor and the driver domain. The hypervisor provides an abstraction layer between the virtual machines, called guest domains, and the actual hardware, enabling each guest operating system to execute as if it were the only operating system on the machine. However, the guest operating systems cannot directly communicate with the physical I/O devices. Exclusive access to the physical devices is given by the hypervisor to the driver domain, a privileged virtual machine.
Each guest operating system is then given a virtual I/O device that is controlled by a paravirtualized driver, called a front-end driver. In order to access a physical device, such as the network interface card (NIC), the guest's front-end driver communicates with the corresponding back-end driver in the driver domain. The driver domain then multiplexes the data streams for each guest onto the physical device. The driver domain runs a modified version of Linux that uses native Linux device drivers to manage I/O devices.

As the figure shows, in order to provide network access to the guest domains, the driver domain includes a software Ethernet bridge that interconnects the physical NIC and all of the virtual network interfaces. When a packet is transmitted by a guest, it is first transferred to the back-end driver in the driver domain using a page remapping operation. Within the driver domain, the packet is then routed through the Ethernet bridge to the physical device driver. The device driver enqueues the packet for transmission on the network interface as if it were generated normally by the operating system within the driver domain. When a packet is received, the network interface generates an interrupt that is captured by the hypervisor and routed to the network interface's device driver in the driver domain as a virtual interrupt. The network interface's device driver transfers the packet to the Ethernet bridge, which routes the packet to the appropriate back-end driver. The back-end driver then transfers the packet to the front-end driver in the guest domain using a page remapping operation. Once the packet is transferred, the back-end driver requests that the hypervisor send a virtual interrupt to the guest notifying it of the new packet. Upon receiving the virtual interrupt, the front-end driver delivers the packet to the guest operating system's network stack, as if it had come directly from the physical device.
2.2 Device Driver Operation

The driver domain in Xen is able to use unmodified Linux device drivers to access the network interface. Thus, all interactions between the device driver and the NIC are as they would be in an unvirtualized system. These interactions include programmed I/O (PIO) operations from the driver to the NIC, direct memory access (DMA) transfers by the NIC to read or write host memory, and physical interrupts from the NIC to invoke the device driver.

The device driver directs the NIC to send packets from buffers in host memory and to place received packets into preallocated buffers in host memory. The NIC accesses these buffers using DMA read and write operations. In order for the NIC to know where to store or retrieve data from the host, the device driver within the host operating system generates DMA descriptors for use by the NIC. These descriptors indicate the buffer's length and physical address on the host. The device driver notifies the NIC via PIO that new descriptors are available, which causes the NIC to retrieve them via DMA transfers. Once the NIC reads a DMA descriptor, it can either read from or write to the associated buffer, depending on whether the descriptor is being used by the driver to transmit or receive packets.

Device drivers organize DMA descriptors in a series of rings that are managed using a producer/consumer protocol. As they are updated, the producer and consumer pointers wrap around the rings to create a continuous circular buffer. There are separate rings of DMA descriptors for transmit and receive operations. Transmit DMA descriptors point to host buffers that will be transmitted by the NIC, whereas receive DMA descriptors point to host buffers that the OS wants the NIC to use as it receives packets. When the host driver wants to notify the NIC of the availability of a new DMA descriptor (and hence a new packet to be transmitted or a new buffer to be posted for packet reception), the driver first creates the new DMA descriptor in the next-available slot in the driver's descriptor ring and then increments the producer index on the NIC to reflect that a new descriptor is available. The driver updates the NIC's producer index by writing the value via PIO into a specific location, called a mailbox, within the device's PCI memory-mapped region. The network interface monitors these mailboxes for such writes from the host. When a mailbox update is detected, the NIC reads the new producer value from the mailbox, performs a DMA read of the descriptor indicated by the index, and then is ready to use the DMA descriptor. After the NIC consumes a descriptor from a ring, the NIC updates its consumer index, transfers this consumer index to a location in host memory via DMA, and raises a physical interrupt to notify the host that state has changed.

In an unvirtualized operating system, the network interface trusts that the device driver gives it valid DMA descriptors. Similarly, the device driver trusts that the NIC will use the DMA descriptors correctly. If either entity violates this trust, physical memory can be corrupted. Xen also requires this trust relationship between the device driver in the driver domain and the NIC.
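To make the descriptor-ring protocol concrete, the following C sketch shows how a driver might post a transmit buffer and then advertise the new producer index through a PIO-mapped mailbox. It is illustrative only: the structure layout, field names, flag values, and ring size are assumptions, not the actual descriptor format of any particular NIC.

    #include <stdint.h>

    /* Hypothetical transmit DMA descriptor: as described above, each entry
     * carries the buffer's physical address and length, plus device flags. */
    struct dma_desc {
        uint64_t buf_phys;   /* physical address of the packet buffer */
        uint16_t len;        /* buffer length in bytes */
        uint16_t flags;      /* e.g., end-of-packet, interrupt-on-completion */
    };

    #define RING_SIZE 256u   /* assumed ring size */

    struct tx_ring {
        struct dma_desc desc[RING_SIZE];  /* descriptor ring in host memory */
        uint32_t producer;                /* next slot the driver will fill */
        uint32_t consumer;                /* last slot the NIC reported consumed */
        volatile uint32_t *mailbox;       /* PIO-mapped producer-index mailbox on the NIC */
    };

    /* Post one packet buffer for transmission: fill the next free slot, then
     * notify the NIC of the new producer index with a single PIO write. */
    static int post_tx_buffer(struct tx_ring *r, uint64_t buf_phys, uint16_t len)
    {
        uint32_t next = (r->producer + 1) % RING_SIZE;

        if (next == r->consumer)
            return -1;                    /* ring full: wait for the NIC to catch up */

        struct dma_desc *d = &r->desc[r->producer];
        d->buf_phys = buf_phys;
        d->len      = len;
        d->flags    = 0x1;                /* assumed "valid descriptor" flag */

        r->producer = next;
        *r->mailbox = r->producer;        /* PIO write; the NIC then DMAs the descriptor */
        return 0;
    }

In this sketch the consumer index is the value the NIC writes back to host memory via DMA after consuming descriptors, matching the protocol described above.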
2.3 Performance

Despite the optimizations within the paravirtualized drivers to support communication between the guest and driver domains (such as using page remapping rather than copying to transfer packets), Xen introduces significant processing and communication overheads into the network transmit and receive paths. Table 1 shows the networking performance of both native Linux 2.6.16.29 and paravirtualized Linux 2.6.16.29 as a guest operating system within Xen 3 Unstable (changeset 12053:874cc0ff214d from 11/1/2006) on a modern Opteron-based system with six Intel Gigabit Ethernet NICs. In both configurations, checksum offloading, scatter/gather I/O, and TCP Segmentation Offloading (TSO) were enabled. Support for TSO was recently added to the unstable development branch of Xen and is not currently available in the Xen 3 release. As the table shows, a guest domain within Xen is only able to achieve about 30% of the performance of native Linux. This performance gap strongly motivates the need for networking performance improvements within Xen.

    Table 1. Transmit and receive performance for native Linux 2.6.16.29 and
    paravirtualized Linux 2.6.16.29 as a guest OS within Xen 3.

    System        Transmit (Mb/s)  Receive (Mb/s)
    Native Linux  5126             3629
    Xen Guest     1602             1112

3 Concurrent Direct Network Access

With CDNA, the network interface and the hypervisor collaborate to provide the abstraction that each guest operating system is connected directly to its own network interface. This eliminates many of the overheads of network virtualization in Xen. Figure 2 shows the CDNA architecture.

[Figure 2. CDNA architecture in Xen: guest domains run NIC drivers that access their own contexts on the CDNA NIC directly, while the hypervisor retains interrupt dispatch, control, and access to the CPU, memory, and disk.]

The network interface must support multiple contexts in hardware. Each context acts as if it is an independent physical network interface and can be controlled by a separate device driver instance. Instead of assigning ownership of the entire network interface to the driver domain, the hypervisor treats each context as if it were a physical NIC and assigns ownership of contexts to guest operating systems. Notice the absence of the driver domain from the figure: each guest can transmit and receive network traffic using its own private context without any interaction with other guest operating systems or the driver domain. The driver domain, however, is still present to perform control functions and allow access to other I/O devices. Furthermore, the hypervisor is still involved in networking, as it must guarantee memory protection and deliver virtual interrupts to the guest operating systems.

With CDNA, the communication overheads between the guest and driver domains and the software multiplexing overheads within the driver domain are eliminated entirely. However, the network interface now must multiplex the traffic across all of its active contexts, and the hypervisor must provide protection across the contexts. The following sections describe how CDNA performs traffic multiplexing, interrupt delivery, and DMA memory protection.

3.1 Multiplexing Network Traffic

CDNA eliminates the software multiplexing overheads within the driver domain by multiplexing network traffic on the NIC. The network interface must be able to identify the source or target guest operating system for all network traffic. The network interface accomplishes this by providing independent hardware contexts and associating a unique Ethernet MAC address with each context. The hypervisor assigns a unique hardware context on the NIC to each guest operating system. The device driver within the guest operating system then interacts with its context exactly as if the context were an independent physical network interface. As described in Section 2.2, these interactions consist of creating DMA descriptors and updating a mailbox on the NIC via PIO.

Each context on the network interface therefore must include a unique set of mailboxes. This isolates the activity of each guest operating system, so that the NIC can distinguish between the different guests. The hypervisor assigns a context to a guest simply by mapping the I/O locations for that context's mailboxes into the guest's address space. The hypervisor also notifies the NIC that the context has been allocated and is active. As the hypervisor only maps each context into a single guest's address space, a guest cannot accidentally or intentionally access any context on the NIC other than its own. When necessary, the hypervisor can also revoke a context at any time by notifying the NIC, which will shut down all pending operations associated with the indicated context.

To multiplex transmit network traffic, the NIC simply services all of the hardware contexts fairly and interleaves the network traffic for each guest. When network packets are received by the NIC, it uses the Ethernet MAC address to demultiplex the traffic, and transfers each packet to the appropriate guest using available DMA descriptors from that guest's context.
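As a rough illustration of the receive-side demultiplexing described above, the following C fragment sketches how NIC firmware could map an incoming frame's destination MAC address to the owning hardware context. The context structure and helper names are hypothetical, not the RiceNIC firmware interface.

    #include <stdint.h>
    #include <string.h>

    #define MAX_CONTEXTS 32

    /* Hypothetical per-guest hardware context state kept by the NIC firmware. */
    struct nic_context {
        uint8_t  mac[6];        /* MAC address assigned to this guest's context */
        int      active;        /* set when the hypervisor allocates the context */
        uint32_t rx_consumer;   /* next receive descriptor to use from this guest's ring */
    };

    static struct nic_context contexts[MAX_CONTEXTS];

    /* Demultiplex a received frame: match its destination MAC against the
     * active contexts and return the owning context, or -1 to drop the frame. */
    static int demux_rx_frame(const uint8_t *frame)
    {
        const uint8_t *dst_mac = frame;   /* Ethernet destination MAC: first 6 bytes */

        for (int c = 0; c < MAX_CONTEXTS; c++) {
            if (contexts[c].active &&
                memcmp(contexts[c].mac, dst_mac, 6) == 0)
                return c;   /* DMA the frame using this guest's receive descriptors */
        }
        return -1;          /* no matching guest context */
    }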
3.2 Interrupt Delivery

In addition to isolating the guest operating systems and multiplexing network traffic, the hardware contexts on the NIC must also be able to interrupt their respective guests. As the NIC carries out network requests on behalf of any particular context, the CDNA NIC updates that context's consumer pointers for the DMA descriptor rings, as described in Section 2.2. Normally, the NIC would then interrupt the guest to notify it that the context state has changed. However, in Xen all physical interrupts are handled by the hypervisor. Therefore, the NIC cannot physically interrupt the guest operating systems directly. Even if it were possible to interrupt the guests directly, that could create a much higher interrupt load on the system, which would decrease the performance benefits of CDNA.

Under CDNA, the NIC keeps track of which contexts have been updated since the last physical interrupt, encoding this set of contexts in an interrupt bit vector. The NIC transfers an interrupt bit vector into the hypervisor's memory space using DMA. The interrupt bit vectors are stored in a circular buffer using a producer/consumer protocol to ensure that they are processed by the host before being overwritten by the NIC. After an interrupt bit vector is transferred, the NIC raises a physical interrupt, which invokes the hypervisor's interrupt service routine. The hypervisor then decodes all of the pending interrupt bit vectors and schedules virtual interrupts to each of the guest operating systems that have pending updates from the NIC. When the guest operating systems are next scheduled by the hypervisor, the CDNA network interface driver within the guest receives these virtual interrupts as if they were actual physical interrupts from the hardware. At that time, the driver examines the updates from the NIC and determines what further action, such as processing received packets, is required.
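The sketch below illustrates how a hypervisor interrupt handler might drain the DMAed interrupt bit vectors and convert them into per-guest virtual interrupts. The ring layout, the 32-context limit, and send_virtual_interrupt() are assumptions made for illustration; Xen's actual event-channel interfaces differ.

    #include <stdint.h>

    #define MAX_CONTEXTS 32
    #define IBV_SLOTS    64   /* circular buffer of bit vectors in hypervisor memory */

    /* Written by the NIC via DMA: one 32-bit vector per physical interrupt,
     * where bit i means context i has new completions to report. */
    static volatile uint32_t ibv_ring[IBV_SLOTS];
    static volatile uint32_t ibv_producer;   /* assumed to be advanced by the NIC via DMA */
    static uint32_t ibv_consumer;            /* advanced by the hypervisor */

    /* Hypothetical hook that marks a virtual interrupt pending for a guest;
     * it is delivered the next time that guest is scheduled. */
    extern void send_virtual_interrupt(unsigned int guest_context);

    /* Physical interrupt service routine: decode every pending bit vector and
     * schedule a virtual interrupt for each flagged context. */
    static void cdna_isr(void)
    {
        while (ibv_consumer != ibv_producer) {
            uint32_t vec = ibv_ring[ibv_consumer % IBV_SLOTS];

            for (unsigned int ctx = 0; ctx < MAX_CONTEXTS; ctx++) {
                if (vec & (1u << ctx))
                    send_virtual_interrupt(ctx);
            }
            ibv_consumer++;      /* frees the slot for reuse by the NIC */
        }
    }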
3.3 DMA Memory Protection

In the x86 architecture, network interfaces and other I/O devices use physical addresses when reading or writing host system memory. The device driver in the host operating system is responsible for doing virtual-to-physical address translation for the device. The physical addresses are provided to the network interface through read and write DMA descriptors as discussed in Section 2.2. By exposing physical addresses to the network interface, the DMA engine on the NIC can be co-opted into compromising system security by a buggy or malicious driver. There are two key I/O protection violations that are possible in the x86 architecture. First, the device driver could instruct the NIC to transmit packets containing a payload from physical memory that does not contain packets generated by the operating system, thereby creating a security hole. Second, the device driver could instruct the NIC to receive packets into physical memory that was not designated as an available receive buffer, possibly corrupting memory that is in use.

In the conventional Xen network architecture discussed in Section 2.2, Xen trusts the device driver in the driver domain to only use the physical addresses of network buffers in the driver domain's address space when passing DMA descriptors to the network interface. This ensures that all network traffic will be transferred to/from network buffers within the driver domain. Since guest domains do not interact with the NIC, they cannot initiate DMA operations, so they are prevented from causing either of the I/O protection violations in the x86 architecture.

Though the Xen I/O architecture guarantees that untrusted guest domains cannot induce memory protection violations, any domain that is granted access to an I/O device by the hypervisor can potentially direct the device to perform DMA operations that access memory belonging to other guests, or even the hypervisor. The Xen architecture does not fundamentally solve this security defect but instead limits the scope of the problem to a single, trusted driver domain [9]. Therefore, as the driver domain is trusted, it is unlikely to intentionally violate I/O memory protection, but a buggy driver within the driver domain could do so unintentionally.

This solution is insufficient for the CDNA architecture. In a CDNA system, device drivers in the guest domains have direct access to the network interface and are able to pass DMA descriptors with physical addresses to the device. Thus, the untrusted guests could read or write memory in any other domain through the NIC, unless additional security features are added. To maintain isolation between guests, the CDNA architecture validates and protects all DMA descriptors and ensures that a guest maintains ownership of physical pages that are sources or targets of outstanding DMA accesses. Although the hypervisor and the network interface share the responsibility for implementing these protection mechanisms, the more complex aspects are implemented in the hypervisor.

The most important protection provided by CDNA is that it does not allow guest domains to directly enqueue DMA descriptors into the network interface descriptor rings. Instead, the device driver in each guest must call into the hypervisor to perform the enqueue operation. This allows the hypervisor to validate that the physical addresses provided by the guest are, in fact, owned by that guest domain. This prevents a guest domain from arbitrarily transmitting from or receiving into another guest domain. The hypervisor prevents guest operating systems from independently enqueueing unauthorized DMA descriptors by establishing the hypervisor's exclusive write access to the host memory region containing the CDNA descriptor rings during driver initialization.
As discussed in Section 2.2, conventional I/O devices autonomously fetch and process DMA descriptors from host memory at runtime. Though hypervisor-managed validation and enqueuing of DMA descriptors ensures that DMA operations are valid when they are enqueued, the physical memory could still be reallocated before it is accessed by the network interface. There are two ways in which such a protection violation could be exploited by a buggy or malicious device driver. First, the guest could return the memory to the hypervisor to be reallocated shortly after enqueueing the DMA descriptor. Second, the guest could attempt to reuse an old DMA descriptor in the descriptor ring that is no longer valid.

When memory is freed by a guest operating system, it becomes available for reallocation to another guest by the hypervisor. Hence, ownership of the underlying physical memory can change dynamically at runtime. However, it is critical to prevent any possible reallocation of physical memory during a DMA operation. CDNA achieves this by delaying the reallocation of physical memory that is being used in a DMA transaction until after that pending DMA has completed. When the hypervisor enqueues a DMA descriptor, it first establishes that the requesting guest owns the physical memory associated with the requested DMA. The hypervisor then increments the reference count for each physical page associated with the requested DMA. This per-page reference counting system already exists within the Xen hypervisor; so long as the reference count is non-zero, a physical page cannot be reallocated. Later, the hypervisor then observes which DMA operations have completed and decrements the associated reference counts. For efficiency, the reference counts are only decremented when additional DMA descriptors are enqueued, but there is no reason why they could not be decremented more aggressively, if necessary.

After enqueuing DMA descriptors, the device driver notifies the NIC by writing a producer index into a mailbox location within that guest's context on the NIC. This producer index indicates the location of the last of the newly created DMA descriptors. The NIC then assumes that all DMA descriptors up to the location indicated by the producer index are valid. If the device driver in the guest increments the producer index past the last valid descriptor, the NIC will attempt to use a stale DMA descriptor that is in the descriptor ring. Since that descriptor was previously used in a DMA operation, the hypervisor may have decremented the reference count on the associated physical memory and reallocated the physical memory.

To prevent such stale DMA descriptors from being used, the hypervisor writes a strictly increasing sequence number into each DMA descriptor. The NIC then checks the sequence number before using any DMA descriptor. If the descriptor is valid, the sequence numbers will be continuous modulo the size of the maximum sequence number. If they are not, the NIC will refuse to use the descriptors and will report a guest-specific protection fault error to the hypervisor. Because each DMA descriptor in the ring buffer gets a new, increasing sequence number, a stale descriptor will have a sequence number exactly equal to the correct value minus the number of descriptor slots in the buffer. Making the maximum sequence number at least twice as large as the number of DMA descriptors in a ring buffer prevents aliasing and ensures that any stale sequence number will be detected.
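Putting these protection rules together, the following C sketch shows the shape of a hypervisor-side enqueue hypercall: it validates page ownership, pins the page with a reference count, and stamps the descriptor with a strictly increasing sequence number. All names (page_owned_by, get_page_ref, and the descriptor layout) are stand-ins for illustration, not Xen's real interfaces.

    #include <stdint.h>

    #define RING_SIZE   256u                    /* number of descriptor slots */
    #define SEQ_MODULUS (2u * RING_SIZE)        /* max sequence >= 2x ring size, per the text */

    struct dma_desc {
        uint64_t buf_phys;
        uint16_t len;
        uint16_t flags;                         /* copied through; not interpreted here */
        uint32_t seq;                           /* written only by the hypervisor */
    };

    /* Hypothetical hypervisor helpers; Xen's real ownership and refcount APIs differ. */
    extern int  page_owned_by(unsigned int guest_id, uint64_t phys_addr);
    extern void get_page_ref(uint64_t phys_addr);   /* pin page until its DMA completes */

    struct guest_ring_state {
        struct dma_desc *ring;                  /* writable only by the hypervisor */
        uint32_t producer;
        uint32_t next_seq;
    };

    /* Hypercall body: validate and enqueue one DMA descriptor on behalf of a guest. */
    static int cdna_enqueue_desc(unsigned int guest_id, struct guest_ring_state *s,
                                 uint64_t buf_phys, uint16_t len, uint16_t flags)
    {
        if (!page_owned_by(guest_id, buf_phys))
            return -1;                          /* guest does not own this memory */

        get_page_ref(buf_phys);                 /* block reallocation during the DMA */

        struct dma_desc *d = &s->ring[s->producer % RING_SIZE];
        d->buf_phys = buf_phys;
        d->len      = len;
        d->flags    = flags;
        d->seq      = s->next_seq;              /* NIC rejects non-continuous sequence numbers */

        s->next_seq = (s->next_seq + 1) % SEQ_MODULUS;
        s->producer++;
        return 0;
    }

The matching reference-count decrement would happen later, once the hypervisor observes that the corresponding DMA has completed, as the section describes.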
Mak- To evaluate the CDNA concept in a real system, ing the maximum sequence number at least twice as large RiceNIC,aprogrammableandreconfigurableFPGA-based asthenumberofDMAdescriptorsinaringbufferprevents Gigabit Ethernet network interface [17], was modified to aliasingandensuresthatanystalesequencenumberwillbe providevirtualizationsupport. RiceNICcontainsaVirtex- detected. II Pro FPGA with two embedded 300MHz PowerPC pro- cessors, hundreds of megabytes of on-board SRAM and 3.4 Discussion DRAM memories, a Gigabit Ethernet PHY, and a 64- bit/66MHzPCIinterface[3]. Customhardwareassistunits The CDNA interrupt delivery mechanism is neither de- for accelerated DMA transfers and MAC packet handling vice nor Xen specific. This mechanism only requires the are provided on the FPGA. The RiceNIC architecture is device to transfer an interrupt bit vector to the hypervisor similar to the architecture of a conventional network in- viaDMApriortoraisingaphysicalinterrupt. Thisisarela- terface. With basic firmware and the appropriate Linux or tivelysimplemechanismfromtheperspectiveofthedevice FreeBSDdevicedriver,itactsasastandardGigabitEther- andisthereforegeneralizabletoavarietyofvirtualizedI/O netnetworkinterfacethatiscapableoffullysaturatingthe devices. Furthermore, itdoesnotrelyonanyXen-specific Ethernet link while only using one of the two embedded features. processors. ThehandlingoftheDMAdescriptorswithinthehyper- To support CDNA, both the hardware and firmware of visor is linked to a particular network interface only be- the RiceNIC were modified to provide multiple protected cause the format of the DMA descriptors and their rings contexts and to multiplex network traffic. The network is likely to be different for each device. As the hypervisor interface was also modified to interact with the hypervi- mustvalidatethatthehostaddressesreferredtoineachde- sor through a dedicated context to allow privileged man- scriptorbelongtotheguestoperatingsystemthatprovided agement operations. The modified hardware and firmware them, the hypervisor must be aware of the descriptor for- components work together to implement the CDNA inter- 6 ThispaperoriginallyappearedatHPCA2007 faces. normal operation of the network interface—unvirtualized To support CDNA, the most significant addition to the devicedriverswoulduseasinglecontext’smailboxestoin- networkinterfaceisthespecializeduseofthe2MBSRAM teract with the base firmware. Furthermore, the computa- ontheNIC.ThisSRAMisaccessibleviaPIOfromthehost. tionandstoragerequirementsofCDNAareminimal. Only ForCDNA,128KBoftheSRAMisdividedinto32parti- one of the RiceNIC’s two embedded processors is needed tionsof4KBeach. Eachofthesepartitionsisaninterface tosaturatethenetwork,andonly12MBofmemoryonthe toaseparatehardwarecontextontheNIC.OnlytheSRAM NICisneededtosupport32contexts. Therefore, withmi- canbememorymappedintothehost’saddressspace,sono normodifications,commoditynetworkinterfacescouldeas- othermemorylocationsontheNICareaccessibleviaPIO. ilyprovidesufficientcomputationandstorageresourcesto Asacontext’smemorypartitionisthesamesizeasapage supportCDNA. onthehostsystemandbecausetheregionispage-aligned, thehypervisor can triviallymapeach context intoadiffer- 5 Evaluation entguestdomain’saddressspace. Thedevicedriversinthe guestdomainsmayusethese4KBpartitionsasgeneralpur- 5.1 ExperimentalSetup pose shared memory between the corresponding guest op- eratingsystemandthenetworkinterface. 
4 CDNA NIC Implementation

To evaluate the CDNA concept in a real system, RiceNIC, a programmable and reconfigurable FPGA-based Gigabit Ethernet network interface [17], was modified to provide virtualization support. RiceNIC contains a Virtex-II Pro FPGA with two embedded 300 MHz PowerPC processors, hundreds of megabytes of on-board SRAM and DRAM memories, a Gigabit Ethernet PHY, and a 64-bit/66 MHz PCI interface [3]. Custom hardware assist units for accelerated DMA transfers and MAC packet handling are provided on the FPGA. The RiceNIC architecture is similar to the architecture of a conventional network interface. With basic firmware and the appropriate Linux or FreeBSD device driver, it acts as a standard Gigabit Ethernet network interface that is capable of fully saturating the Ethernet link while only using one of the two embedded processors.

To support CDNA, both the hardware and firmware of the RiceNIC were modified to provide multiple protected contexts and to multiplex network traffic. The network interface was also modified to interact with the hypervisor through a dedicated context to allow privileged management operations. The modified hardware and firmware components work together to implement the CDNA interfaces.

To support CDNA, the most significant addition to the network interface is the specialized use of the 2 MB SRAM on the NIC. This SRAM is accessible via PIO from the host. For CDNA, 128 KB of the SRAM is divided into 32 partitions of 4 KB each. Each of these partitions is an interface to a separate hardware context on the NIC. Only the SRAM can be memory mapped into the host's address space, so no other memory locations on the NIC are accessible via PIO. As a context's memory partition is the same size as a page on the host system and because the region is page-aligned, the hypervisor can trivially map each context into a different guest domain's address space. The device drivers in the guest domains may use these 4 KB partitions as general purpose shared memory between the corresponding guest operating system and the network interface.

Within each context's partition, the lowest 24 memory locations are mailboxes that can be used to communicate from the driver to the NIC. When any mailbox is written by PIO, a global mailbox event is automatically generated by the FPGA hardware. The NIC firmware can then process the event and efficiently determine which mailbox and corresponding context has been written by decoding a two-level hierarchy of bit vectors. All of the bit vectors are generated automatically by the hardware and stored in a data scratchpad for high speed access by the processor. The first bit vector in the hierarchy determines which of the 32 potential contexts have updated mailbox events to process, and the second vector in the hierarchy determines which mailbox(es) in a particular context have been updated. Once the specific mailbox has been identified, that off-chip SRAM location can be read by the firmware and the mailbox information processed.

The mailbox event and associated hierarchy of bit vectors are managed by a small hardware core that snoops data on the SRAM bus and dispatches notification messages when a mailbox is updated. A small state machine decodes these messages and incrementally updates the data scratchpad with the modified bit vectors. This state machine also handles event-clear messages from the processor that can clear multiple events from a single context at once.

Each context requires 128 KB of storage on the NIC for metadata, such as the rings of transmit- and receive-DMA descriptors provided by the host operating systems. Furthermore, each context uses 128 KB of memory on the NIC for buffering transmit packet data and 128 KB for receive packet data. However, the NIC's transmit and receive packet buffers are each managed globally, and hence packet buffering is shared across all contexts.
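The two-level event decode described above maps naturally onto a pair of bit vectors. The firmware-style C sketch below is illustrative only; the scratchpad layout and helper routines are assumptions rather than the actual RiceNIC firmware interface.

    #include <stdint.h>

    #define MAX_CONTEXTS          32
    #define MAILBOXES_PER_CONTEXT 24

    /* Assumed scratchpad layout maintained by the FPGA event hardware:
     * one bit per context, then one word per context with one bit per mailbox. */
    extern volatile uint32_t context_event_vector;                 /* level 1 */
    extern volatile uint32_t mailbox_event_vector[MAX_CONTEXTS];   /* level 2 */

    extern uint32_t sram_read_mailbox(int ctx, int mbox);    /* off-chip SRAM read */
    extern void     clear_mailbox_event(int ctx, int mbox);  /* event-clear message */
    extern void     handle_mailbox(int ctx, int mbox, uint32_t value);

    /* Firmware loop body: walk the hierarchy to find exactly which mailboxes
     * were written by PIO, then process and clear each event. */
    static void poll_mailbox_events(void)
    {
        uint32_t ctxs = context_event_vector;
        while (ctxs) {
            int ctx = __builtin_ctz(ctxs);               /* lowest pending context */
            uint32_t mboxes = mailbox_event_vector[ctx];

            while (mboxes) {
                int mbox = __builtin_ctz(mboxes);        /* lowest pending mailbox */
                handle_mailbox(ctx, mbox, sram_read_mailbox(ctx, mbox));
                clear_mailbox_event(ctx, mbox);
                mboxes &= mboxes - 1;                    /* clear that bit locally */
            }
            ctxs &= ctxs - 1;
        }
    }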
The modifications to the RiceNIC to support CDNA were minimal. The major hardware change was the additional mailbox storage and handling logic. This could easily be added to an existing NIC without interfering with the normal operation of the network interface; unvirtualized device drivers would use a single context's mailboxes to interact with the base firmware. Furthermore, the computation and storage requirements of CDNA are minimal. Only one of the RiceNIC's two embedded processors is needed to saturate the network, and only 12 MB of memory on the NIC is needed to support 32 contexts. Therefore, with minor modifications, commodity network interfaces could easily provide sufficient computation and storage resources to support CDNA.

5 Evaluation

5.1 Experimental Setup

The performance of Xen and CDNA network virtualization was evaluated on an AMD Opteron-based system running Xen 3 Unstable (changeset 12053:874cc0ff214d from 11/1/2006). This system used a Tyan S2882 motherboard with a single Opteron 250 processor and 4 GB of DDR400 SDRAM. Xen 3 Unstable was used because it provides the latest support for high-performance networking, including TCP segmentation offloading, and the most recent version of Xenoprof [13] for profiling the entire system.

In all experiments, the driver domain was configured with 256 MB of memory and each of 24 guest domains were configured with 128 MB of memory. Each guest domain ran a stripped-down Linux 2.6.16.29 kernel with minimal services for memory efficiency and performance. For the base Xen experiments, a single dual-port Intel Pro/1000 MT NIC was used in the system. In the CDNA experiments, two RiceNICs configured to support CDNA were used in the system. Linux TCP parameters and NIC coalescing options were tuned in the driver domain and guest domains for optimal performance. For all experiments, checksum offloading and scatter/gather I/O were enabled. TCP segmentation offloading was enabled for experiments using the Intel NICs, but disabled for those using the RiceNICs due to lack of support. The Xen system was set up to communicate with a similar Opteron system that was running a native Linux kernel. This system was tuned so that it could easily saturate two NICs both transmitting and receiving so that it would never be the bottleneck in any of the tests.

To validate the performance of the CDNA approach, multiple simultaneous connections across multiple NICs to multiple guest domains were needed. A multithreaded, event-driven, lightweight network benchmark program was developed to distribute traffic across a configurable number of connections. The benchmark program balances the bandwidth across all connections to ensure fairness and uses a single buffer per thread to send and receive data to minimize the memory footprint and improve cache performance.

5.2 Single Guest Performance

    Table 2. Transmit performance for a single guest with 2 NICs using Xen and CDNA
    (domain execution profile and interrupt rates).

    System  NIC      Mb/s  Hyp    Driver dom. (OS/user)  Guest OS (OS/user)  Idle   Intr/s (driver dom. / guest)
    Xen     Intel    1602  19.8%  35.7% / 0.8%           39.7% / 1.0%        3.0%   7,438 / 7,853
    Xen     RiceNIC  1674  13.7%  41.5% / 0.5%           39.5% / 1.0%        3.8%   8,839 / 5,661
    CDNA    RiceNIC  1867  10.2%  0.3% / 0.2%            37.8% / 0.7%        50.8%  0 / 13,659

    Table 3. Receive performance for a single guest with 2 NICs using Xen and CDNA
    (domain execution profile and interrupt rates).

    System  NIC      Mb/s  Hyp    Driver dom. (OS/user)  Guest OS (OS/user)  Idle   Intr/s (driver dom. / guest)
    Xen     Intel    1112  25.7%  36.8% / 0.5%           31.0% / 1.0%        5.0%   11,138 / 5,193
    Xen     RiceNIC  1075  30.6%  39.4% / 0.6%           28.8% / 0.6%        0%     10,946 / 5,163
    CDNA    RiceNIC  1874  9.9%   0.3% / 0.2%            48.0% / 0.7%        40.9%  0 / 7,402

Tables 2 and 3 show the transmit and receive performance of a single guest operating system over two physical network interfaces using Xen and CDNA. The first two rows of each table show the performance of the Xen I/O virtualization architecture using both the Intel and RiceNIC network interfaces. The third row of each table shows the performance of the CDNA I/O virtualization architecture.

The Intel network interface can only be used with Xen through the use of software virtualization. However, the RiceNIC can be used with both CDNA and software virtualization. To use the RiceNIC interface with software virtualization, a context was assigned to the driver domain and no contexts were assigned to the guest operating system. Therefore, all network traffic from the guest operating system is routed via the driver domain as it normally would be, through the use of software virtualization. Within the driver domain, all of the mechanisms within the CDNA NIC are used identically to the way they would be used by a guest operating system when configured to use concurrent direct network access. As the tables show, the Intel network interface performs similarly to the RiceNIC network interface.
Therefore, the benefits achieved with CDNA are the result of the CDNA I/O virtualization architecture, not the result of differences in network interface performance.

Note that in Xen the interrupt rate for the guest is not necessarily the same as it is for the driver. This is because the back-end driver within the driver domain attempts to interrupt the guest operating system whenever it generates new work for the front-end driver. This can happen at a higher or lower rate than the actual interrupt rate generated by the network interface depending on a variety of factors, including the number of packets that traverse the Ethernet bridge each time the driver domain is scheduled by the hypervisor.

Table 2 shows that using all of the available processing resources, Xen's software virtualization is not able to transmit at line rate over two network interfaces with either the Intel hardware or the RiceNIC hardware. However, only 41% of the processor is used by the guest operating system. The remaining resources are consumed by Xen overheads: using the Intel hardware, approximately 20% in the hypervisor and 37% in the driver domain performing software multiplexing and other tasks.

As the table shows, CDNA is able to saturate two network interfaces, whereas traditional Xen networking cannot. Additionally, CDNA performs far more efficiently, with 51% processor idle time. The increase in idle time is primarily the result of two factors. First, nearly all of the time spent in the driver domain is eliminated. The remaining time spent in the driver domain is unrelated to networking tasks. Second, the time spent in the hypervisor is decreased. With Xen, the hypervisor spends the bulk of its time managing the interactions between the front-end and back-end virtual network interface drivers. CDNA eliminates these communication overheads with the driver domain, so the hypervisor instead spends the bulk of its time managing DMA memory protection.

Table 3 shows the receive performance of the same configurations. Receiving network traffic requires more processor resources, so Xen only achieves 1112 Mb/s with the Intel network interface, and slightly lower with the RiceNIC interface. Again, Xen overheads consume the bulk of the time, as the guest operating system only consumes about 32% of the processor resources when using the Intel hardware.

As the table shows, not only is CDNA able to saturate the two network interfaces, it does so with 41% idle time. Again, nearly all of the time spent in the driver domain is eliminated. As with the transmit case, the CDNA architecture permits the hypervisor to spend its time performing DMA memory protection rather than managing higher-cost interdomain communications as is required using software virtualization.

In summary, the CDNA I/O virtualization architecture provides significant performance improvements over Xen for both transmit and receive. On the transmit side, CDNA requires half the processor resources to deliver about 200 Mb/s higher throughput. On the receive side, CDNA requires 60% of the processor resources to deliver about 750 Mb/s higher throughput.

5.3 Memory Protection

The software-based protection mechanisms in CDNA can potentially be replaced by a hardware IOMMU. For example, AMD has proposed an IOMMU architecture for virtualization that restricts the physical memory that can be accessed by each device [2]. AMD's proposed architecture provides memory protection as long as each device is only accessed by a single domain. For CDNA, such an IOMMU would have to be extended to work on a per-context basis, rather than a per-device basis. This would also require a mechanism to indicate a context for each DMA transfer. Since CDNA only distinguishes between guest operating systems and not traffic flows, there are a limited number of contexts, which may make a generic system-level context-aware IOMMU practical.

    Table 4. CDNA 2-NIC transmit and receive performance with and without DMA memory protection.

    System           DMA protection  Mb/s  Hyp    Driver dom. (OS/user)  Guest OS (OS/user)  Idle   Intr/s (driver dom. / guest)
    CDNA (Transmit)  Enabled         1867  10.2%  0.3% / 0.2%            37.8% / 0.7%        50.8%  0 / 13,659
    CDNA (Transmit)  Disabled        1867  1.9%   0.2% / 0.2%            37.0% / 0.3%        60.4%  0 / 13,680
    CDNA (Receive)   Enabled         1874  9.9%   0.3% / 0.2%            48.0% / 0.7%        40.9%  0 / 7,402
    CDNA (Receive)   Disabled        1874  1.9%   0.2% / 0.2%            47.2% / 0.3%        50.2%  0 / 7,243

Table 4 shows the performance of the CDNA I/O virtualization architecture both with and without DMA memory protection. (The performance of CDNA with DMA memory protection enabled was replicated from Tables 2 and 3 for comparison purposes.) By disabling DMA memory protection, the performance of the modified CDNA system establishes an upper bound on achievable performance in a system with an appropriate IOMMU. However, there would be additional hypervisor overhead to manage the IOMMU that is not accounted for by this experiment. Since CDNA can already saturate two network interfaces for both transmit and receive traffic, the effect of removing DMA protection is to increase the idle time by about 9%. As the table shows, this increase in idle time is the direct result of reducing the number of hypercalls from the guests and the time spent in the hypervisor performing protection operations.

Even as systems begin to provide IOMMU support for techniques such as CDNA, older systems will continue to lack such features. In order to generalize the design of CDNA for systems with and without an appropriate IOMMU, wrapper functions could be used around the hypercalls within the guest device drivers. The hypervisor must notify the guest whether or not there is an IOMMU. When no IOMMU is present, the wrappers would simply call the hypervisor, as described here. When an IOMMU is present, the wrapper would instead create DMA descriptors without hypervisor intervention and only invoke the hypervisor to set up the IOMMU. Such wrappers already exist in modern operating systems to deal with such IOMMU issues.
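The wrapper idea sketched above might look like the following in a guest driver; every function name here is hypothetical and simply illustrates choosing between the hypercall path and an IOMMU path at a single point in the code.

    #include <stdint.h>

    /* Hypothetical interfaces; real guest drivers and hypervisors differ. */
    extern int hypervisor_has_iommu(void);
    extern int hypercall_cdna_enqueue(uint64_t buf_phys, uint16_t len, uint16_t flags);
    extern int hypercall_iommu_map(uint64_t buf_phys, uint16_t len);
    extern int local_enqueue_desc(uint64_t buf_phys, uint16_t len, uint16_t flags);

    /* Guest-side wrapper: software DMA protection when no IOMMU exists,
     * otherwise map once through the IOMMU and enqueue without a hypercall. */
    static int cdna_post_buffer(uint64_t buf_phys, uint16_t len, uint16_t flags)
    {
        if (!hypervisor_has_iommu())
            return hypercall_cdna_enqueue(buf_phys, len, flags);

        if (hypercall_iommu_map(buf_phys, len) != 0)
            return -1;
        return local_enqueue_desc(buf_phys, len, flags);
    }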
5.4 Scalability

[Figure 3. Transmit throughput for Xen and CDNA (with CDNA idle time), plotted against the number of Xen guests (1 to 24).]
[Figure 4. Receive throughput for Xen and CDNA (with CDNA idle time), plotted against the number of Xen guests (1 to 24).]

Figures 3 and 4 show the aggregate transmit and receive throughput, respectively, of Xen and CDNA with two network interfaces as the number of guest operating systems varies. The percentage of CPU idle time is also plotted above each data point. CDNA outperforms Xen for both transmit and receive both for a single guest, as previously shown in Tables 2 and 3, and as the number of guest operating systems is increased.

As the figures show, the performance of both CDNA and software virtualization degrades as the number of guests increases. For Xen, this results in declining bandwidth, but the marginal reduction in bandwidth decreases with each increase in the number of guests. For CDNA, while the bandwidth remains constant, the idle time decreases to zero. Despite the fact that there is no idle time for 8 or more guests, CDNA is still able to maintain constant bandwidth. This is consistent with the leveling of the bandwidth achieved by software virtualization. Therefore, it is likely that with more CDNA NICs, the throughput curve would have a similar shape to that of software virtualization, but with a much higher peak throughput when using 1-4 guests.

These results clearly show that not only does CDNA deliver better network performance for a single guest operating system within Xen, but it also maintains significantly higher bandwidth as the number of guest operating systems is increased. With 24 guest operating systems, CDNA's transmit bandwidth is a factor of 2.1 higher than Xen's and CDNA's receive bandwidth is a factor of 3.3 higher than Xen's.

6 Related Work

Previous studies have also found that network virtualization implemented entirely in software has high overhead. In 2001, Sugerman, et al. showed that in VMware, it could take up to six times the processor resources to saturate a 100 Mb/s network than in native Linux [19]. Similarly, in 2005, Menon, et al. showed that in Xen, network throughput degrades by up to a factor of 5 over native Linux for processor-bound networking workloads using Gigabit Ethernet links [13]. Section 2.3 shows that the I/O performance of Xen has improved, but there is still significant network virtualization overhead. Menon, et al. have also shown that it is possible to improve transmit performance with software-only mechanisms (mainly by leveraging TSO) [12]. However, there are no known software mechanisms to substantively improve receive performance.

Motivated by these performance issues, Raj and Schwan presented an Ethernet network interface targeted at VMMs that performs traffic multiplexing and interrupt delivery [16]. While their proposed architecture bears some similarity to CDNA, they did not present any mechanism for DMA memory protection.

As a result of the growing popularity of VMMs for commodity hardware, both AMD and Intel are introducing virtualization support to their microprocessors [2, 10]. This virtualization support should improve the performance of VMMs by providing mechanisms to simplify isolation among guest operating systems and to enable the hypervisor to occupy a new privilege level distinct from those normally used by the operating system. These improvements will reduce the duration and frequency of calls into the hypervisor, which should decrease the performance overhead of virtualization.
However, none of the proposed innovations directly address the network performance issues discussed in this paper, such as the inherent overhead in multiplexing and copying/remapping data between the guest and driver domains. While the context switches between the two domains may be reduced in number or accelerated, the overhead of communication and multiplexing within the driver domain will remain. Therefore, concurrent direct network access will continue to be an important element of VMMs for networking workloads.

VMMs that utilize full virtualization, such as VMware ESX Server [7], support full binary compatibility with unmodified guest operating systems. This impacts the I/O virtualization architecture of such systems, as the guest operating system must be able to use its unmodified native device driver to access the virtual network interface. However, VMware also allows the use of paravirtualized network drivers (i.e., vmxnet), which enables the use of techniques such as CDNA.

The CDNA architecture is similar to that of user-level networking architectures that allow processes to bypass the operating system and access the NIC directly [5, 6, 8, 14, 15, 18, 20, 21]. Like CDNA, these architectures require DMA memory protection, an interrupt delivery mechanism, and network traffic multiplexing. Both user-level networking architectures and CDNA handle traffic multiplexing on the network interface. The only difference is that user-level NICs handle flows on a per-application basis, whereas
