
Efficient Deterministic Replay through Dynamic Binary Translation PDF

88 Pages·2015·0.64 MB·English


Efficient Deterministic Replay through Dynamic Binary Translation

Piyus Kedia
Amarnath and Shashi Khosla School of IT

This dissertation is submitted for the degree of Doctor of Philosophy
Indian Institute of Technology Delhi
January 2015

To Nature

Certificate

I am satisfied that the thesis presented by Piyus Kedia is worthy of consideration for the award of the degree of Doctor of Philosophy and is a record of original bonafide research work carried out by him under my guidance and supervision, and that the results contained in it have not been submitted in part or full to any other university or institute for award of any degree or diploma.

Dr. Sorav Bansal
Computer Sc. & Engg.

Acknowledgements

I would like to thank my loving parents for their love and affection and for the flexibility to do whatever I like. I would like to thank my Grandfather and Grandmother for teaching me to uphold the values they passed on to me at every stage of my life. I want to thank my teachers in school who introduced me to mathematics and science at a very early stage in my life.

I want to thank my supervisor Dr. Sorav Bansal, without whom this thesis would not have been possible. If not for him I would perhaps not have done a PhD at all. Through him I was able to discover my true area of interest. He helped me immensely in writing my papers and running through my practice talks. His spending countless hours discussing my problems at the right times was a big driving force behind this thesis. It is a big privilege to be his student.

I would like to thank my friends Dipanjan and Kinshuk for making my days memorable during my stay at IIT Delhi. Without them it would not have been so easy to complete this PhD. I want to thank all my lab mates for the weird discussions we so often had, which made the lab so much more lively and fun to work in. I would also like to thank my committee members for attending my PhD progress talks. I want to thank MHRD and IBM for supporting me financially during my PhD.
Abstract

We present an efficient software implementation to deterministically record and replay a full multiprocessor virtual machine (VM), including its guest OS kernel and applications. Deterministically replaying a shared-memory monolithic OS kernel (like Linux) presents a significant performance challenge, and we demonstrate the use of dynamic binary translation to achieve this objective.

Dynamic binary translation (DBT) is a powerful technique with several important applications. System-level binary translators have been used for implementing a Virtual Machine Monitor [2] and for instrumentation in the OS kernel [29]. In current designs, the performance overhead of binary translation on kernel-intensive workloads is high; e.g., over 10x slowdowns were reported on the syscall nanobenchmark in [2], and 2-5x slowdowns were reported on lmbench microbenchmarks in [29]. These overheads are primarily due to the extra work required to correctly handle kernel mechanisms like interrupts, exceptions, and physical CPU concurrency. Because the overhead of DBT in these designs is itself very high, it cannot be used as-is to improve deterministic replay performance. We present a kernel-level binary translation mechanism which exhibits near-native performance even on applications with large kernel activity. Our translator relaxes transparency requirements and aggressively takes advantage of kernel invariants to eliminate sources of slowdown. We have implemented our translator as a loadable module in unmodified Linux, and present performance and scalability experiments on multiprocessor hardware. Although our implementation is Linux specific, our mechanisms are quite general; we only take advantage of typical kernel design patterns, not Linux-specific features.

The biggest challenge in deterministically replaying a multiprocessor system is recording the order of shared memory reads and writes. The previous comparable approach [27] uses the CREW (Concurrent Read Exclusive Write) protocol at page granularity.
The page-grained CREW protocol uses hardware page protection (EPT/shadow page tables) to restrict the access privileges of the CPUs: multiple CPUs can read from a page by acquiring shared access privilege for it, but to write to a page a CPU needs to acquire exclusive access privilege for that page. This scheme suffers from false sharing and heavy shuttling of pages between processors for workloads with a large amount of sharing, such as the Linux kernel. Every transfer of privilege is recorded in order to reproduce the same transition during replay. We implement CREW at byte granularity using DBT to eliminate false sharing. To achieve this, we insert a reader/writer lock before every shared memory access, and then reduce the cost of these locks by choosing the lock granularity such that the number of lock acquisitions is minimized, at the cost of losing some concurrency. There are two broad approaches to doing this, which we call data-level mutual-exclusion and code-level mutual-exclusion.

Data-level mutual-exclusion models all code and data as belonging to one shared address space, and synchronizes each memory access by each CPU. In other words, a CPU is modeled as a thread executing in a shared-memory cache-coherent address space. This is identical to the underlying hardware model on shared-memory machines. Synchronization involves CREW-like ownership tracking of memory locations, which involves associating metadata with each memory location to store its ownership status. Code-level mutual-exclusion, on the other hand, divides code regions into disjoint sets called monitors, with the property that instructions from two different monitors can never access a memory location concurrently. In this model, ensuring that at most one CPU is active inside a monitor at all times suffices. We propose a hybrid approach which uses data-level mutual-exclusion for some parts of the code and code-level mutual-exclusion for other parts, for better speedup.
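To make the false-sharing argument concrete, the following is a minimal user-space sketch of CREW ownership tracking with a recorded transfer log. It is an illustration only, not the thesis's kernel implementation: the names `Ownership`, `CrewTracker`, `count_transfers`, and the `granularity` parameter are hypothetical, and the simulation is single-threaded, so it counts privilege transfers rather than enforcing them with real locks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ownership:
    """CREW state for one tracked unit: many readers or one writer."""
    readers: set = field(default_factory=set)
    writer: Optional[int] = None


class CrewTracker:
    """Hypothetical tracker; `granularity` is the tracked unit size in bytes."""

    def __init__(self, granularity):
        self.granularity = granularity
        self.state = {}   # unit index -> Ownership metadata
        self.log = []     # recorded privilege transfers, in order

    def access(self, cpu, addr, is_write):
        """Record a transfer if `cpu` lacks the needed privilege; return True if so."""
        unit = addr // self.granularity
        own = self.state.setdefault(unit, Ownership())
        if is_write:
            if own.writer == cpu:
                return False              # already holds exclusive access
            own.readers.clear()           # revoke all shared access
            own.writer = cpu
            self.log.append((unit, cpu, "exclusive"))
            return True
        if own.writer == cpu or cpu in own.readers:
            return False                  # already allowed to read
        if own.writer is not None:
            own.readers.add(own.writer)   # downgrade the writer to a reader
            own.writer = None
        own.readers.add(cpu)
        self.log.append((unit, cpu, "shared"))
        return True


def count_transfers(granularity, accesses):
    """Run a trace of (cpu, addr, is_write) tuples and count logged transfers."""
    tracker = CrewTracker(granularity)
    for cpu, addr, is_write in accesses:
        tracker.access(cpu, addr, is_write)
    return len(tracker.log)


# CPU 0 writes byte 0 and CPU 1 writes byte 8, alternating four times each.
accesses = [(cpu, 8 * cpu, True) for _ in range(4) for cpu in (0, 1)]
page_transfers = count_transfers(4096, accesses)  # both bytes share one page
byte_transfers = count_transfers(1, accesses)     # each byte tracked separately
```

In this trace the two CPUs never touch the same byte, yet at 4096-byte granularity every write forces an ownership transfer that must be logged, while at byte granularity only the first write by each CPU does. A real recorder would also have to make each check-and-transfer atomic, which is where the reader/writer locks mentioned above come in.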
Our implementation exhibits 15-273% recording overhead for several important kernel-intensive benchmarks on a four-processor machine, which is an 11x average improvement over the best existing comparable approach.
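The code-level mutual-exclusion model described in the abstract can be illustrated with a similarly small sketch. Everything here is assumed for illustration: the `MonitorTable` class, the region names, and the monitor names are hypothetical, and the simulation is single-threaded, so a refused entry simply returns `False` where a real system would make the CPU wait. The point it shows is that only monitor entries need to be recorded, not individual memory accesses.

```python
class MonitorTable:
    """Hypothetical map from code regions to monitors; one CPU per monitor."""

    def __init__(self, region_to_monitor):
        self.region_to_monitor = dict(region_to_monitor)
        self.owner = {}   # monitor name -> CPU currently inside it
        self.log = []     # recorded order of monitor entries

    def try_enter(self, cpu, region):
        """Admit `cpu` to the monitor owning `region` if no other CPU is inside."""
        monitor = self.region_to_monitor[region]
        current = self.owner.get(monitor)
        if current is not None and current != cpu:
            return False                      # another CPU is active inside
        if current is None:
            self.owner[monitor] = cpu
            self.log.append((monitor, cpu))   # only entries are recorded
        return True

    def leave(self, cpu, region):
        """Release the monitor owning `region` if `cpu` currently holds it."""
        monitor = self.region_to_monitor[region]
        if self.owner.get(monitor) == cpu:
            del self.owner[monitor]


# Two regions share monitor "M_fs"; a third belongs to a separate monitor.
table = MonitorTable({
    "fs_code": "M_fs",
    "fs_helpers": "M_fs",
    "net_code": "M_net",
})
first = table.try_enter(0, "fs_code")        # CPU 0 enters M_fs
blocked = table.try_enter(1, "fs_helpers")   # same monitor, so CPU 1 is refused
parallel = table.try_enter(1, "net_code")    # different monitor, so allowed
table.leave(0, "fs_code")
retry = table.try_enter(1, "fs_helpers")     # M_fs is free again
```

Replaying the entries in `table.log` in the same order would reproduce the same interleaving between monitors, under the stated property that instructions from different monitors never access the same memory location concurrently.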

